
LightGBM Model #7295

@jtaylo87

What’s your use case?

LightGBM is a gradient boosting framework that is outperforming Orange’s native learner widgets in my current work on malware feature classification. It achieves the best results across a range of malware feature datasets (tested on ClaMP, SOMLAP, EMBER). Currently, I’ve implemented it via the Python Script widget (POC below). This is fine for my current needs, but given its strong performance I thought I would flag it as a candidate for a standalone widget.

What’s your proposed solution?

A native Orange3 LightGBM learner widget that exposes key LightGBM parameters (number of estimators, learning rate, max depth, num leaves) through Orange’s graphical user interface. The widget should be compatible with Test and Score, Predictions, and other downstream evaluation tools.
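
One way the exposed controls could map onto LightGBM's training arguments is sketched below. The helper name `build_lgb_params` is hypothetical, its defaults simply mirror the proof of concept in this issue, and the binary objective is an assumption carried over from that POC. The dictionary keys are LightGBM's own parameter names. This is an illustration of the mapping, not a proposed widget implementation.

```python
def build_lgb_params(n_estimators=200, learning_rate=0.05,
                     max_depth=-1, num_leaves=63):
    """Hypothetical helper: turn widget settings into lgb.train() arguments."""
    if n_estimators < 1:
        raise ValueError("n_estimators must be at least 1")
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    if num_leaves < 2:
        raise ValueError("num_leaves must be at least 2")
    # LightGBM treats max_depth <= 0 as "no limit". With a finite depth a
    # tree cannot have more than 2**max_depth leaves, so cap num_leaves.
    if max_depth > 0:
        num_leaves = min(num_leaves, 2 ** max_depth)
    params = {
        "objective": "binary",  # assumption: binary target, as in the POC
        "learning_rate": learning_rate,
        "max_depth": max_depth,
        "num_leaves": num_leaves,
        "verbose": -1,
    }
    # The number of estimators is passed separately as num_boost_round.
    return params, n_estimators


params, rounds = build_lgb_params(max_depth=4)
print(params["num_leaves"], rounds)
```

The returned pair could then be passed on as `lgb.train(params, train_data, num_boost_round=rounds)`, which is the same call the POC uses.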

Are there any alternative solutions?

I’m using the Python Script widget, which is fine as a workaround, but it requires manual coding and doesn’t persist settings reliably across sessions or duplicated canvases. I know that Orange’s Gradient Boosting widget exists, but it performs worse (in both speed and results) on my malware feature vector data (e.g. my EMBER data has 1200 features) than LightGBM with its leaf-wise tree growth strategy.

Current Implementation (Python Script widget)

The following Python implements LightGBM as an Orange-compatible learner. On my datasets it achieves better results than all other available models under 10-fold stratified cross-validation:

import Orange
from Orange.classification import Learner, Model
import lightgbm as lgb
import numpy as np


class LightGBMLearner(Learner):
    name = "LightGBM"

    def fit_storage(self, data):
        X = data.X
        y = data.Y.ravel()

        train_data = lgb.Dataset(X, label=y)
        # Binary classification only; hyperparameters are hard-coded for the POC.
        params = {
            'objective': 'binary',
            'metric': 'binary_error',
            'boosting_type': 'gbdt',
            'num_leaves': 63,
            'learning_rate': 0.05,
            'min_child_samples': 5,
            'feature_fraction': 0.8,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'verbose': -1
        }
        booster = lgb.train(params, train_data, num_boost_round=200)
        return LightGBMModel(booster, data.domain)


class LightGBMModel(Model):
    def __init__(self, booster, domain):
        super().__init__(domain)
        self.booster = booster

    def predict(self, X):
        # With the binary objective, the booster returns P(class 1) per row.
        probs = self.booster.predict(X)
        classes = (probs > 0.5).astype(int)
        return classes, np.column_stack([1 - probs, probs])


# Output signal of the Python Script widget.
out_learner = LightGBMLearner()
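
To make the return value of `predict` concrete, here is a standalone sketch that applies the same thresholding and `np.column_stack` call to made-up probabilities (illustrative values, not results from any dataset). The function name `to_orange_output` is hypothetical and is not part of Orange or LightGBM.

```python
import numpy as np


def to_orange_output(probs, threshold=0.5):
    """Hypothetical helper mirroring LightGBMModel.predict in the POC."""
    probs = np.asarray(probs, dtype=float)
    # Class index per row: 1 where P(class 1) exceeds the threshold.
    classes = (probs > threshold).astype(int)
    # One row per instance, columns are [P(class 0), P(class 1)].
    return classes, np.column_stack([1 - probs, probs])


classes, class_probs = to_orange_output([0.1, 0.5, 0.9])
print(classes)
print(class_probs.shape)
```

Because the comparison is strict, a probability of exactly 0.5 is assigned to class 0.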
