LightGBM Model

What’s your use case?

LightGBM is a gradient boosting framework that, in my current work on malware feature classification, outperforms Orange’s native learner widgets. It achieves the best results across a range of malware feature datasets (tested on ClaMP, SOMLAP, and EMBER). I’ve currently implemented it via the Python Script widget (proof of concept below). This is fine for my current uses, but given its strong performance I thought I would flag it as a candidate for a standalone widget.

What’s your proposed solution?

A native Orange3 LightGBM learner widget that exposes key LightGBM parameters (number of estimators, learning rate, max depth, num leaves) through Orange’s graphical user interface. The widget should be compatible with Test and Score, Predictions, and other downstream evaluation tools.
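As a rough illustration of how such controls could map onto LightGBM, the sketch below turns four GUI values into a parameter dict plus a boosting-round count. The function name `build_lgb_params` and its defaults are hypothetical (the defaults simply mirror my proof of concept), and clamping `num_leaves` to `2 ** max_depth` is my own assumption about sensible behaviour, not something Orange or LightGBM enforces.

```python
def build_lgb_params(n_estimators=200, learning_rate=0.05,
                     max_depth=-1, num_leaves=63):
    """Map hypothetical widget settings to LightGBM training arguments.

    Returns (params, num_boost_round), the two things lgb.train() needs
    besides the dataset.
    """
    if n_estimators < 1:
        raise ValueError("n_estimators must be at least 1")
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    if num_leaves < 2:
        raise ValueError("num_leaves must be at least 2")

    # In LightGBM, max_depth <= 0 means "no depth limit".
    if max_depth > 0:
        # A binary tree of depth d has at most 2**d leaves, so a larger
        # num_leaves would have no effect. Clamping is an assumption here.
        num_leaves = min(num_leaves, 2 ** max_depth)

    params = {
        "objective": "binary",   # two-class target, as in the proof of concept
        "learning_rate": learning_rate,
        "max_depth": max_depth,
        "num_leaves": num_leaves,
        "verbose": -1,
    }
    return params, n_estimators
```

A widget could call this whenever a control changes and pass the result to `lgb.train(params, dataset, num_boost_round=rounds)`.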

Are there any alternative solutions?

I’m using the Python Script widget, which is fine as a workaround but requires manual coding and doesn’t persist settings reliably across sessions or duplicated canvases. I know that Orange’s Gradient Boosting widget exists, but on my malware feature vector data (for example, 1,200 features for my EMBER data) it is slower and gives weaker results than LightGBM with its leaf-wise tree growth strategy.
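To make the leaf-wise idea concrete, here is a toy sketch. It uses neither LightGBM nor real data: the `gains` table of per-node split gains is made up, and nodes are numbered so that node `i` has children `2*i + 1` and `2*i + 2`. Leaf-wise growth always splits the current leaf with the largest gain. The level-wise order is included only as a contrast and is a simplification of how depth-wise learners work.

```python
import heapq


def leaf_wise(gains, max_splits):
    """Split order when always expanding the leaf with the largest gain."""
    frontier = [(-gains[0], 0)]  # max-heap emulated with negated gains
    order = []
    while frontier and len(order) < max_splits:
        _, node = heapq.heappop(frontier)
        order.append(node)
        for child in (2 * node + 1, 2 * node + 2):
            if child in gains:
                heapq.heappush(frontier, (-gains[child], child))
    return order


def level_wise(gains, max_splits):
    """Split order when expanding breadth-first, one level at a time."""
    frontier = [0]
    order = []
    while frontier and len(order) < max_splits:
        node = frontier.pop(0)
        order.append(node)
        frontier.extend(c for c in (2 * node + 1, 2 * node + 2) if c in gains)
    return order


# Hypothetical gains: most of the useful structure sits under node 2.
gains = {0: 10.0, 1: 1.0, 2: 8.0, 3: 0.5, 4: 0.4, 5: 6.0, 6: 0.3}

leaf_order = leaf_wise(gains, 3)    # [0, 2, 5]
level_order = level_wise(gains, 3)  # [0, 1, 2]
leaf_total = sum(gains[n] for n in leaf_order)    # 24.0
level_total = sum(gains[n] for n in level_order)  # 19.0
```

With the same budget of three splits, the leaf-wise order collects more total gain in this toy case because it follows the most informative branch instead of finishing each level first.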

Current Implementation (Python Script widget)

The following Python implements LightGBM as an Orange-compatible learner. On my datasets it scores higher than all other available models under 10-fold stratified cross-validation:

import numpy as np
import lightgbm as lgb
from Orange.classification import Learner, Model


class LightGBMLearner(Learner):
    name = "LightGBM"

    def fit_storage(self, data):
        X = data.X
        y = data.Y.ravel()  # flatten the class column to 1-D

        train_data = lgb.Dataset(X, label=y)
        # Hard-coded for a two-class target; other targets need a
        # different objective.
        params = {
            'objective': 'binary',
            'metric': 'binary_error',
            'boosting_type': 'gbdt',
            'num_leaves': 63,
            'learning_rate': 0.05,
            'min_child_samples': 5,
            'feature_fraction': 0.8,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'verbose': -1  # silence LightGBM's training output
        }
        booster = lgb.train(params, train_data, num_boost_round=200)
        return LightGBMModel(booster, data.domain)


class LightGBMModel(Model):
    def __init__(self, booster, domain):
        super().__init__(domain)
        self.booster = booster

    def predict(self, X):
        # With the binary objective, predict() returns P(class 1) per row.
        probs = self.booster.predict(X)
        classes = (probs > 0.5).astype(int)
        # Return predicted classes plus a two-column probability matrix.
        return classes, np.column_stack([1 - probs, probs])


# Output signal of the Python Script widget.
out_learner = LightGBMLearner()
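The proof of concept above is hard-coded for a two-class target. A native widget would presumably need to handle multiclass targets too, so here is a minimal sketch of two helpers for that. The names `objective_params` and `to_values_and_probs` are hypothetical, and I am assuming that `Booster.predict` returns a 1-D array of positive-class probabilities for the binary objective and an `(n_samples, n_classes)` array for the multiclass objective.

```python
import numpy as np


def objective_params(n_classes):
    """Pick LightGBM objective settings from the number of class values."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if n_classes == 2:
        return {"objective": "binary"}
    return {"objective": "multiclass", "num_class": n_classes}


def to_values_and_probs(raw):
    """Convert raw booster output to (class indices, probability matrix)."""
    raw = np.asarray(raw, dtype=float)
    if raw.ndim == 1:
        # Binary case: raw holds P(class 1), so build both columns.
        probs = np.column_stack([1.0 - raw, raw])
    else:
        # Multiclass case: assumed to be one column per class already.
        probs = raw
    # Ties resolve to the lowest class index, matching the "> 0.5" rule
    # used in the proof of concept.
    values = probs.argmax(axis=1).astype(float)
    return values, probs
```

In the learner, `objective_params(len(data.domain.class_var.values))` could be merged into `params`, and the model's `predict` could return `to_values_and_probs(self.booster.predict(X))`.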



LightGBM Model #7295
