# Adding a New Base Learner
This guide explains how to extend the Ensemble Genetic Algorithm project by adding your own custom machine learning model as a base learner. This allows the genetic algorithm to include your model when building and evolving ensembles.
---
## Overview
The framework is designed to be extensible. Any model that follows a scikit-learn compatible API (`.fit()`, `.predict_proba()`) can be integrated. To do this, you need to create a “Model Generator” class. This class acts as a wrapper that tells the framework how to:
1. Define the hyperparameter search space for your model.
2. Instantiate your model with a given set of hyperparameters.
## The Model Generator Class Structure
A model generator class must have the following structure (a minimal skeleton follows the list):

- An `__init__` method that accepts `ml_grid_object` and `local_param_dict`.
- A `get_hyperparameter_space` method that returns the hyperparameter search space for the model.
- A `get_model` method that returns an instance of your model, configured with specific hyperparameters.
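Here is a minimal skeleton of that structure. The class name, the `DummyClassifier` stand-in, and the single example parameter are illustrative placeholders, not part of the framework's API:

```python
from hyperopt import hp
from sklearn.dummy import DummyClassifier  # placeholder estimator for illustration


class MyModelGenerator:
    """Skeleton model generator; swap in your own estimator and parameters."""

    def __init__(self, ml_grid_object, local_param_dict):
        self.ml_grid_object = ml_grid_object
        self.local_param_dict = local_param_dict

    def get_hyperparameter_space(self):
        # Describe each tunable parameter as a hyperopt expression.
        return {'strategy': hp.choice('strategy', ['prior', 'stratified', 'uniform'])}

    def get_model(self, param_dict):
        # Instantiate the wrapped estimator from one sampled configuration.
        return DummyClassifier(strategy=param_dict['strategy'])
```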
## Step-by-Step Guide
Let’s walk through adding scikit-learn’s `SGDClassifier` as a new base learner.
### Step 1: Create the Model Generator File
Create a new Python file in the `ml_grid/model_classes_ga/` directory. Let’s call it `sgd_classifier_model.py`.
### Step 2: Define the Class and its Methods
In your new file, define the `SGDClassifierModelGenerator` class with the required methods. The complete code for the file should look like this:
```python
from sklearn.linear_model import SGDClassifier
from hyperopt import hp
import numpy as np


class SGDClassifierModelGenerator:
    """A model generator for the scikit-learn SGDClassifier."""

    def __init__(self, ml_grid_object, local_param_dict):
        self.ml_grid_object = ml_grid_object
        self.local_param_dict = local_param_dict
        # Access global parameters like random_state
        self.global_param_dict = ml_grid_object.global_param_dict

    def get_hyperparameter_space(self):
        """Returns the hyperparameter search space for SGDClassifier."""
        return {
            # 'hinge' does not support predict_proba(); see the note below.
            'loss': hp.choice('loss', ['hinge', 'log_loss', 'modified_huber']),
            'penalty': hp.choice('penalty', ['l2', 'l1', 'elasticnet']),
            'alpha': hp.loguniform('alpha', np.log(0.0001), np.log(0.1)),
            'max_iter': hp.choice('max_iter', [1000, 2000, 3000]),
            'tol': hp.loguniform('tol', np.log(1e-4), np.log(1e-2)),
        }

    def get_model(self, param_dict):
        """Returns an initialized SGDClassifier model instance."""
        model = SGDClassifier(
            loss=param_dict['loss'],
            penalty=param_dict['penalty'],
            alpha=param_dict['alpha'],
            max_iter=param_dict['max_iter'],
            tol=param_dict['tol'],
            random_state=self.global_param_dict.get('random_state'),
            class_weight='balanced',  # Often a good default for classification
        )
        return model
```

Note: The `log_loss` or `modified_huber` options for the `loss` parameter are important, as they enable `SGDClassifier` to provide probability estimates via `predict_proba()`, which is required by the ensemble methods. With `loss='hinge'`, `predict_proba()` is unavailable, so remove `'hinge'` from the search space if your ensembles rely on probabilities.
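To sanity-check the generator outside the framework, you can draw one configuration from the search space and instantiate the model directly. This is a minimal sketch: the `SimpleNamespace` stub stands in for the real `ml_grid_object` the framework would pass.

```python
from types import SimpleNamespace

from hyperopt.pyll.stochastic import sample

from ml_grid.model_classes_ga.sgd_classifier_model import SGDClassifierModelGenerator

# Stand-ins for the objects the framework normally supplies (illustration only).
ml_grid_object = SimpleNamespace(global_param_dict={'random_state': 42})
local_param_dict = {}

gen = SGDClassifierModelGenerator(ml_grid_object, local_param_dict)
params = sample(gen.get_hyperparameter_space())  # one random draw from the space
model = gen.get_model(params)
print(model)
```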
### Step 3: Integrate into the Experiment
Now you can use your new model generator in your experiments. The framework automatically discovers model generators based on the names provided in your `config.yml`.
Update `config.yml`: Add the name of your new model to the `model_list` under `global_params`. The name should match the class name of your model generator.
```yaml
# in config.yml
global_params:
  model_list:
    - "LogisticRegressionModelGenerator"
    - "RandomForestModelGenerator"
    - "XGBoostModelGenerator"
    - "SGDClassifierModelGenerator"  # Add your new model's class name
```
That’s it! When you run an experiment using `main.py`, the framework will now be able to select, tune, and include `SGDClassifier` in the ensembles it evolves. You can follow this same pattern to add any scikit-learn compatible classifier to the project.
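Under the hood, discovery of this kind amounts to resolving each configured class name to a class. The following is a hypothetical illustration of that idea, not the project's actual loader:

```python
import importlib


def resolve_generator(class_name, module_name):
    """Hypothetical helper: look up a generator class by name (illustration only)."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# e.g. resolve_generator('SGDClassifierModelGenerator',
#                        'ml_grid.model_classes_ga.sgd_classifier_model')
```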
---
## Creating Your Own Custom Classifiers
Beyond using existing scikit-learn models, you can also create your own custom estimators from scratch. As long as your custom class adheres to the scikit-learn API (implementing `.fit()`, `.predict_proba()`, `.get_params()`, and `.set_params()`), it can be integrated into this framework using the same model generator pattern.
This allows you to experiment with novel algorithms while still leveraging the power of the genetic algorithm for ensembling and hyperparameter tuning.
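As a concrete illustration, here is a minimal custom classifier that satisfies that API. It is a toy model that predicts the smoothed class priors for every sample; inheriting from `BaseEstimator` supplies `get_params()` and `set_params()` automatically:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_array, check_X_y


class PriorProbabilityClassifier(BaseEstimator, ClassifierMixin):
    """Toy classifier: predicts the smoothed class priors for every sample."""

    def __init__(self, smoothing=1.0):
        # Parameters set in __init__ are picked up by get_params()/set_params().
        self.smoothing = smoothing

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = unique_labels(y)
        counts = np.array([(y == c).sum() for c in self.classes_], dtype=float)
        smoothed = counts + self.smoothing
        self.priors_ = smoothed / smoothed.sum()
        return self

    def predict_proba(self, X):
        X = check_array(X)
        # Every row receives the same probability vector: the class priors.
        return np.tile(self.priors_, (X.shape[0], 1))

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

A class like this can then be wrapped in a model generator exactly as in the `SGDClassifier` example above.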
For detailed instructions on how to develop your own scikit-learn compatible estimators, refer to the official scikit-learn developer’s guide.