Model Deployment Guide

This guide explains how to take the best ensemble model discovered by the genetic algorithm and deploy it for production use as a portable, scikit-learn-compatible object.


Overview

The genetic algorithm produces ensembles composed of various base learners, each with its own hyperparameters and feature subset. To make such an ensemble portable, the framework provides a scikit-learn-compatible wrapper called SklearnEnsembleClassifier.

This wrapper encapsulates the logic for feature masking (ensuring each base model sees only its own feature subset) and prediction averaging, allowing you to treat the entire ensemble as a single estimator.
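The wrapper's internals are not shown in this guide. The sketch below illustrates only the two ideas. It uses a hypothetical MaskedAveragingEnsemble class, which is our own name and not part of the framework. Each member is fitted on a named column subset, and the members' class probabilities are combined by an unweighted mean. The unweighted averaging is an assumption. The real SklearnEnsembleClassifier may weight or combine its members differently.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


class MaskedAveragingEnsemble(ClassifierMixin, BaseEstimator):
    """Hypothetical illustration, NOT the project's SklearnEnsembleClassifier.

    `members` is a list of (estimator, feature_names) pairs. Each estimator is
    trained and queried only on its own named feature subset.
    """

    def __init__(self, members):
        self.members = members

    def fit(self, X, y):
        self.fitted_members_ = []
        for estimator, features in self.members:
            model = clone(estimator)
            # Feature masking: select this learner's columns by name.
            model.fit(X[list(features)], y)
            self.fitted_members_.append((model, list(features)))
        # Assumes every member is trained on the same labels, so classes_ agree.
        self.classes_ = self.fitted_members_[0][0].classes_
        return self

    def predict_proba(self, X):
        # Prediction averaging: unweighted mean of the members' class probabilities.
        probas = [model.predict_proba(X[features]) for model, features in self.fitted_members_]
        return np.mean(probas, axis=0)

    def predict(self, X):
        # Choose the class with the highest averaged probability.
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Usage on synthetic data: two learners, each restricted to a different column pair.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
y = (X["a"] + X["c"] > 0).astype(int)
ensemble = MaskedAveragingEnsemble([
    (LogisticRegression(), ["a", "b"]),
    (DecisionTreeClassifier(max_depth=3, random_state=0), ["c", "d"]),
]).fit(X, y)
proba = ensemble.predict_proba(X)
```

Columns are selected by name rather than position. Because of that, passing the same data with its columns shuffled yields identical probabilities.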

The Deployment Workflow

Deploying a model involves four main steps:

  1. Identify the Best Run: Open final_grid_score_log.csv from your experiment and select the row with the highest value of your chosen performance metric (usually the auc column).

  2. Reconstruct the Ensemble: Use the EnsembleEvaluator to “re-hydrate” the architecture string back into functional Python objects.

  3. Final Fit: Train the constituent base learners on your training dataset.

  4. Serialize: Save the fitted ensemble to a file using joblib.

Example Reconstruction

from ml_grid.util.evaluate_ensemble_methods import EnsembleEvaluator
from ml_grid.util.ensemble_classifier import SklearnEnsembleClassifier
import joblib
import pandas as pd

# 1. Load results and find best ensemble
results_df = pd.read_csv("path/to/final_grid_score_log.csv")
best_row = results_df.loc[results_df['auc'].idxmax()]

# 2. Initialize the evaluator (constructor arguments omitted here) and parse
#    the best ensemble's architecture string into base-learner objects
evaluator = EnsembleEvaluator(...)
parsed_arch = evaluator._parse_ensemble(best_row['best_ensemble'])[0]

# 3. Wrap in the Sklearn classifier and fit
my_ensemble = SklearnEnsembleClassifier(parsed_arch, evaluator.original_feature_names)
my_ensemble.fit(evaluator.ml_grid_object.X_train, evaluator.ml_grid_object.y_train)

# 4. Save the model
joblib.dump(my_ensemble, "deployed_ensemble_model.joblib")
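Before shipping the file, it can be worth confirming that the saved model reloads and predicts exactly like the in-memory one. The sketch below uses a hypothetical helper, verify_joblib_roundtrip, which is not part of the framework. It relies only on joblib.dump, joblib.load and the estimator's predict_proba.

```python
import joblib
import numpy as np


def verify_joblib_roundtrip(model, X, path):
    """Hypothetical helper: save `model`, reload it, and check predictions match.

    `X` should be a small sample with the same columns used for training.
    Returns the reloaded model so it can be inspected further.
    """
    joblib.dump(model, path)
    reloaded = joblib.load(path)
    # Compare class probabilities from the original and reloaded objects.
    if not np.allclose(model.predict_proba(X), reloaded.predict_proba(X)):
        raise ValueError("Reloaded model predictions differ from the in-memory model")
    return reloaded
```

With the names from the example above, verify_joblib_roundtrip(my_ensemble, evaluator.ml_grid_object.X_train.head(50), "deployed_ensemble_model.joblib") would replace the bare joblib.dump call.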

Production Environment

To run the model on another server, the target environment requires the following:

  • Python: 3.12 or newer (matching the requirement in pyproject.toml).

  • Core Libraries: numpy, pandas, scikit-learn, joblib.

  • PyTorch: Required if your ensemble includes neural network base learners (BinaryClassification).

  • Project Package: The ensemble-genetic-algorithm package must be installed so the environment can resolve the custom class definitions during deserialization.

# Install from a local clone of the repository; the package is not yet published on PyPI
./setup.sh --cpu
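A quick pre-flight check on the target server can catch a missing dependency before the first joblib.load call fails. The sketch below uses a hypothetical check_environment helper. The module list is an assumption based on the requirements above, written as import names: sklearn for scikit-learn, and ml_grid for the project package as imported in the reconstruction example. torch is treated as optional.

```python
import importlib.util
import sys

# Import names, not distribution names. "ml_grid" is the project package used in
# the examples; "torch" is only needed for neural-network (BinaryClassification) learners.
REQUIRED_MODULES = ("numpy", "pandas", "sklearn", "joblib", "ml_grid")
OPTIONAL_MODULES = ("torch",)


def check_environment(required=REQUIRED_MODULES, optional=OPTIONAL_MODULES, min_python=(3, 12)):
    """Report whether the interpreter is new enough and which modules are importable."""
    return {
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when a top-level module cannot be found.
        "missing_required": [m for m in required if importlib.util.find_spec(m) is None],
        "missing_optional": [m for m in optional if importlib.util.find_spec(m) is None],
    }
```

A deployment script could call check_environment() and abort if python_ok is False or missing_required is non-empty. Whether torch is actually needed depends on the base learners in your ensemble.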

Running Predictions in Production

Once the .joblib file has been copied to the target server and the environment is set up, loading the model and generating predictions takes only a few lines:

import joblib
import pandas as pd

# Load the model
model = joblib.load('deployed_ensemble_model.joblib')

# Load new data as a DataFrame whose columns include the original training feature names
new_data = pd.read_csv('production_data.csv')

# Get predictions and probabilities
predictions = model.predict(new_data)
probabilities = model.predict_proba(new_data)

Note: SklearnEnsembleClassifier is insensitive to column order. As long as every required feature name is present in the input DataFrame, it selects the correct columns for each base learner internally.
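Since columns are matched by name, a missing column is the main input error to guard against. To fail early with a clear message, you could validate the DataFrame before calling predict. The sketch below uses a hypothetical prepare_features helper. It assumes you saved the training feature names alongside the model, for example the evaluator.original_feature_names list passed to the wrapper. This check is stricter than necessary if some training columns are unused by every base learner.

```python
import pandas as pd


def prepare_features(df, required_features):
    """Hypothetical helper: verify required columns exist and return them in training order.

    `required_features` is assumed to be the list of feature names the ensemble
    was trained on, stored alongside the .joblib file.
    """
    missing = [c for c in required_features if c not in df.columns]
    if missing:
        raise KeyError(f"Input is missing {len(missing)} required feature(s): {missing}")
    # Drop extra columns and put the remaining ones in training order.
    return df.loc[:, list(required_features)]
```

Usage would then be model.predict(prepare_features(new_data, feature_names)). Here feature_names is whatever list you stored at training time.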