Model Deployment Guide
This guide explains how to take the best ensemble model discovered by the genetic algorithm and deploy it for production use as a portable, scikit-learn compatible object.
Overview
The genetic algorithm produces ensembles composed of various base learners, each with its own hyperparameters and feature subset. To make such an ensemble portable, the framework provides a scikit-learn compatible wrapper called SklearnEnsembleClassifier.
This wrapper encapsulates the logic for feature masking (ensuring each base model only sees its specific features) and prediction averaging, allowing you to treat the entire ensemble as a single estimator.
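The two ideas above, feature masking and prediction averaging, can be illustrated with a minimal sketch. This is not the framework's actual SklearnEnsembleClassifier: the class name MaskedAveragingEnsemble, the (model, feature_names) pairing, and the choice to average class probabilities are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


class MaskedAveragingEnsemble:
    """Hypothetical illustration: each member is a (model, feature_names) pair."""

    def __init__(self, members):
        self.members = members

    def fit(self, X, y):
        for model, features in self.members:
            # Feature masking: each model is trained only on its own columns.
            model.fit(X[features], y)
        return self

    def predict_proba(self, X):
        # Average the per-model class probabilities across all members.
        probas = [model.predict_proba(X[features]) for model, features in self.members]
        return np.mean(probas, axis=0)

    def predict(self, X):
        # Assumes labels are 0 and 1, so the column index equals the class label.
        return self.predict_proba(X).argmax(axis=1)


# Synthetic data, used only to make the sketch runnable.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f0", "f1", "f2", "f3"])
y = (X["f0"] + X["f2"] > 0).astype(int)

ensemble = MaskedAveragingEnsemble([
    (LogisticRegression(), ["f0", "f1"]),
    (DecisionTreeClassifier(max_depth=3, random_state=0), ["f2", "f3"]),
]).fit(X, y)

proba = ensemble.predict_proba(X)  # one row per sample, one column per class
pred = ensemble.predict(X)
```

Because every member selects its columns by name, callers only ever pass the full DataFrame; the per-model subsets stay an internal detail of the wrapper.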
The Deployment Workflow
Deploying a model involves four main steps:
Identify the Best Run: Locate your experiment results in final_grid_score_log.csv and pick the iteration with the highest performance metric (usually auc).
Reconstruct the Ensemble: Use the EnsembleEvaluator to “re-hydrate” the architecture string back into functional Python objects.
Final Fit: Train the constituent base learners on your training dataset.
Serialize: Save the fitted ensemble to a file using joblib.
Example Reconstruction
from ml_grid.util.evaluate_ensemble_methods import EnsembleEvaluator
from ml_grid.util.ensemble_classifier import SklearnEnsembleClassifier
import joblib
import pandas as pd
# 1. Load results and find best ensemble
results_df = pd.read_csv("path/to/final_grid_score_log.csv")
best_row = results_df.loc[results_df['auc'].idxmax()]
# 2. Initialize evaluator to parse the architecture string
evaluator = EnsembleEvaluator(...)
parsed_arch = evaluator._parse_ensemble(best_row['best_ensemble'])[0]
# 3. Wrap in the Sklearn classifier and fit
my_ensemble = SklearnEnsembleClassifier(parsed_arch, evaluator.original_feature_names)
my_ensemble.fit(evaluator.ml_grid_object.X_train, evaluator.ml_grid_object.y_train)
# 4. Save the model
joblib.dump(my_ensemble, "deployed_ensemble_model.joblib")
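Before shipping the file, it can be worth confirming that the serialized model reloads and reproduces the same outputs. The sketch below shows such a round-trip check; it uses a plain LogisticRegression on synthetic data as a stand-in for the fitted ensemble, which is an assumption made so the example runs without the project package.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in model and data; in practice this would be the fitted ensemble
# and a sample of the training or validation data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "deployed_ensemble_model.joblib")
    joblib.dump(model, path)       # serialize to disk
    reloaded = joblib.load(path)   # deserialize into a new object

# The reloaded model should produce the same probabilities as the original.
round_trip_ok = np.allclose(model.predict_proba(X), reloaded.predict_proba(X))
print("Round trip OK:", round_trip_ok)
```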
Production Environment
To run the model on another server, the target environment requires the following:
Python: >=3.12 (matches the pyproject.toml requirement)
Core Libraries: numpy, pandas, scikit-learn, joblib.
PyTorch: Required if your ensemble includes neural network base learners (BinaryClassification).
Project Package: The ensemble-genetic-algorithm package must be installed so the environment can resolve the custom class definitions during deserialization.
# Install from the local repository as the package is not on PyPI yet
./setup.sh --cpu
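The requirements listed above can be checked on the target server before loading the model. The following is a rough sketch using only the standard library. The helper names missing_modules and environment_report are hypothetical, and the import name ml_grid is taken from the import statements in the reconstruction example.

```python
import importlib.util
import sys

# Import names of the required libraries (scikit-learn is imported as sklearn).
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "joblib", "ml_grid"]
# PyTorch is only needed when the ensemble contains neural network learners.
OPTIONAL_MODULES = ["torch"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_report(min_python=(3, 12)):
    """Summarize whether the interpreter and packages meet the requirements."""
    return {
        "python_ok": sys.version_info >= min_python,
        "missing_required": missing_modules(REQUIRED_MODULES),
        "missing_optional": missing_modules(OPTIONAL_MODULES),
    }


report = environment_report()
print(report)
```

An empty missing_required list together with python_ok set to True suggests the environment can at least import what deserialization needs; it does not verify package versions.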
Running Predictions in Production
Once the .joblib file is transferred and the environment is ready, you can use the model with minimal code:
import joblib
import pandas as pd
# Load the model
model = joblib.load('deployed_ensemble_model.joblib')
# Prepare data (must be a DataFrame with original feature names)
new_data = pd.read_csv('production_data.csv')
# Get predictions and probabilities
predictions = model.predict(new_data)
probabilities = model.predict_proba(new_data)
Note: The SklearnEnsembleClassifier is robust to column ordering; as long as the required feature names are present in the input DataFrame, it will correctly subset the data for each base learner internally.
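Selecting columns by name is what makes column order irrelevant, but a missing column will still fail. A small pre-flight check along the following lines can surface that early with a clear message. The helper select_required and the feature names are hypothetical; the real list of required names would come from the fitted ensemble.

```python
import pandas as pd


def select_required(df, required):
    """Return df restricted to the required columns, in the required order."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required feature columns: {missing}")
    return df[required]


# Hypothetical feature names, for illustration only.
required = ["age", "bmi", "glucose"]

# Columns arrive in a different order and with an extra, unused column.
frame = pd.DataFrame({
    "glucose": [5.1, 6.3],
    "extra": [0, 1],
    "age": [40, 52],
    "bmi": [22.0, 27.5],
})

subset = select_required(frame, required)
print(list(subset.columns))
```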