# Model Deployment Guide

This guide explains how to take the best ensemble model discovered by the genetic algorithm and deploy it for production use as a portable, scikit-learn compatible object.

---

## Overview

The genetic algorithm produces ensembles composed of various base learners, each with its own hyperparameters and feature subset. To make such an ensemble portable, the framework provides a scikit-learn compatible wrapper called `SklearnEnsembleClassifier`.

This wrapper encapsulates the logic for feature masking (ensuring each base model only sees its specific features) and prediction averaging, allowing you to treat the entire ensemble as a single estimator.
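To make the two ideas concrete, here is a minimal sketch of feature masking plus probability averaging. It is an illustration only and not the framework's `SklearnEnsembleClassifier`: the class name `MaskedAveragingEnsemble`, the synthetic data, the feature names `f1` to `f4`, and the 0.5 threshold on binary 0/1 labels are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


class MaskedAveragingEnsemble:
    """Illustrative stand-in; NOT the framework's SklearnEnsembleClassifier."""

    def __init__(self, members):
        # members: list of (unfitted estimator, list of feature names) pairs
        self.members = members

    def fit(self, X, y):
        # Feature masking: each base learner is trained on its own columns only
        for estimator, features in self.members:
            estimator.fit(X[features], y)
        return self

    def predict_proba(self, X):
        # Prediction averaging: mean of the members' class probabilities
        probs = [est.predict_proba(X[features]) for est, features in self.members]
        return np.mean(probs, axis=0)

    def predict(self, X):
        # Assumes binary labels encoded as 0 and 1
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)


# Synthetic data with hypothetical feature names
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f1", "f2", "f3", "f4"])
y = (X["f1"] + X["f2"] > 0).astype(int)

ensemble = MaskedAveragingEnsemble([
    (LogisticRegression(), ["f1", "f2"]),
    (DecisionTreeClassifier(max_depth=3, random_state=0), ["f2", "f3", "f4"]),
]).fit(X, y)

proba = ensemble.predict_proba(X)
labels = ensemble.predict(X)
```

Because each member selects its columns by name, the caller only ever passes one DataFrame, which is what lets the whole ensemble behave like a single estimator.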

## The Deployment Workflow

Deploying a model involves four main steps:

1.  **Identify the Best Run**: Locate your experiment results in `final_grid_score_log.csv` and pick the iteration with the highest performance metric (usually `auc`).
2.  **Reconstruct the Ensemble**: Use the `EnsembleEvaluator` to "re-hydrate" the architecture string back into functional Python objects.
3.  **Final Fit**: Train the constituent base learners on your training dataset.
4.  **Serialize**: Save the fitted ensemble to a file using `joblib`.

### Example Reconstruction

```python
from ml_grid.util.evaluate_ensemble_methods import EnsembleEvaluator
from ml_grid.util.ensemble_classifier import SklearnEnsembleClassifier
import joblib
import pandas as pd

# 1. Load results and find best ensemble
results_df = pd.read_csv("path/to/final_grid_score_log.csv")
best_row = results_df.loc[results_df['auc'].idxmax()]

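# 2. Initialize evaluator to parse the architecture string
#    (constructor arguments are elided here; supply the ones appropriate to your setup)
evaluator = EnsembleEvaluator(...)
# _parse_ensemble is a private helper (leading underscore), so it may change between versions
parsed_arch = evaluator._parse_ensemble(best_row['best_ensemble'])[0]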

# 3. Wrap in the Sklearn classifier and fit
my_ensemble = SklearnEnsembleClassifier(parsed_arch, evaluator.original_feature_names)
my_ensemble.fit(evaluator.ml_grid_object.X_train, evaluator.ml_grid_object.y_train)

# 4. Save the model
joblib.dump(my_ensemble, "deployed_ensemble_model.joblib")
```
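Before shipping the file, it can be worth confirming that the serialized model reloads and reproduces the same probabilities. The sketch below shows one way to do that round-trip check. It uses a plain `LogisticRegression` on synthetic data as a stand-in for the fitted ensemble, since the real object depends on your experiment; the variable names are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in model and data (assumption: replace with my_ensemble and your own data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Dump to a temporary location, then load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "deployed_ensemble_model.joblib")
    joblib.dump(model, path)
    reloaded = joblib.load(path)

# The reloaded object should give the same probabilities as the original
round_trip_ok = np.allclose(model.predict_proba(X), reloaded.predict_proba(X))
print("Round trip OK:", round_trip_ok)
```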

## Production Environment

To run the model on another server, the target environment requires the following:

-   **Python**: >=3.12 (matches `pyproject.toml` requirement)
-   **Core Libraries**: `numpy`, `pandas`, `scikit-learn`, `joblib`.
-   **PyTorch**: Required if your ensemble includes neural network base learners (`BinaryClassification`).
-   **Project Package**: The `ensemble-genetic-algorithm` package must be installed so the environment can resolve the custom class definitions during deserialization.

```bash
# Install from a local clone of the repository; the package is not yet published on PyPI
./setup.sh --cpu
```
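A small pre-flight script can confirm that the target server meets these requirements before you load the model. The following is a sketch using only the standard library. The helper name `check_environment` is hypothetical, and `ml_grid` is assumed to be the package's import name because that is what the reconstruction example imports from. Add `torch` to the list if your ensemble includes neural network base learners.

```python
import importlib.util
import sys


def check_environment(modules, min_python=(3, 12)):
    """Report whether Python is recent enough and which modules cannot be found."""
    return {
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when a top-level module is not installed
        "missing": [name for name in modules if importlib.util.find_spec(name) is None],
    }


# Import names, not distribution names (scikit-learn is imported as sklearn)
report = check_environment(["numpy", "pandas", "sklearn", "joblib", "ml_grid"])
print(report)
```

An empty `missing` list together with `python_ok` set to `True` suggests the environment can at least import what deserialization needs. It does not verify that library versions match those used at training time.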

## Running Predictions in Production

Once the `.joblib` file is transferred and the environment is ready, you can use the model with minimal code:

```python
import joblib
import pandas as pd

# Load the model
model = joblib.load('deployed_ensemble_model.joblib')

# Prepare data (must be a DataFrame with original feature names)
new_data = pd.read_csv('production_data.csv')

# Get predictions and probabilities
predictions = model.predict(new_data)
probabilities = model.predict_proba(new_data)
```

**Note**: The `SklearnEnsembleClassifier` is robust to column ordering; as long as the required feature names are present in the input DataFrame, it will correctly subset the data for each base learner internally.
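Since missing columns are the main way this can fail, you may want to validate incoming data before calling `predict`. The sketch below is one way to do it. The helper `find_missing_features` and the feature names are hypothetical, and how you obtain the list of required features depends on your experiment (for example, saving it alongside the model at training time).

```python
import pandas as pd


def find_missing_features(df, required_features):
    """Return the required feature names absent from df, in their original order."""
    present = set(df.columns)
    return [name for name in required_features if name not in present]


# Hypothetical feature list; in practice, use the names the model was trained on
required_features = ["age", "bmi", "glucose"]

# Column order differs and an extra column is present; neither is a problem
new_data = pd.DataFrame({"glucose": [5.4], "age": [61], "extra": [0]})

missing = find_missing_features(new_data, required_features)
if missing:
    print(f"Cannot predict, missing features: {missing}")
```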