Ensemble Genetic Algorithm
Welcome to the documentation for ga-project!
ga-project is a Python library that leverages genetic algorithms to construct
powerful machine learning model ensembles. It automates the challenging process of
selecting the best combination of models and their weights to maximize predictive
performance. This tool is for data scientists and machine learning engineers looking
to improve their model accuracy with sophisticated ensembling techniques.
Note
This project is under active development and the API may change.
Installation
You can install ga-project from PyPI using pip:
pip install ga-project
Example Workflow
The primary way to use this library is by configuring and running an experiment pipeline, as demonstrated in notebooks/example_usage.ipynb. This automates grid searching, model training, and evaluation.
Here is a simplified example of the core logic:
import ml_grid
from ml_grid.pipeline import main_ga
from ml_grid.model_classes_ga import (
logisticRegressionModelGenerator,
randomForestModelGenerator,
XGBoostModelGenerator,
)
# 1. Define experiment parameters
input_csv_path = "synthetic_data_for_testing.csv"
base_project_dir = "HFE_GA_experiments/my_first_run/"
# 2. Define the pool of base models for the Genetic Algorithm
model_list = [
logisticRegressionModelGenerator,
randomForestModelGenerator,
XGBoostModelGenerator,
]
# 3. Set hyperparameters for this specific run
# In a full run, this is typically iterated from a grid search
hyperparameters = {
'population_size': 50,
'n_generations': 20,
'mutation_rate': 0.2,
'crossover_rate': 0.8,
}
# 4. Configure and run the experiment pipeline
ml_grid_object = ml_grid.pipeline.data.pipe(
input_csv_path=input_csv_path,
base_project_dir=base_project_dir,
local_param_dict=hyperparameters,
config_dict={"modelFuncList": model_list},
)
# 5. Execute the Genetic Algorithm
main_ga.run(ml_grid_object, local_param_dict=hyperparameters).execute()
Getting Started
User Guide
Developer Guide
Project Information
API Reference