Usage Guide
This guide explains the primary ways to run experiments using the Ensemble Genetic Algorithm project.
Recommended Workflow: Command-Line with config.yml
The most straightforward and recommended way to run an experiment is from your terminal using the main.py script and a config.yml file. This approach keeps your configuration separate from the code and is ideal for most use cases.
Prepare Your Data: Ensure your input CSV meets the requirements outlined in the Data Preparation Guide.
Create a Configuration File: Copy the config.yml.example file in the project root to a new file named config.yml.
Edit config.yml: Open your config.yml and customize the experiment. At a minimum, you should set:
global_params.input_csv_path: Path to your dataset.
global_params.n_iter: The number of grid search iterations.
global_params.model_list: The base learners to use.
ga_params and grid_params to define your search space.
Here is a minimal example to get you started:
# In your new config.yml
global_params:
  input_csv_path: "path/to/your/data.csv"
  n_iter: 10 # Start with a small number of iterations
  model_list: ["logisticRegression", "randomForest", "XGBoost"]
ga_params:
  pop_params: [64] # Use a single population size to start
See the Configuration Guide for a full list of options.
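Before launching a long run, it can help to confirm that the minimum settings listed above are present. The sketch below is an illustration, not part of the project: `REQUIRED_KEYS`, `missing_keys`, and `load_config` are hypothetical names. It assumes PyYAML is installed and that `ga_params` and `grid_params` are top-level sections of the file.

```python
from typing import Any

# Dotted paths for the minimum settings named in this guide.
# Assumption: ga_params and grid_params are top-level sections.
REQUIRED_KEYS = [
    "global_params.input_csv_path",
    "global_params.n_iter",
    "global_params.model_list",
    "ga_params",
    "grid_params",
]


def missing_keys(config: dict[str, Any], required=REQUIRED_KEYS) -> list[str]:
    """Return the dotted paths from `required` that are absent in `config`."""
    missing = []
    for dotted in required:
        node: Any = config
        for part in dotted.split("."):
            # Stop at the first level where the path cannot be followed.
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


def load_config(path: str = "config.yml") -> dict[str, Any]:
    """Parse a YAML file into a dict (requires PyYAML)."""
    import yaml  # imported lazily so missing_keys works without PyYAML

    with open(path, encoding="utf-8") as handle:
        # An empty file parses to None; treat it as an empty config.
        return yaml.safe_load(handle) or {}
```

Calling `missing_keys(load_config("config.yml"))` returns an empty list when every listed setting is present. The check only tests for presence; it does not validate values.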
Activate Your Environment:
source ga_env/bin/activate
Run the Experiment:
To run with the default config.yml:
python main.py
To specify a different configuration file:
python main.py --config path/to/your/config.yml
To automatically evaluate the best model and generate all analysis plots after the run:
python main.py --config path/to/your/config.yml --evaluate --plot
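When runs are launched from another script (for example, to loop over several configuration files), the same commands can be assembled programmatically. This is a minimal sketch that uses only the `--config`, `--evaluate`, and `--plot` flags shown above. `build_command` and `run_experiment` are hypothetical helpers, and the sketch assumes it is run from the project root where main.py lives.

```python
import subprocess
import sys
from typing import Optional


def build_command(config: Optional[str] = None,
                  evaluate: bool = False,
                  plot: bool = False) -> list[str]:
    """Assemble the main.py command line from the flags shown in this guide."""
    # sys.executable is the interpreter of the currently active environment.
    command = [sys.executable, "main.py"]
    if config is not None:
        command += ["--config", config]
    if evaluate:
        command.append("--evaluate")
    if plot:
        command.append("--plot")
    return command


def run_experiment(**options) -> int:
    """Run main.py from the current directory and return its exit code."""
    return subprocess.run(build_command(**options), check=False).returncode
```

For instance, `run_experiment(config="path/to/your/config.yml", evaluate=True, plot=True)` corresponds to the last command above.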
The following diagram illustrates this workflow:
[Diagram: main.py Workflow]
Alternative: Programmatic Usage with the Example Notebook
For development, debugging, or a more interactive walkthrough, you can use the example_usage.ipynb notebook. This notebook provides a script-based implementation of the same workflow orchestrated by main.py. See the Example Usage Notebook Guide for a detailed breakdown of its contents.
To execute the notebook from the command line (useful for HPC environments), use the following command from the root of the repository:
jupyter nbconvert --to notebook --execute notebooks/example_usage.ipynb --output notebooks/example_usage_executed.ipynb
This command will:
Run the notebook example_usage.ipynb using the current Python environment.
Save the executed version as example_usage_executed.ipynb in the same notebooks/ directory.
Preserve interactive IPython functionality (e.g., display, widgets) during execution.
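The same execution can also be driven from Python through nbconvert's `ExecutePreprocessor`, which may be convenient inside a batch job script. The sketch below is one possible approach, not the project's own tooling: `executed_name` and `execute_notebook` are hypothetical helpers, and the `python3` kernel name and 600-second per-cell timeout are assumptions to adjust for your setup.

```python
from pathlib import Path


def executed_name(notebook: Path) -> Path:
    """Return the sibling path with `_executed` appended to the stem."""
    return notebook.with_name(f"{notebook.stem}_executed{notebook.suffix}")


def execute_notebook(notebook: Path, timeout: int = 600) -> Path:
    """Execute `notebook` and write the result next to it."""
    # Imported lazily so executed_name works without Jupyter installed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    with notebook.open(encoding="utf-8") as handle:
        nb = nbformat.read(handle, as_version=4)

    # Assumed kernel name and timeout; change them to match your environment.
    executor = ExecutePreprocessor(timeout=timeout, kernel_name="python3")
    # Run cells with the notebook's own directory as the working directory.
    executor.preprocess(nb, {"metadata": {"path": str(notebook.parent)}})

    target = executed_name(notebook)
    with target.open("w", encoding="utf-8") as handle:
        nbformat.write(nb, handle)
    return target
```

Calling `execute_notebook(Path("notebooks/example_usage.ipynb"))` from the repository root writes the executed copy alongside the original.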
📌 Note: Make sure the ga_env (or .venv) environment is activated before running this command:
source ga_env/bin/activate # Or .venv/bin/activate if installed manually
This ensures all required dependencies are available for successful execution.
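A quick pre-flight check can catch a forgotten activation before a long run starts. The following sketch assumes a venv-style environment such as ga_env or .venv. `in_virtualenv` and `missing_modules` are hypothetical helpers, and the module names passed in are examples, not the project's actual dependency list.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["nbconvert", "yaml"])` lists whichever of those two packages the active interpreter cannot import. Note that `in_virtualenv` does not detect conda environments, which do not change `sys.base_prefix`.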