Configuration Guide
This guide explains how to customize your experiments. The project uses a layered configuration system, which gives you flexibility in how you define settings. The order of precedence is:
Runtime Arguments (Highest precedence): Parameters passed directly when initializing global_parameters in a script.
config.yml File: A central YAML file in your project root for most customizations.
Hardcoded Defaults (Lowest precedence): The default values set within the package source code.
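The precedence order above can be pictured as a dictionary merge in which each later layer overwrites the one before it. The following is a minimal sketch under that assumption, not the package's actual loading code; `resolve_settings` and the sample values are hypothetical names chosen for illustration.

```python
def resolve_settings(defaults, file_config, runtime_args):
    """Merge three configuration layers, lowest precedence first.

    Hypothetical helper for illustration only; the package's own
    loading logic may differ.
    """
    merged = dict(defaults)        # hardcoded defaults (lowest precedence)
    merged.update(file_config)     # values read from config.yml
    merged.update(runtime_args)    # keyword arguments (highest precedence)
    return merged


defaults = {"n_iter": 10, "verbose": 1, "testing": False}
file_config = {"n_iter": 20, "verbose": 2}
runtime_args = {"n_iter": 5}

settings = resolve_settings(defaults, file_config, runtime_args)
print(settings)  # {'n_iter': 5, 'verbose': 2, 'testing': False}
```

Here `n_iter` comes from the runtime layer, `verbose` from the file layer, and `testing` falls through to the default because no higher layer sets it.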
The config.yml File
This is the recommended method for most configuration. It is safe from being overwritten by package updates and keeps all your settings in one place.
Create the File: Copy the config.yml.example from the repository root to a new file named config.yml.
Edit: Uncomment and change the parameters you wish to modify. Any parameter you don’t specify will use its default value.
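If you prefer to script the copy step, it can be done with the standard library. This is a small sketch, assuming it is run against the repository root; `create_config` is a hypothetical helper, not part of the package, and it deliberately leaves an existing config.yml untouched so your edits are not lost.

```python
import shutil
from pathlib import Path


def create_config(repo_root=".", overwrite=False):
    """Copy config.yml.example to config.yml inside repo_root.

    Hypothetical helper: returns the path of config.yml and keeps an
    existing file unless overwrite=True.
    """
    root = Path(repo_root)
    source = root / "config.yml.example"
    target = root / "config.yml"
    if target.exists() and not overwrite:
        return target  # keep the user's existing settings
    shutil.copyfile(source, target)
    return target
```

Calling `create_config()` from the repository root produces config.yml on the first run and does nothing on later runs.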
The config.yml is split into three main sections:
1. global_params (in config.yml)
These settings control the overall behavior of the experiment, such as file paths, number of iterations, and logging verbosity.
global_params:
  # Path to your dataset
  input_csv_path: "data/my_dataset.csv"
  # Number of grid search iterations to run
  n_iter: 20
  # List of models to include in the base learner pool
  model_list: ["logisticRegression", "randomForest", "XGBoost"]
  # Verbosity level for console output
  verbose: 2
  # Number of parallel jobs for grid search
  grid_n_jobs: 8
  # The root directory for saving project outputs
  base_project_dir: "HFE_GA_experiments/"
  # Use a smaller, faster grid for testing and debugging
  testing: False
  # Number of rows to sample from the dataset for quick tests (0 = use all)
  test_sample_n: 0
2. ga_params
These control the core genetic algorithm process.
ga_params:
  nb_params: [8, 16]  # Num base learners per ensemble
  pop_params: [50]    # Population size
  g_params: [100]     # Num generations
3. grid_params
This defines the hyperparameter search space for each grid search iteration. You can override entire lists or specific values.
grid_params:
  weighted: ["unweighted"]  # Only use unweighted for a faster run
  resample: ["undersample", None]
  corr: [0.95]
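To get a feel for how these lists determine the size of the search space, the sketch below expands them into individual settings. It assumes each combination of listed values is one candidate setting (a plain Cartesian product); whether the package enumerates, shuffles, or samples combinations this way is not stated here, and `expand_grid` is a hypothetical helper.

```python
from itertools import product


def expand_grid(grid_params):
    """Return every combination of the listed values as a dict.

    Hypothetical helper assuming a plain Cartesian product of the lists.
    """
    keys = list(grid_params)
    return [
        dict(zip(keys, values))
        for values in product(*grid_params.values())
    ]


grid_params = {
    "weighted": ["unweighted"],
    "resample": ["undersample", None],
    "corr": [0.95],
}

combinations = expand_grid(grid_params)
print(len(combinations))  # 1 * 2 * 1 = 2
for combo in combinations:
    print(combo)
```

Under this assumption, trimming a list to a single value (as done for `weighted` above) divides the number of candidate settings by the number of values removed.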
Programmatic Configuration (In Scripts/Notebooks)
For quick tests or dynamic settings, you can override any parameter at runtime by passing it as a keyword argument to global_parameters. These arguments will take precedence over both the config.yml file and the hardcoded defaults.
from tqdm import tqdm
from ml_grid.util.global_params import global_parameters
from ml_grid.util.grid_param_space_ga import Grid
from ml_grid.pipeline import data, main_ga

# This will load from config.yml first, then apply the overrides below
global_params = global_parameters(
    config_path='config.yml',
    input_csv_path="data/another_dataset.csv",  # Override path from config
    n_iter=5,  # Override n_iter for a quick run
    verbose=3,  # Override verbosity
)

# The main loop is then executed as shown in the Quickstart section
grid = Grid(
    global_params=global_params,
    config_path='config.yml',
)

for i in tqdm(range(global_params.n_iter)):
    local_param_dict = next(grid.settings_list_iterator)
    ml_grid_object = data.pipe(
        global_params=global_params,
        file_name=global_params.input_csv_path,
        local_param_dict=local_param_dict,
        base_project_dir=global_params.base_project_dir,
        param_space_index=i,
    )
    main_ga.run(
        ml_grid_object,
        local_param_dict=local_param_dict,
        global_params=global_params,
    ).execute()
This level of configuration gives you full control over the scope and depth of your hyperparameter search.