# Configuration Guide

This guide explains how to customize your experiments. The project uses a layered configuration system, which gives you flexibility in how you define settings. The order of precedence is:

1. Runtime arguments (highest precedence): Parameters passed directly when initializing `global_parameters` in a script.
2. `config.yml` file: A central YAML file in your project root for most customizations.
3. Hardcoded defaults (lowest precedence): The default values set within the package source code.
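
The precedence order above can be pictured as a chain of dictionary merges, where each later layer overwrites keys from the earlier one. The sketch below is illustrative only: `DEFAULTS`, `resolve_settings`, and the sample values are hypothetical stand-ins, not the package's actual internals.

```python
# Illustrative sketch of layered precedence; not the package's real code.
# DEFAULTS and resolve_settings are hypothetical names.
DEFAULTS = {"n_iter": 10, "verbose": 1, "testing": False}


def resolve_settings(file_settings, runtime_overrides):
    """Merge the three layers; later layers win over earlier ones."""
    settings = dict(DEFAULTS)           # lowest precedence
    settings.update(file_settings)      # values read from config.yml
    settings.update(runtime_overrides)  # highest precedence
    return settings


from_file = {"n_iter": 20, "verbose": 2}  # pretend these came from config.yml
resolved = resolve_settings(from_file, {"n_iter": 5})
print(resolved)  # {'n_iter': 5, 'verbose': 2, 'testing': False}
```

Here `n_iter` comes from the runtime argument, `verbose` from the file, and `testing` falls through to the default because neither higher layer sets it.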

---

## The config.yml File

This is the recommended method for most configuration. It is safe from being overwritten by package updates and keeps all your settings in one place.

1. Create the file: Copy `config.yml.example` from the repository root to a new file named `config.yml`.
2. Edit: Uncomment and change the parameters you wish to modify. Any parameter you don't specify will use its default value.
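
If you prefer to script the first step, a minimal sketch follows. The helper `create_config` is hypothetical and not part of the package; it simply copies the example file and refuses to overwrite an existing `config.yml`. The demo runs in a temporary directory with a made-up example file so it does not touch a real project.

```python
import shutil
import tempfile
from pathlib import Path


def create_config(project_root):
    """Copy config.yml.example to config.yml unless config.yml already exists.

    Returns (path_to_config, created_flag). Hypothetical helper, not package API.
    """
    root = Path(project_root)
    target = root / "config.yml"
    if target.exists():
        return target, False  # never clobber existing settings
    shutil.copyfile(root / "config.yml.example", target)
    return target, True


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.yml.example").write_text("# global_params:\n#   n_iter: 20\n")
    _, first_call_created = create_config(tmp)
    _, second_call_created = create_config(tmp)

print(first_call_created, second_call_created)  # True False
```

In a real project you would call `create_config(".")` once from the repository root and then edit the resulting file by hand.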

The config.yml is split into three main sections:

### 1. global_params (in config.yml)

These settings control the overall behavior of the experiment, such as file paths, number of iterations, and logging verbosity.

```yaml
global_params:
  # Path to your dataset
  input_csv_path: "data/my_dataset.csv"
  # Number of grid search iterations to run
  n_iter: 20
  # List of models to include in the base learner pool
  model_list: ["logisticRegression", "randomForest", "XGBoost"]
  # Verbosity level for console output
  verbose: 2
  # Number of parallel jobs for grid search
  grid_n_jobs: 8
  # The root directory for saving project outputs
  base_project_dir: "HFE_GA_experiments/"
  # Use a smaller, faster grid for testing and debugging
  testing: False
  # Number of rows to sample from the dataset for quick tests (0 = use all)
  test_sample_n: 0
```

### 2. ga_params

These control the core genetic algorithm process.

```yaml
ga_params:
  nb_params: [8, 16]  # Num base learners per ensemble
  pop_params: [50]    # Population size
  g_params: [100]     # Num generations
```

### 3. grid_params

This defines the hyperparameter search space for each grid search iteration. You can override entire lists or specific values.

```yaml
grid_params:
  weighted: ["unweighted"]  # Only use unweighted for a faster run
  resample: ["undersample", None]
  corr: [0.95]
```
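
To get a feel for how much an override narrows the search, the sketch below treats each list as one axis and combines the axes as a Cartesian product. That enumeration strategy is an assumption made for illustration, not something this guide confirms about the package, and `default_grid` with its values is hypothetical.

```python
from itertools import product

# Hypothetical defaults; the real package defaults may differ.
default_grid = {
    "weighted": ["weighted", "unweighted"],
    "resample": ["undersample", "oversample", None],
    "corr": [0.9, 0.95, 0.99],
}

# Overrides mirroring the grid_params example: each key replaces a whole list.
overrides = {
    "weighted": ["unweighted"],
    "resample": ["undersample", None],
    "corr": [0.95],
}


def expand(grid):
    """Return every combination of values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]


merged = {**default_grid, **overrides}
print(len(expand(default_grid)))  # 2 * 3 * 3 = 18
print(len(expand(merged)))        # 1 * 2 * 1 = 2
```

Under this assumption, shortening any single list shrinks the space multiplicatively, which is why trimming `weighted` and `corr` to one value each is an effective way to speed up a run.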

---

## Programmatic Configuration (In Scripts/Notebooks)

For quick tests or dynamic settings, you can override any parameter at runtime by passing it as a keyword argument to global_parameters. These arguments will take precedence over both the config.yml file and the hardcoded defaults.

```python
from tqdm import tqdm

from ml_grid.util.global_params import global_parameters
from ml_grid.util.grid_param_space_ga import Grid
from ml_grid.pipeline import data, main_ga

# This will load from config.yml first, then apply the overrides below
global_params = global_parameters(
    config_path="config.yml",
    input_csv_path="data/another_dataset.csv",  # Override path from config
    n_iter=5,  # Override n_iter for a quick run
    verbose=3,  # Override verbosity
)

# The main loop is then executed as shown in the Quickstart section
grid = Grid(
    global_params=global_params,
    config_path="config.yml",
)

for i in tqdm(range(global_params.n_iter)):
    local_param_dict = next(grid.settings_list_iterator)
    ml_grid_object = data.pipe(
        global_params=global_params,
        file_name=global_params.input_csv_path,
        local_param_dict=local_param_dict,
        base_project_dir=global_params.base_project_dir,
        param_space_index=i,
    )
    main_ga.run(
        ml_grid_object,
        local_param_dict=local_param_dict,
        global_params=global_params,
    ).execute()
```

Together, the config file and runtime overrides let you control both the scope of the hyperparameter search (which models and settings are included) and its depth (how many iterations are run).