# Configuration Guide

This guide explains how to customize your experiments. The project uses a layered configuration system, which gives you flexibility in how you define settings. The order of precedence is:

1. **Runtime Arguments** (highest precedence): Parameters passed directly when initializing `global_parameters` in a script.
2. **`config.yml` File**: A central YAML file in your project root for most customizations.
3. **Hardcoded Defaults** (lowest precedence): The default values set within the package source code.

---

## The `config.yml` File

This is the **recommended method** for most configuration. It is safe from being overwritten by package updates and keeps all your settings in one place.

1. **Create the file**: Copy `config.yml.example` from the repository root to a new file named `config.yml`.
2. **Edit**: Uncomment and change the parameters you wish to modify. Any parameter you don't specify will use its default value.

The `config.yml` file is split into three main sections:

### 1. `global_params`

These settings control the overall behavior of the experiment, such as file paths, the number of iterations, and logging verbosity.

```yaml
global_params:
  # Path to your dataset
  input_csv_path: "data/my_dataset.csv"

  # Number of grid search iterations to run
  n_iter: 20

  # List of models to include in the base learner pool
  model_list: ["logisticRegression", "randomForest", "XGBoost"]

  # Verbosity level for console output
  verbose: 2

  # Number of parallel jobs for grid search
  grid_n_jobs: 8

  # The root directory for saving project outputs
  base_project_dir: "HFE_GA_experiments/"

  # Use a smaller, faster grid for testing and debugging
  testing: False

  # Number of rows to sample from the dataset for quick tests (0 = use all)
  test_sample_n: 0
```

### 2. `ga_params`

These settings control the core genetic algorithm process.

```yaml
ga_params:
  nb_params: [8, 16]  # Number of base learners per ensemble
  pop_params: [50]    # Population size
  g_params: [100]     # Number of generations
```

### 3. `grid_params`

This section defines the hyperparameter search space for each grid search iteration. You can override entire lists or specific values.

```yaml
grid_params:
  weighted: ["unweighted"]  # Only use unweighted for a faster run
  resample: ["undersample", None]
  corr: [0.95]
```
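
Once your `config.yml` is in place, you can sanity-check which values are in effect by loading the configuration on its own and inspecting the returned object. The snippet below is a minimal sketch: it assumes the override keyword arguments shown in the next section are optional, and that each `global_params` key is exposed as an attribute of the returned object, as `n_iter` and `input_csv_path` are in the example that follows.

```python
from ml_grid.util.global_params import global_parameters

# Load settings from config.yml only; any key not set there falls back
# to the hardcoded package defaults (lowest precedence).
global_params = global_parameters(config_path="config.yml")

print(global_params.n_iter)          # e.g. 20, as set in config.yml above
print(global_params.input_csv_path)  # e.g. "data/my_dataset.csv"
```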
---

## Programmatic Configuration (In Scripts/Notebooks)

For quick tests or dynamic settings, you can override any parameter at runtime by passing it as a keyword argument to `global_parameters`. These arguments will take precedence over both the `config.yml` file and the hardcoded defaults.

```python
from tqdm import tqdm

from ml_grid.util.global_params import global_parameters
from ml_grid.util.grid_param_space_ga import Grid
from ml_grid.pipeline import data, main_ga

# This will load from config.yml first, then apply the overrides below
global_params = global_parameters(
    config_path='config.yml',
    input_csv_path="data/another_dataset.csv",  # Override path from config
    n_iter=5,                                   # Override n_iter for a quick run
    verbose=3,                                  # Override verbosity
)

# The main loop is then executed as shown in the Quickstart section
grid = Grid(global_params=global_params, config_path='config.yml')

for i in tqdm(range(global_params.n_iter)):
    local_param_dict = next(grid.settings_list_iterator)

    ml_grid_object = data.pipe(
        global_params=global_params,
        file_name=global_params.input_csv_path,
        local_param_dict=local_param_dict,
        base_project_dir=global_params.base_project_dir,
        param_space_index=i,
    )

    main_ga.run(
        ml_grid_object,
        local_param_dict=local_param_dict,
        global_params=global_params,
    ).execute()
```

This level of configuration gives you full control over the scope and depth of your hyperparameter search.
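
For a fast end-to-end smoke test before launching a full experiment, the same override mechanism can be combined with the testing switches documented in `global_params` above. The sketch below assumes `testing` and `test_sample_n` are accepted as keyword overrides in the same way as `n_iter` and `verbose`; if your version only reads them from `config.yml`, set them there instead.

```python
from ml_grid.util.global_params import global_parameters

# Quick smoke-test configuration: one iteration on a small sample.
# Assumes `testing` and `test_sample_n` can be passed as runtime overrides,
# mirroring the keys documented in the `global_params` section above.
global_params = global_parameters(
    config_path="config.yml",
    n_iter=1,           # a single grid search iteration
    verbose=1,          # keep console output terse
    testing=True,       # use the smaller, faster grid
    test_sample_n=500,  # hypothetical sample size; 0 would use all rows
)
```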