ml_grid.pipeline.main

Classes

run

Initializes the run class.

Module Contents

class ml_grid.pipeline.main.run(local_param_dict: Dict[str, Any], **kwargs)

Initializes the run class.

This class uses the main data pipeline object and a dictionary of local parameters to prepare a series of hyperparameter searches across multiple machine learning models.

When used with hyperopt, the constructor can also accept keyword arguments and create the pipe object internally.

Parameters:

local_param_dict (Dict[str, Any]) – A dictionary of parameters for the current experimental run, such as param_space_size.
**kwargs – Keyword arguments to be passed to the pipe constructor. Expected keys include file_name, drop_term_list, model_class_dict, base_project_dir, experiment_dir, and outcome_var.

global_params: ml_grid.util.global_params.global_parameters – A reference to the global parameters singleton instance.

verbose: int – The verbosity level for logging, inherited from global parameters.

error_raise: bool – A flag to control error handling. If True, exceptions will be raised.

ml_grid_object: ml_grid.pipeline.data.pipe – The main data pipeline object, containing data and model configurations.

sub_sample_param_space_pct: float – The percentage of the parameter space to sample in a randomized search.

parameter_space_size: str – The size of the parameter space for base learners (e.g., ‘medium’, ‘xsmall’).

model_class_list: List[Any] – A list of instantiated model class objects to be evaluated in this run.

pg_list: List[int] – A list containing the calculated size of the parameter grid for each model.

mean_parameter_space_val: float – The mean size of the parameter spaces across all models in the run.

sub_sample_parameter_val: int – The calculated number of iterations for randomized search, based on sub_sample_param_space_pct.

arg_list: List[Tuple] – A list of argument tuples, one for each model, to be passed to the grid search function.

multiprocess: bool – A flag to enable or disable multiprocessing for running grid searches in parallel.

local_param_dict: Dict[str, Any] – A dictionary of parameters for the current experimental run.

model_error_list: List[List[Any]] – A list to store details of any errors encountered during model training.

highest_score: float – The highest score achieved across all successful model runs in the execute step.

logger

project_score_save_class_instance
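
The relationship between pg_list, mean_parameter_space_val, and sub_sample_parameter_val can be illustrated with a small sketch. It is not the library's implementation. It assumes that a grid's size is the product of the number of candidate values per hyperparameter, and that the iteration count is sub_sample_param_space_pct percent of the mean grid size, truncated and clamped to at least 1. The helper names grid_size and sub_sample_iterations and the example grids are hypothetical.

```python
import math
from statistics import mean
from typing import Any, Dict, List


def grid_size(param_grid: Dict[str, List[Any]]) -> int:
    """Number of combinations in a grid: product of per-parameter candidate counts."""
    return math.prod(len(values) for values in param_grid.values())


def sub_sample_iterations(pg_list: List[int], pct: float) -> int:
    """Assumed rule: pct percent of the mean grid size, truncated, at least 1."""
    mean_size = mean(pg_list)
    return max(1, int(mean_size * pct / 100))


# Illustrative grids only; they are not taken from ml_grid.
grids = [
    {"C": [0.1, 1, 10], "penalty": ["l1", "l2"]},         # 3 * 2 = 6 combinations
    {"n_estimators": [50, 100], "max_depth": [2, 4, 8]},  # 2 * 3 = 6 combinations
    {"alpha": [0.01, 0.1, 1.0, 10.0]},                    # 4 combinations
]

pg_list = [grid_size(g) for g in grids]
mean_parameter_space_val = mean(pg_list)
sub_sample_parameter_val = sub_sample_iterations(pg_list, pct=50.0)
```

Under these assumptions, a larger sub_sample_param_space_pct gives the randomized search proportionally more iterations, while the clamp keeps very small grids from yielding zero iterations.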

execute_single_model(args: Tuple) → float – Executes the grid search for a single model and returns its score. This method is designed to be called within a hyperopt objective function.
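
A minimal sketch of how a score-returning method such as execute_single_model might be wrapped for hyperopt. hyperopt minimizes a loss, so the sketch negates the score. The layout of the argument tuple is not documented here, so the one-element tuple below is an assumption, and fake_execute_single_model is a hypothetical stand-in rather than the real method.

```python
from typing import Any, Callable, Dict, Tuple


def make_objective(
    score_fn: Callable[[Tuple], float],
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Wrap a score-returning callable as a loss-returning objective."""

    def objective(space: Dict[str, Any]) -> Dict[str, Any]:
        # Assumed tuple layout; the real arguments are built by the run class.
        args = (space,)
        score = score_fn(args)
        # A higher score must map to a lower loss because hyperopt minimizes.
        # "ok" is the value hyperopt exposes as STATUS_OK.
        return {"loss": -score, "status": "ok"}

    return objective


def fake_execute_single_model(args: Tuple) -> float:
    """Hypothetical stand-in that returns a made-up score."""
    (space,) = args
    return 0.5 + 0.1 * space["depth"]


objective = make_objective(fake_execute_single_model)
result = objective({"depth": 3})
```

In a real search, an objective of this shape would be passed to hyperopt's fmin together with a search space.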

execute() → Tuple[List[List[Any]], float]

Executes the grid search for each model in the list.

This method iterates through the list of configured models and their parameter spaces, running a cross-validated grid search for each one. It captures any errors that occur during the process and returns a list of those errors along with the highest score achieved.

Returns:

A tuple containing:

A list of model errors, where each error is a list containing the algorithm instance, the exception, and the traceback.
The highest score achieved across all successful model runs.

Return type:

Tuple[List[List[Any]], float]
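
The error-collecting behaviour described above can be sketched as a plain loop. This is an illustration under stated assumptions, not the method's actual code: run_all and fake_search are hypothetical names, the first element of each argument tuple is assumed to identify the algorithm, and the starting value of 0.0 for the highest score is assumed.

```python
import traceback
from typing import Any, Callable, List, Tuple


def run_all(
    arg_list: List[Tuple],
    search_fn: Callable[[Tuple], float],
    error_raise: bool = False,
) -> Tuple[List[List[Any]], float]:
    """Run search_fn on each argument tuple, collecting errors and the best score."""
    model_error_list: List[List[Any]] = []
    highest_score = 0.0  # assumed starting value
    for args in arg_list:
        try:
            score = search_fn(args)
            highest_score = max(highest_score, score)
        except Exception as exc:
            if error_raise:
                raise
            # Documented error shape: [algorithm, exception, traceback].
            model_error_list.append([args[0], exc, traceback.format_exc()])
    return model_error_list, highest_score


def fake_search(args: Tuple) -> float:
    """Hypothetical search function: fails when no score is supplied."""
    name, score = args
    if score is None:
        raise ValueError(f"{name} failed to fit")
    return score


errors, best = run_all(
    [("logreg", 0.71), ("broken", None), ("forest", 0.83)],
    fake_search,
)
```

With error_raise set to True, the first failure would propagate instead of being recorded, which matches the description of the error_raise flag.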