ml_grid.util.global_params

Global parameters for the ml_grid project.

This module defines a singleton class GlobalParameters to hold configuration settings that are accessible throughout the application. It also includes a custom scoring function for ROC AUC that handles cases with a single class.

Attributes

global_parameters

Classes

GlobalParameters

Initializes the GlobalParameters instance.

Functions

custom_roc_auc_score(→ float)

Calculates ROC AUC score, handling cases with only one class in y_true.

Module Contents

ml_grid.util.global_params.custom_roc_auc_score(y_true: numpy.ndarray, y_pred: numpy.ndarray) → float

Calculates ROC AUC score, handling cases with only one class in y_true.

If y_true contains fewer than two unique classes, ROC AUC is undefined. In such cases, this function returns np.nan.

Parameters:
  • y_true (np.ndarray) – True binary labels.

  • y_pred (np.ndarray) – Target scores.

Returns:

The ROC AUC score, or np.nan if the score is undefined.

Return type:

float
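The single-class guard described above can be sketched as follows. This is a minimal illustration, not the ml_grid implementation: the name roc_auc_or_nan is hypothetical, math.nan stands in for np.nan, and the AUC itself is computed with a plain pairwise comparison (the Mann-Whitney formulation) in place of scikit-learn's roc_auc_score so that the example has no third-party dependencies.

```python
import math
from typing import Sequence


def roc_auc_or_nan(y_true: Sequence[int], y_pred: Sequence[float]) -> float:
    """Hypothetical stand-in: ROC AUC, or NaN when y_true has fewer than two classes."""
    classes = set(y_true)
    if len(classes) < 2:
        # ROC AUC is undefined without both a positive and a negative class.
        return math.nan

    # Assumption: the largest label is treated as the positive class.
    positive = max(classes)
    pos = [score for label, score in zip(y_true, y_pred) if label == positive]
    neg = [score for label, score in zip(y_true, y_pred) if label != positive]

    # AUC is the probability that a positive outranks a negative; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


mixed = roc_auc_or_nan([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
single = roc_auc_or_nan([1, 1, 1], [0.2, 0.6, 0.9])
```

Returning NaN instead of raising lets a cross-validation run continue when a fold happens to contain only one class.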

class ml_grid.util.global_params.GlobalParameters(debug_level: int = 0, knn_n_jobs: int = -1)

Initializes the GlobalParameters instance.

The constructor sets the default values for all global parameters. Because the class is a singleton, the _initialized flag prevents these values from being reset when the constructor is called again.

Parameters:
  • debug_level (int, optional) – The initial debug level. Defaults to 0.

  • knn_n_jobs (int, optional) – The number of jobs for KNN. Defaults to -1.
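A common way to combine a singleton with an _initialized guard is sketched below. The class name SingletonConfig and the use of __new__ are assumptions made for illustration; the actual GlobalParameters class may be implemented differently.

```python
class SingletonConfig:
    """Hypothetical illustration of a singleton with an _initialized guard."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the shared instance only once; later calls reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, debug_level: int = 0, knn_n_jobs: int = -1):
        # __init__ runs on every call, so skip it once defaults are set.
        if getattr(self, "_initialized", False):
            return
        self.debug_level = debug_level
        self.knn_n_jobs = knn_n_jobs
        self._initialized = True


first = SingletonConfig(debug_level=2)
second = SingletonConfig(debug_level=5)  # arguments ignored: already initialized
```

In this sketch both names refer to the same object, and the values set by the first call survive the second call.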

debug_level: int

The verbosity level for debugging. Not widely used. Defaults to 0.

knn_n_jobs: int

The number of parallel jobs to run for KNN algorithms. -1 means using all available processors. Defaults to -1.

verbose: int

Controls the verbosity of output during the pipeline run. Higher values produce more detailed logs. Defaults to 0.

rename_cols: bool

If True, renames DataFrame columns to remove special characters (such as [, ], and <) that can cause issues with some models, for example XGBoost. Defaults to True.

error_raise: bool

If True, the pipeline will stop and raise an exception if an error occurs during model training or evaluation. If False, it will log the error and continue. Defaults to False.
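The raise-or-continue behaviour controlled by error_raise can be pictured with the sketch below. The helper run_step and the logger name are hypothetical and only illustrate the pattern; they are not part of ml_grid.

```python
import logging

logger = logging.getLogger("ml_grid_sketch")  # hypothetical logger name


def run_step(step, error_raise: bool = False):
    """Run a callable; re-raise on failure or log and continue, per error_raise."""
    try:
        return step()
    except Exception:
        if error_raise:
            # Stop the pipeline by propagating the original exception.
            raise
        # Otherwise record the traceback and carry on with a placeholder result.
        logger.exception("Step failed; continuing")
        return None


result = run_step(lambda: 1 / 0, error_raise=False)  # logged, returns None
```

Setting error_raise to True is typically more useful while debugging, since the first failure surfaces immediately instead of being buried in the log.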

If True and bayessearch is False, uses RandomizedSearchCV instead of GridSearchCV. Defaults to False.

bayessearch: bool

If True, uses BayesSearchCV from scikit-optimize for hyperparameter tuning, which can be more efficient than grid or random search. Defaults to True.

sub_sample_param_space_pct: float

The fraction of the total parameter space to sample when using RandomizedSearchCV, expressed as a proportion rather than a percentage. For example, 0.1 means 10% of the combinations will be tried. Defaults to 0.0005 (0.05%).

grid_n_jobs: int

The number of jobs to run in parallel for hyperparameter search (GridSearchCV, RandomizedSearchCV, BayesSearchCV). -1 means using all available processors. Defaults to -1.

time_limit_param: List[int]

A parameter for future use, intended to set time limits on model fitting. Currently not implemented. Defaults to [3].

random_state_val: int

A seed value for random number generation to ensure reproducibility across runs. Defaults to 1234.

n_jobs_model_val: int

The number of parallel jobs for models that support it (e.g., RandomForest). -1 means using all available processors. Defaults to -1.

max_param_space_iter_value: int

A hard limit on the number of parameter combinations to evaluate in RandomizedSearchCV or BayesSearchCV. Prevents excessively long run times. Defaults to 10.
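One plausible way that sub_sample_param_space_pct and max_param_space_iter_value could combine into an iteration count is sketched below. The exact formula ml_grid uses is not stated here, so the function n_search_iterations, its rounding, and its floor of one iteration are all assumptions.

```python
import math
from typing import Dict, Sequence


def n_search_iterations(
    param_grid: Dict[str, Sequence],
    fraction: float = 0.0005,
    hard_cap: int = 10,
) -> int:
    """Hypothetical: sample a fraction of the grid, bounded to [1, hard_cap]."""
    # Total number of combinations is the product of the option counts.
    total = math.prod(len(values) for values in param_grid.values())
    sampled = round(total * fraction)
    return max(1, min(sampled, hard_cap))


large_grid = {"a": range(100), "b": range(50), "c": range(20)}  # 100,000 combinations
small_grid = {"a": [1, 2, 3], "b": [True, False]}               # 6 combinations

capped = n_search_iterations(large_grid)   # 50 sampled, limited by the cap
floored = n_search_iterations(small_grid)  # rounds to 0, raised to 1
```

Under these assumptions the cap dominates for large grids and the floor keeps very small grids from producing zero iterations.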

store_models: bool

Whether to save trained models to disk. Defaults to True.

metric_list: Dict[str, str | Callable]

A dictionary of scoring metrics to evaluate models during cross-validation. Keys are metric names and values are scikit-learn scorer strings or callable objects.
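A dictionary of this shape might look like the sketch below. The entries are illustrative and are not the defaults shipped with ml_grid; accuracy_scorer is a hypothetical callable written in scikit-learn's scorer(estimator, X, y) form, and the string values are scikit-learn's predefined scorer names.

```python
from typing import Callable, Dict, Union


def accuracy_scorer(estimator, X, y) -> float:
    """Hypothetical callable scorer following scorer(estimator, X, y)."""
    predictions = estimator.predict(X)
    correct = sum(int(p == t) for p, t in zip(predictions, y))
    return correct / len(y)


# Keys are free-form metric names; values are scorer strings or callables.
example_metric_list: Dict[str, Union[str, Callable]] = {
    "f1": "f1",
    "recall": "recall",
    "accuracy_custom": accuracy_scorer,
}
```

A dictionary like this can be passed as the scoring argument of scikit-learn's cross-validation utilities, which report one result per key.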

update_parameters(**kwargs: Any) → None

Updates global parameters at runtime.

Parameters:

**kwargs (Any) – Key-value pairs of parameters to update.

Raises:

AttributeError – If a key in kwargs is not a valid parameter.
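The documented contract (update known parameters, raise AttributeError for unknown keys) can be sketched as below. ParamsSketch is a hypothetical stand-in with only two of the attributes listed above; the real method may validate keys differently.

```python
from typing import Any


class ParamsSketch:
    """Hypothetical stand-in for a parameter container."""

    def __init__(self) -> None:
        self.verbose = 0
        self.bayessearch = True

    def update_parameters(self, **kwargs: Any) -> None:
        for key, value in kwargs.items():
            # Reject names that are not existing parameters, so typos fail loudly.
            if not hasattr(self, key):
                raise AttributeError(f"Invalid parameter: {key!r}")
            setattr(self, key, value)


params = ParamsSketch()
params.update_parameters(verbose=2, bayessearch=False)

try:
    params.update_parameters(verbsoe=3)  # misspelled on purpose
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)
```

Rejecting unknown keys means a misspelled parameter name cannot silently create a new attribute that nothing reads.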

ml_grid.util.global_params.global_parameters