ml_grid.util.global_params

Global parameters for the ml_grid project.

This module defines a singleton class GlobalParameters to hold configuration settings that are accessible throughout the application. It also includes a custom scoring function for ROC AUC that handles cases with a single class.

Attributes

global_parameters

Classes

GlobalParameters

Initializes the GlobalParameters instance.

Functions

custom_roc_auc_score(y_true, y_pred) → float

Calculates ROC AUC score, handling cases with only one class in y_true.

Module Contents

ml_grid.util.global_params.custom_roc_auc_score(y_true: numpy.ndarray, y_pred: numpy.ndarray) → float

Calculates ROC AUC score, handling cases with only one class in y_true.

If y_true contains fewer than two unique classes, ROC AUC is undefined. In such cases, this function returns np.nan.

Parameters:

y_true (np.ndarray) – True binary labels.
y_pred (np.ndarray) – Target scores.

Returns:

The ROC AUC score, or np.nan if the score is undefined.

Return type:

float
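The single-class guard described above can be sketched without third-party dependencies. The function `roc_auc_or_nan` below is a hypothetical stand-in, not the project's implementation: it assumes binary labels with the larger label treated as the positive class, and it computes AUC by pairwise comparison. The real function presumably delegates to scikit-learn's `roc_auc_score` and returns `np.nan`.

```python
import math
from typing import Sequence


def roc_auc_or_nan(y_true: Sequence[int], y_pred: Sequence[float]) -> float:
    """Return ROC AUC, or NaN when y_true holds fewer than two classes.

    Hypothetical helper for illustration only; binary labels are assumed.
    """
    classes = set(y_true)
    if len(classes) < 2:
        # ROC AUC is undefined with a single class, so signal that with NaN.
        return math.nan

    positive = max(classes)  # assumption: the larger label is the positive class
    pos_scores = [s for t, s in zip(y_true, y_pred) if t == positive]
    neg_scores = [s for t, s in zip(y_true, y_pred) if t != positive]

    # AUC equals the probability that a random positive outranks a random
    # negative, with ties counted as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


single_class = roc_auc_or_nan([1, 1, 1], [0.2, 0.5, 0.9])
two_classes = roc_auc_or_nan([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Returning NaN instead of raising lets a cross-validation loop continue when a fold happens to contain only one class.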

class ml_grid.util.global_params.GlobalParameters(debug_level: int = 0, knn_n_jobs: int = -1)

Initializes the GlobalParameters instance.

This method sets the default values for all global parameters. The _initialized flag prevents re-initialization on subsequent calls.

Parameters:

debug_level (int, optional) – The initial debug level. Defaults to 0.
knn_n_jobs (int, optional) – The number of jobs for KNN. Defaults to -1.
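One common way to combine a singleton with an `_initialized` guard is sketched below. The class name `SingletonParams` and the use of `__new__` are assumptions for illustration; the actual GlobalParameters class may achieve the same effect differently.

```python
class SingletonParams:
    """Illustrative singleton; not the actual GlobalParameters class."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the single shared instance on first use, then reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, debug_level: int = 0, knn_n_jobs: int = -1) -> None:
        # __init__ runs on every call to the class, so skip it after the
        # first time to avoid resetting values that were already set.
        if getattr(self, "_initialized", False):
            return
        self.debug_level = debug_level
        self.knn_n_jobs = knn_n_jobs
        self._initialized = True


first = SingletonParams(debug_level=2)
second = SingletonParams(debug_level=5)  # arguments ignored: already initialized
print(first is second)       # True
print(second.debug_level)    # 2
```

With this pattern, constructor arguments only take effect on the first instantiation; later changes would go through an explicit update method.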

debug_level: int

The verbosity level for debugging. Not widely used. Defaults to 0.

knn_n_jobs: int

The number of parallel jobs to run for KNN algorithms. -1 means using all available processors. Defaults to -1.

verbose: int

Controls the verbosity of output during the pipeline run. Higher values produce more detailed logs. Defaults to 0.

rename_cols: bool

If True, renames DataFrame columns to remove special characters (e.g., "[", "]", "<") that can cause issues with some models such as XGBoost. Defaults to True.
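A minimal sketch of this kind of column cleaning follows. The helper name `clean_column_name`, the exact character set, and the underscore replacement are all assumptions; the renaming that ml_grid actually performs may differ.

```python
import re

# Assumption: only the three characters named in the description are targeted.
_SPECIAL_CHARS = re.compile(r"[\[\]<]")


def clean_column_name(name: str, replacement: str = "_") -> str:
    """Replace characters that some models reject in feature names."""
    return _SPECIAL_CHARS.sub(replacement, name)


columns = ["age[0]<5", "bmi", "dose[mg]"]
cleaned = [clean_column_name(c) for c in columns]
print(cleaned)  # ['age_0__5', 'bmi', 'dose_mg_']
```

With pandas, the same function could be passed to `DataFrame.rename(columns=clean_column_name)`.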

error_raise: bool

If True, the pipeline will stop and raise an exception if an error occurs during model training or evaluation. If False, it will log the error and continue. Defaults to False.
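The raise-or-continue behaviour can be illustrated with a small wrapper. The function `run_step` is hypothetical and only shows the general pattern; it is not taken from the ml_grid code base.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def run_step(step: Callable[[], Any], error_raise: bool = False) -> Optional[Any]:
    """Run one pipeline step, either propagating or logging any failure."""
    try:
        return step()
    except Exception:
        if error_raise:
            # Stop the pipeline by re-raising the original exception.
            raise
        # Otherwise record the traceback and let the caller carry on.
        logger.exception("Step failed; continuing")
        return None


result = run_step(lambda: 1 / 0)  # logs the ZeroDivisionError, returns None
```

Leaving `error_raise` as False suits long unattended searches, while setting it to True makes failures easier to debug.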

random_grid_search: bool

If True and bayessearch is False, uses RandomizedSearchCV instead of GridSearchCV. Defaults to False.

bayessearch: bool

If True, uses BayesSearchCV from scikit-optimize for hyperparameter tuning, which can be more efficient than grid or random search. Defaults to True.
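Read together, the two flags imply a precedence order: bayessearch wins, then random_grid_search, and GridSearchCV is the fallback. The sketch below encodes that reading; `choose_search_strategy` is a hypothetical helper that returns the class name as a string and does not import scikit-learn or scikit-optimize.

```python
def choose_search_strategy(bayessearch: bool, random_grid_search: bool) -> str:
    """Return the name of the search class implied by the two flags."""
    if bayessearch:
        # bayessearch takes precedence regardless of random_grid_search.
        return "BayesSearchCV"
    if random_grid_search:
        return "RandomizedSearchCV"
    return "GridSearchCV"


# With the documented defaults (bayessearch=True, random_grid_search=False):
default_strategy = choose_search_strategy(bayessearch=True, random_grid_search=False)
print(default_strategy)  # BayesSearchCV
```

To get a randomized search, both flags would need to be changed: bayessearch set to False and random_grid_search set to True.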

sub_sample_param_space_pct: float

The fraction of the total parameter space to sample when using RandomizedSearchCV. For example, 0.1 means 10% of the combinations will be tried. Defaults to 0.0005 (0.05%).

grid_n_jobs: int

The number of jobs to run in parallel for hyperparameter search (GridSearchCV, RandomizedSearchCV, BayesSearchCV). -1 means using all available processors. Defaults to -1.

time_limit_param: List[int]

A parameter for future use, intended to set time limits on model fitting. Currently not implemented. Defaults to [3].

random_state_val: int

A seed value for random number generation to ensure reproducibility across runs. Defaults to 1234.

n_jobs_model_val: int

The number of parallel jobs for models that support it (e.g., RandomForest). -1 means using all available processors. Defaults to -1.

max_param_space_iter_value: int

A hard limit on the number of parameter combinations to evaluate in RandomizedSearchCV or BayesSearchCV. Prevents excessively long run times. Defaults to 10.
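One plausible way for sub_sample_param_space_pct and max_param_space_iter_value to interact is sketched below: the fraction sets a target number of iterations, and the hard limit caps it. The helper `search_iterations`, the rounding rule, and the floor of one iteration are assumptions; the exact computation in ml_grid is not documented here.

```python
def search_iterations(
    total_combinations: int,
    sub_sample_param_space_pct: float = 0.0005,
    max_param_space_iter_value: int = 10,
) -> int:
    """Estimate the iteration count for a sampled hyperparameter search."""
    # Target a fraction of the full parameter space ...
    target = round(total_combinations * sub_sample_param_space_pct)
    # ... but always try at least one combination (assumed floor) ...
    target = max(1, target)
    # ... and never exceed the hard limit.
    return min(target, max_param_space_iter_value)


small_space = search_iterations(40, sub_sample_param_space_pct=0.1)    # 4
large_space = search_iterations(1000, sub_sample_param_space_pct=0.1)  # capped at 10
```

Under this reading, the default fraction of 0.0005 only exceeds a single iteration once the parameter space holds a few thousand combinations.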

store_models: bool

Whether to save trained models to disk. Defaults to True.

metric_list: Dict[str, str | Callable]

A dictionary of scoring metrics to evaluate models during cross-validation. Keys are metric names and values are scikit-learn scorer strings or callable objects.

use_embedding: bool

Whether to use embedding for feature transformation. Defaults to False.

embedding_method: str

The embedding method to use (e.g., 'pca', 'svd'). Defaults to None.

embedding_dim: int

The dimensionality of the embedding space. Defaults to None.

scale_features_before_embedding: bool

Whether to scale features before applying embedding. Defaults to False.

cache_embeddings: bool

Whether to cache computed embeddings for reuse. Defaults to False.

update_parameters(**kwargs: Any) → None

Updates global parameters at runtime.

Parameters:

**kwargs (Any) – Key-value pairs of parameters to update.

Raises:

AttributeError – If a key in kwargs is not a valid parameter.
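The documented contract (set known attributes, reject unknown ones with AttributeError) can be sketched as follows. The class `ParamsSketch` and its error message are illustrative assumptions, not the project's code.

```python
from typing import Any


class ParamsSketch:
    """Tiny stand-in holding two of the documented parameters."""

    def __init__(self) -> None:
        self.verbose = 0
        self.error_raise = False

    def update_parameters(self, **kwargs: Any) -> None:
        for key, value in kwargs.items():
            # Reject unknown names instead of silently creating attributes,
            # so that typos in parameter names surface immediately.
            if not hasattr(self, key):
                raise AttributeError(f"Invalid parameter: {key!r}")
            setattr(self, key, value)


params = ParamsSketch()
params.update_parameters(verbose=2, error_raise=True)

try:
    params.update_parameters(verbos=3)  # misspelled on purpose
except AttributeError as exc:
    message = str(exc)
```

Note that in this sketch, valid keys that appear before an invalid one in the same call are still applied before the exception is raised.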

ml_grid.util.global_params.global_parameters