ml_grid.util.global_params
Global parameters for the ml_grid project.
This module defines a singleton class GlobalParameters to hold configuration settings that are accessible throughout the application. It also includes a custom scoring function for ROC AUC that handles cases with a single class.
Attributes
Classes
GlobalParameters | Initializes the GlobalParameters instance.
Functions
custom_roc_auc_score(y_true, y_pred) | Calculates ROC AUC score, handling cases with only one class in y_true.
Module Contents
- ml_grid.util.global_params.custom_roc_auc_score(y_true: numpy.ndarray, y_pred: numpy.ndarray) → float
Calculates ROC AUC score, handling cases with only one class in y_true.
If y_true contains fewer than two unique classes, ROC AUC is undefined. In such cases, this function returns np.nan.
- Parameters:
y_true (np.ndarray) – True binary labels.
y_pred (np.ndarray) – Target scores.
- Returns:
The ROC AUC score, or np.nan if the score is undefined.
- Return type:
float
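The behaviour described above can be sketched as follows. This is a minimal illustration, not the ml_grid source: the name `roc_auc_or_nan` is hypothetical, and the use of scikit-learn's `roc_auc_score` for the defined case is an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def roc_auc_or_nan(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Hypothetical stand-in for custom_roc_auc_score."""
    # ROC AUC needs at least one positive and one negative label.
    if len(np.unique(y_true)) < 2:
        return float("nan")
    # Assumption: the defined case is delegated to scikit-learn.
    return float(roc_auc_score(y_true, y_pred))


# A fold containing only positives yields nan instead of raising.
single_class = roc_auc_or_nan(np.array([1, 1, 1]), np.array([0.2, 0.7, 0.9]))
both_classes = roc_auc_or_nan(np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.35, 0.8]))
```

Returning np.nan rather than raising lets a cross-validation run continue when a fold happens to contain a single class.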
- class ml_grid.util.global_params.GlobalParameters(debug_level: int = 0, knn_n_jobs: int = -1)
Initializes the GlobalParameters instance.
This method sets the default values for all global parameters. The _initialized flag prevents re-initialization on subsequent calls.
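- Parameters:
debug_level (int) – Defaults to 0.
knn_n_jobs (int) – The number of parallel jobs to run for KNN algorithms. Defaults to -1.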
- knn_n_jobs: int
The number of parallel jobs to run for KNN algorithms. -1 means using all available processors. Defaults to -1.
- verbose: int
Controls the verbosity of output during the pipeline run. Higher values produce more detailed logs. Defaults to 0.
- rename_cols: bool
If True, renames DataFrame columns to remove special characters (such as [, ], and <) that can cause issues with some models like XGBoost. Defaults to True.
- error_raise: bool
If True, the pipeline will stop and raise an exception if an error occurs during model training or evaluation. If False, it will log the error and continue. Defaults to False.
- random_grid_search: bool
If True and bayessearch is False, uses RandomizedSearchCV instead of GridSearchCV. Defaults to False.
- bayessearch: bool
If True, uses BayesSearchCV from scikit-optimize for hyperparameter tuning, which can be more efficient than grid or random search. Defaults to True.
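The two flags imply an order of precedence: bayessearch wins when set, random_grid_search applies only when bayessearch is False, and GridSearchCV is the fallback. A minimal sketch of that rule, using a hypothetical helper `choose_search_strategy` that is not part of ml_grid:

```python
def choose_search_strategy(bayessearch: bool, random_grid_search: bool) -> str:
    """Hypothetical helper: return the name of the search class the flags select."""
    if bayessearch:
        # bayessearch takes priority over random_grid_search.
        return "BayesSearchCV"
    if random_grid_search:
        return "RandomizedSearchCV"
    return "GridSearchCV"


# With the documented defaults (bayessearch=True, random_grid_search=False):
default_choice = choose_search_strategy(bayessearch=True, random_grid_search=False)
```

To get an exhaustive grid search, both flags therefore have to be set to False.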
- sub_sample_param_space_pct: float
The fraction of the total parameter space to sample when using RandomizedSearchCV. For example, 0.1 means 10% of the combinations will be tried. Defaults to 0.0005.
- grid_n_jobs: int
The number of jobs to run in parallel for hyperparameter search (GridSearchCV, RandomizedSearchCV, BayesSearchCV). -1 means using all available processors. Defaults to -1.
- time_limit_param: List[int]
A parameter for future use, intended to set time limits on model fitting. Currently not implemented. Defaults to [3].
- random_state_val: int
A seed value for random number generation to ensure reproducibility across runs. Defaults to 1234.
- n_jobs_model_val: int
The number of parallel jobs for models that support it (e.g., RandomForest). -1 means using all available processors. Defaults to -1.
- max_param_space_iter_value: int
A hard limit on the number of parameter combinations to evaluate in RandomizedSearchCV or BayesSearchCV. Prevents excessively long run times. Defaults to 10.
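One way sub_sample_param_space_pct and max_param_space_iter_value could combine into an iteration count is sketched below. The helper `capped_n_iter`, the floor of one iteration, and the exact rounding are assumptions for illustration; the ml_grid source may compute this differently.

```python
import math
from typing import Any, Dict, List


def capped_n_iter(
    param_space: Dict[str, List[Any]],
    sub_sample_pct: float = 0.0005,
    max_iter: int = 10,
) -> int:
    """Hypothetical helper: sample a fraction of the grid, capped at max_iter."""
    # Total number of combinations in a discrete grid.
    total = math.prod(len(values) for values in param_space.values())
    # Assumption: always try at least one combination.
    sampled = max(1, int(total * sub_sample_pct))
    # The hard limit applies last.
    return min(sampled, max_iter)


# Five parameters with ten values each give 100,000 combinations.
# 0.05% of that is about 50, which the cap then reduces to 10.
large_space = {f"param_{i}": list(range(10)) for i in range(5)}
n_iter_large = capped_n_iter(large_space)
```

With the documented defaults, the cap of 10 takes effect once the grid has more than about 20,000 combinations.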
- metric_list: Dict[str, str | Callable]
A dictionary of scoring metrics to evaluate models during cross-validation. Keys are metric names and values are scikit-learn scorer strings or callable objects.
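A dictionary of this shape can be passed directly as the scoring argument of scikit-learn's cross_validate. The sketch below is illustrative only: the metric names, the `auc_from_proba` scorer, and the synthetic data are assumptions, and the actual default contents of metric_list are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_validate


def auc_from_proba(estimator, X, y) -> float:
    """Hypothetical callable scorer using the (estimator, X, y) signature."""
    return float(roc_auc_score(y, estimator.predict_proba(X)[:, 1]))


# Values are either scikit-learn scorer strings or callables.
metric_list = {"auc": auc_from_proba, "f1": "f1", "accuracy": "accuracy"}

X, y = make_classification(n_samples=200, n_features=8, random_state=1234)
results = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=3, scoring=metric_list
)
# Each key in metric_list appears in the results as "test_<key>".
test_keys = sorted(k for k in results if k.startswith("test_"))
```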
- update_parameters(**kwargs: Any) → None
Updates global parameters at runtime.
- Parameters:
**kwargs (Any) – Key-value pairs of parameters to update.
- Raises:
AttributeError – If a key in kwargs is not a valid parameter.
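The singleton behaviour, the _initialized guard, and update_parameters can be illustrated with a reduced sketch. `GlobalParametersSketch` is a hypothetical class written for this example; it keeps only a few of the documented attributes and is not the ml_grid implementation.

```python
from typing import Any


class GlobalParametersSketch:
    """Illustrative singleton with a guarded __init__."""

    _instance = None

    def __new__(cls, *args: Any, **kwargs: Any) -> "GlobalParametersSketch":
        # Every construction returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self, debug_level: int = 0, knn_n_jobs: int = -1) -> None:
        # Skip re-initialization so later constructions keep current values.
        if self._initialized:
            return
        self.debug_level = debug_level
        self.knn_n_jobs = knn_n_jobs
        self.verbose = 0
        self.bayessearch = True
        self._initialized = True

    def update_parameters(self, **kwargs: Any) -> None:
        for key, value in kwargs.items():
            # Reject names that are not existing parameters.
            if not hasattr(self, key):
                raise AttributeError(f"Invalid parameter: {key}")
            setattr(self, key, value)


first = GlobalParametersSketch()
second = GlobalParametersSketch(debug_level=5)  # arguments ignored: already initialized
first.update_parameters(verbose=2, bayessearch=False)
```

Because both names refer to one object, an update made through `first` is visible through `second`. In this sketch, constructor arguments passed after the first construction have no effect, so runtime changes go through update_parameters.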