ml_grid.pipeline.data_constant_columns

Functions

remove_constant_columns(...) → List[str]

Identifies columns in a DataFrame where all values are the same.

remove_constant_columns_with_debug(...)

Removes constant columns from training and testing datasets.

Module Contents

ml_grid.pipeline.data_constant_columns.remove_constant_columns(X: pandas.DataFrame, drop_list: List[str] | None = None, verbose: int = 1) → List[str]

Identifies columns in a DataFrame where all values are the same and adds them to the list of columns to drop.

Parameters:
  • X (pd.DataFrame) – DataFrame to check for constant columns.

  • drop_list (Optional[List[str]], optional) – A list of columns already marked for dropping. Defaults to None.

  • verbose (int, optional) – Controls the verbosity of logging. Defaults to 1.

Returns:

Updated list of columns to drop, including constant columns.

Return type:

List[str]

Raises:

AssertionError – If X is None.
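The behaviour described above can be illustrated with a minimal sketch. It is not the ml_grid implementation: the helper name find_constant_columns is hypothetical, and treating a column as constant when it has at most one distinct value (counting NaN as a value) is an assumption made for illustration.

```python
from typing import List, Optional

import pandas as pd


def find_constant_columns(
    X: pd.DataFrame, drop_list: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical helper: return drop_list extended with constant columns."""
    assert X is not None, "X must not be None"
    # Copy so the caller's list is left untouched (an assumption of this sketch).
    result = list(drop_list) if drop_list is not None else []
    for col in X.columns:
        # A column counts as constant when it holds at most one distinct value.
        if X[col].nunique(dropna=False) <= 1 and col not in result:
            result.append(col)
    return result


df = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 3], "c": ["x", "x", "x"]})
cols = find_constant_columns(df, drop_list=["b"])
print(cols)  # ['b', 'a', 'c']
```

Columns already in the drop list are kept and not duplicated, so the result can be passed straight to DataFrame.drop(columns=...).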

ml_grid.pipeline.data_constant_columns.remove_constant_columns_with_debug(X_train: pandas.DataFrame | numpy.ndarray, X_test: pandas.DataFrame | numpy.ndarray, X_test_orig: pandas.DataFrame | numpy.ndarray, verbosity: int = 2) → Tuple[pandas.DataFrame | numpy.ndarray, pandas.DataFrame | numpy.ndarray, pandas.DataFrame | numpy.ndarray]

Removes constant columns from training and testing datasets.

This function identifies columns that have zero variance in either the training or testing set and removes them from all provided datasets (X_train, X_test, X_test_orig). It supports both pandas DataFrames and NumPy arrays, including 3D arrays for time series data.

Parameters:
  • X_train (Union[pd.DataFrame, np.ndarray]) – Training feature data.

  • X_test (Union[pd.DataFrame, np.ndarray]) – Testing feature data.

  • X_test_orig (Union[pd.DataFrame, np.ndarray]) – Original (unsplit) testing feature data.

  • verbosity (int, optional) – Controls the verbosity of debug messages. Defaults to 2.

Returns:

A tuple containing the modified X_train, X_test, and X_test_orig datasets with constant columns removed.

Return type:

Tuple[Union[pd.DataFrame, np.ndarray], …]
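The idea of dropping a feature from all three datasets when it is constant in either the training or the testing set can be sketched as follows. This is an illustration under stated assumptions, not the library's code: drop_constant_features and _constant_mask are hypothetical names, all three inputs are assumed to be of the same type with matching features, and for NumPy input the feature axis is assumed to be axis 1, i.e. (n_samples, n_features) or (n_samples, n_features, n_timesteps).

```python
from typing import Tuple, Union

import numpy as np
import pandas as pd

ArrayLike = Union[pd.DataFrame, np.ndarray]


def _constant_mask(arr: np.ndarray) -> np.ndarray:
    """Boolean mask over axis 1: True where a feature never changes."""
    # Reduce over every axis except the assumed feature axis (axis 1).
    axes = tuple(i for i in range(arr.ndim) if i != 1)
    return arr.max(axis=axes) == arr.min(axis=axes)


def drop_constant_features(
    X_train: ArrayLike, X_test: ArrayLike, X_test_orig: ArrayLike
) -> Tuple[ArrayLike, ArrayLike, ArrayLike]:
    """Hypothetical helper: drop features constant in train OR test."""
    if isinstance(X_train, pd.DataFrame):
        # Assumes X_test has the same columns as X_train.
        constant = [
            col
            for col in X_train.columns
            if X_train[col].nunique(dropna=False) <= 1
            or X_test[col].nunique(dropna=False) <= 1
        ]
        return tuple(
            df.drop(columns=constant, errors="ignore")
            for df in (X_train, X_test, X_test_orig)
        )
    # NumPy path: works for 2D and 3D arrays with features on axis 1.
    keep = ~(_constant_mask(X_train) | _constant_mask(X_test))
    return tuple(a[:, keep] for a in (X_train, X_test, X_test_orig))


train = np.array([[1.0, 5.0, 2.0], [1.0, 6.0, 3.0]])  # feature 0 is constant
test = np.array([[2.0, 7.0, 4.0], [3.0, 7.0, 5.0]])   # feature 1 is constant
tr2, te2, orig2 = drop_constant_features(train, test, test.copy())
print(tr2.shape)  # (2, 1)
```

One consequence of checking both sets is that a very small test set can make many features look constant; with a single test row, every feature would be removed under this rule.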