ml_grid.pipeline.data_constant_columns
Functions
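remove_constant_columns(X[, drop_list, verbose]) | Identifies columns in a DataFrame where all values are the same.
remove_constant_columns_with_debug(X_train, X_test, X_test_orig[, verbosity]) | Removes constant columns from training and testing datasets.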
Module Contents
- ml_grid.pipeline.data_constant_columns.remove_constant_columns(X: pandas.DataFrame, drop_list: List[str] | None = None, verbose: int = 1) → List[str]
Identifies columns in a DataFrame where all values are the same.
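- Parameters:
X (pd.DataFrame) – Input feature data to check for constant columns.
drop_list (Optional[List[str]]) – Existing list of columns to drop; constant columns are added to it. Defaults to None.
verbose (int, optional) – Controls the verbosity of messages. Defaults to 1.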
- Returns:
Updated list of columns to drop, including constant columns.
- Return type:
List[str]
- Raises:
AssertionError – If X is None.
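The contract above (extend a drop list with every column whose values are all the same, and fail on None) can be sketched as follows. This is a minimal illustration, not the library's implementation: the name find_constant_columns_sketch is hypothetical, and copying drop_list instead of mutating it is an assumption.

```python
from typing import List, Optional

import pandas as pd


def find_constant_columns_sketch(
    X: pd.DataFrame, drop_list: Optional[List[str]] = None, verbose: int = 1
) -> List[str]:
    """Hypothetical sketch: return drop_list extended with X's constant columns."""
    assert X is not None, "X must not be None"
    # Assumption: work on a copy so the caller's list is left unchanged.
    result = list(drop_list) if drop_list is not None else []
    # nunique(dropna=False) counts NaN as a value, so an all-NaN column
    # (one distinct "value") is treated as constant as well.
    is_constant = (X.nunique(dropna=False) <= 1).to_numpy()
    constant = X.columns[is_constant].tolist()
    for col in constant:
        if col not in result:  # avoid duplicate entries in the drop list
            result.append(col)
    if verbose >= 1:
        print(f"Found {len(constant)} constant column(s): {constant}")
    return result
```

With df = pd.DataFrame({"a": [0, 0, 0], "b": [1, 2, 3]}), calling find_constant_columns_sketch(df, ["b"], verbose=0) returns ["b", "a"]: the existing entry is kept and the constant column a is appended.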
- ml_grid.pipeline.data_constant_columns.remove_constant_columns_with_debug(X_train: pandas.DataFrame | numpy.ndarray, X_test: pandas.DataFrame | numpy.ndarray, X_test_orig: pandas.DataFrame | numpy.ndarray, verbosity: int = 2) → Tuple[pandas.DataFrame | numpy.ndarray, pandas.DataFrame | numpy.ndarray, pandas.DataFrame | numpy.ndarray]
Removes constant columns from training and testing datasets.
This function identifies columns that have zero variance in either the training or testing set and removes them from all provided datasets (X_train, X_test, X_test_orig). It supports both pandas DataFrames and NumPy arrays, including 3D arrays for time series data.
- Parameters:
X_train (Union[pd.DataFrame, np.ndarray]) – Training feature data.
X_test (Union[pd.DataFrame, np.ndarray]) – Testing feature data.
X_test_orig (Union[pd.DataFrame, np.ndarray]) – Original (unsplit) testing feature data.
verbosity (int, optional) – Controls the verbosity of debug messages. Defaults to 2.
- Returns:
A tuple containing the modified X_train, X_test, and X_test_orig datasets with constant columns removed.
- Return type:
Tuple[Union[pd.DataFrame, np.ndarray], …]
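The removal rule described above (a column is dropped if it has zero variance in either the training or the testing set, and the same columns are removed from all three datasets) might look like the sketch below. It is illustrative only: drop_constant_sketch and _constant_mask are hypothetical names, all three inputs are assumed to share one type and one column layout, and for NumPy arrays the feature axis is assumed to be axis 1, so 3D time series arrays are taken to be shaped (samples, features, timesteps). Zero variance is tested as max == min, which is exact for floats; a column containing NaN is therefore never flagged as constant.

```python
from typing import Tuple, Union

import numpy as np
import pandas as pd

ArrayLike = Union[pd.DataFrame, np.ndarray]


def _constant_mask(data: np.ndarray) -> np.ndarray:
    """Boolean mask over axis 1 marking features whose values never change."""
    # Reduce over every axis except the (assumed) feature axis 1.
    other_axes = tuple(ax for ax in range(data.ndim) if ax != 1)
    return data.max(axis=other_axes) == data.min(axis=other_axes)


def drop_constant_sketch(
    X_train: ArrayLike, X_test: ArrayLike, X_test_orig: ArrayLike
) -> Tuple[ArrayLike, ArrayLike, ArrayLike]:
    """Hypothetical sketch: drop columns constant in X_train OR X_test from all inputs."""
    if isinstance(X_train, pd.DataFrame):
        # Union of columns with at most one distinct value in either split.
        const = set(X_train.columns[(X_train.nunique(dropna=False) <= 1).to_numpy()])
        const |= set(X_test.columns[(X_test.nunique(dropna=False) <= 1).to_numpy()])
        cols = [c for c in X_train.columns if c in const]
        return tuple(
            df.drop(columns=cols, errors="ignore") for df in (X_train, X_test, X_test_orig)
        )
    # NumPy path (2D or 3D): keep only features that vary in both splits.
    keep = ~(_constant_mask(X_train) | _constant_mask(X_test))
    return X_train[:, keep], X_test[:, keep], X_test_orig[:, keep]
```

Taking the union over both splits means a feature that varies in training but is constant in testing is also dropped, which matches the "either the training or testing set" rule above and keeps all three outputs column-aligned.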