ml_grid.pipeline.data_clean_up
Classes
clean_up_class – Initializes the clean_up_class.
Module Contents
- class ml_grid.pipeline.data_clean_up.clean_up_class[source]
Initializes the clean_up_class.
- handle_duplicated_columns(X: pandas.DataFrame) → pandas.DataFrame [source]
Drops duplicated columns from a DataFrame.
- Parameters:
X (pd.DataFrame) – DataFrame to drop duplicated columns from.
- Returns:
A copy of X with duplicated columns dropped.
- Return type:
pd.DataFrame
- Raises:
AssertionError – If X is None before or after processing.
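The behaviour described above can be sketched without the class itself. This is a minimal illustration, not the library's implementation: the helper name `drop_duplicated_columns` is hypothetical, and it assumes that "duplicated" means columns sharing the same label, with the first occurrence kept.

```python
import pandas as pd


def drop_duplicated_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for handle_duplicated_columns (assumption)."""
    # Mirror the documented contract: fail loudly if X is None.
    assert X is not None, "X is None before processing"

    # Assumption: duplicates are identified by column label, keeping the
    # first occurrence. Index.duplicated() flags later repeats as True.
    deduped = X.loc[:, ~X.columns.duplicated()].copy()

    assert deduped is not None, "X is None after processing"
    return deduped


# Example frame with the label "age" appearing twice.
X = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    columns=["age", "bmi", "age"],
)
result = drop_duplicated_columns(X)
print(list(result.columns))
```

Because the helper returns a copy, the input frame keeps all three columns; only the returned frame is deduplicated.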
- screen_non_float_types(X: pandas.DataFrame) → None [source]
Screens the DataFrame and prints the columns whose dtype is neither float nor int.
- Parameters:
X (pd.DataFrame) – The DataFrame to screen.
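A rough sketch of this kind of screening, using only public pandas dtype helpers. The function name `find_non_numeric_columns` is hypothetical, and unlike the documented method (which returns None) this version also returns the offending column names so they can be inspected programmatically.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def find_non_numeric_columns(X: pd.DataFrame) -> list:
    """Hypothetical helper: list columns that are neither float nor int."""
    flagged = []
    # Iterate over dtypes rather than X[col] so repeated labels are safe.
    for col, dtype in X.dtypes.items():
        if not (is_float_dtype(dtype) or is_integer_dtype(dtype)):
            print(f"Non-numeric column: {col} ({dtype})")
            flagged.append(col)
    return flagged


X = pd.DataFrame(
    {
        "age": [30, 41],        # integer column, passes the screen
        "bmi": [22.5, 27.1],    # float column, passes the screen
        "sex": ["F", "M"],      # string column, gets flagged
    }
)
flagged = find_non_numeric_columns(X)
```

Running such a screen before model fitting makes it easier to spot columns that still need encoding or removal.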
- handle_column_names(X: pandas.DataFrame) → pandas.DataFrame [source]
Renames columns to remove characters unsupported by some ML models.
This function renames columns in a DataFrame (X) that contain characters like ‘[’, ‘]’, or ‘<’, which can cause issues with models like XGBoost. These characters are replaced with underscores.
The renaming is controlled by the self.rename_cols attribute.
- Parameters:
X (pd.DataFrame) – DataFrame with columns to be potentially renamed.
- Returns:
A copy of X with renamed columns if applicable.
- Return type:
pd.DataFrame
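The renaming described for handle_column_names can be approximated as follows. This is a sketch under stated assumptions rather than the class's actual code: the function `sanitize_column_names` and its `rename_cols` parameter are hypothetical stand-ins for the method and the self.rename_cols attribute, and only the three characters named above ('[', ']', '<') are replaced.

```python
import re

import pandas as pd

# Characters the description above names as problematic for models
# such as XGBoost.
_UNSUPPORTED = re.compile(r"[\[\]<]")


def sanitize_column_names(X: pd.DataFrame, rename_cols: bool = True) -> pd.DataFrame:
    """Hypothetical stand-in for handle_column_names (assumption)."""
    renamed = X.copy()
    if rename_cols:
        # Replace each unsupported character with an underscore.
        renamed.columns = [_UNSUPPORTED.sub("_", str(col)) for col in X.columns]
    return renamed


X = pd.DataFrame({"glucose[mmol]": [5.1], "age<65": [1], "bmi": [22.5]})
clean = sanitize_column_names(X)
print(list(clean.columns))
```

With `rename_cols=False` the sketch returns an unmodified copy, which mirrors the idea that the attribute switches the renaming on or off.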