ml_grid.pipeline.data_clean_up

Classes

clean_up_class

Initializes the clean_up_class.

Module Contents

class ml_grid.pipeline.data_clean_up.clean_up_class

Initializes the clean_up_class.

global_params

verbose = 0

rename_cols = True

handle_duplicated_columns(X: pandas.DataFrame) → pandas.DataFrame

Drops duplicated columns from a DataFrame.

Parameters:

X (pd.DataFrame) – DataFrame to drop duplicated columns from.

Returns:

A copy of X with duplicated columns dropped.

Return type:

pd.DataFrame

Raises:

AssertionError – If X is None before or after processing.
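As an illustration only, the behaviour described above can be sketched with plain pandas. The helper name drop_duplicated_columns is hypothetical, and the sketch assumes "duplicated" means a repeated column label; the actual method may use a different criterion.

```python
import pandas as pd


def drop_duplicated_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the first occurrence of each column label."""
    # Mirrors the documented contract: X must not be None.
    assert X is not None, "X must not be None"
    # Index.duplicated() flags every repeat of a label after its first occurrence.
    keep = ~X.columns.duplicated()
    # .copy() ensures the caller's DataFrame is left untouched.
    return X.loc[:, keep].copy()


X = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])
deduped = drop_duplicated_columns(X)
print(list(deduped.columns))
```

The input frame keeps all three columns; only the returned copy has the repeated label removed.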

screen_non_float_types(X: pandas.DataFrame) → None

Screens and prints columns that are not of float or int type.

Parameters:

X (pd.DataFrame) – The DataFrame to screen.
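A comparable check can be sketched with pandas' select_dtypes. The function name find_non_numeric_columns is hypothetical, and this is an approximation under stated assumptions, not the method's own code: it returns the column names instead of only printing them, and it treats every NumPy numeric dtype as acceptable.

```python
import pandas as pd


def find_non_numeric_columns(X: pd.DataFrame) -> list:
    """Hypothetical helper: list columns whose dtype is not numeric."""
    # exclude="number" drops int and float columns and keeps everything else
    # (for example strings, datetimes, and booleans).
    return list(X.select_dtypes(exclude="number").columns)


X = pd.DataFrame({"age": [31, 45], "score": [0.2, 0.9], "site": ["A", "B"]})
non_numeric = find_non_numeric_columns(X)
print(non_numeric)
```

Here only the string-valued site column is reported; age (int) and score (float) pass the screen.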

handle_column_names(X: pandas.DataFrame) → pandas.DataFrame

Renames columns to remove characters unsupported by some ML models.

This method replaces characters such as ‘[’, ‘]’, and ‘<’ in the column names of X with underscores. Models such as XGBoost can fail on feature names that contain these characters.

The renaming is controlled by the self.rename_cols attribute.

Parameters:

X (pd.DataFrame) – DataFrame with columns to be potentially renamed.

Returns:

A copy of X with renamed columns if applicable.

Return type:

pd.DataFrame
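The replacement rule described for handle_column_names can be sketched with the standard library alone. The names UNSUPPORTED and sanitize_columns are hypothetical, the character set is limited to the three characters named above, and the rename_cols argument only imitates the documented attribute; the real method may cover more characters.

```python
import re

# Assumption: only the three characters named in the description are replaced.
UNSUPPORTED = re.compile(r"[\[\]<]")


def sanitize_columns(columns, rename_cols=True):
    """Hypothetical helper: replace unsupported characters with underscores."""
    if not rename_cols:
        # Renaming disabled: return the names unchanged (as a new list).
        return list(columns)
    return [UNSUPPORTED.sub("_", str(name)) for name in columns]


columns = ["age", "bmi[kg/m2]", "crp<5"]
renamed = sanitize_columns(columns)
unchanged = sanitize_columns(columns, rename_cols=False)
print(renamed)
```

With a DataFrame, the result could be assigned back through X.columns on a copy of X. Note that distinct names can collide after replacement (for example "a[" and "a<" both become "a_"), so a duplicate check afterwards may be worthwhile.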