ml_grid.pipeline.column_names

Functions

get_pertubation_columns(all_df_columns, local_param_dict, drop_term_list) → Tuple[List[str], List[str]]

Categorizes columns and selects features based on configuration.

Module Contents

ml_grid.pipeline.column_names.get_pertubation_columns(all_df_columns: List[str], local_param_dict: Dict[str, Any], drop_term_list: List[str]) → Tuple[List[str], List[str]]

Categorizes columns and selects features based on configuration.

This function takes the list of all DataFrame columns and categorizes them into groups (e.g., bloods, annotations). It then selects which groups to include as features based on boolean flags in local_param_dict['data']. It also marks for dropping any column whose name contains one of the substrings in drop_term_list.

Parameters:

all_df_columns (List[str]) – A list of all column names in the DataFrame.
local_param_dict (Dict[str, Any]) – A dictionary of parameters for the current run, containing a 'data' sub-dictionary with boolean flags for each feature category.
drop_term_list (List[str]) – A list of substrings. Any column name containing one of these substrings will be marked for dropping.

Returns:

A tuple containing two lists:

A list of column names selected as features.
A list of column names identified to be dropped.

Return type:

Tuple[List[str], List[str]]
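
Example:

The behavior described above can be illustrated with a minimal, self-contained sketch. It does not call the library and is not its implementation: the helper name sketch_get_columns, the CATEGORY_MARKERS mapping, and the rule that a column belongs to a category when its name contains that category's marker are all assumptions made for illustration. The sketch also treats the two returned lists as independent, which the real function may or may not do.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical category rules: category name -> substring marking membership.
# The real module's categories and matching rules may differ.
CATEGORY_MARKERS: Dict[str, str] = {
    "bloods": "_blood",
    "annotations": "_annotation",
}


def sketch_get_columns(
    all_df_columns: List[str],
    local_param_dict: Dict[str, Any],
    drop_term_list: List[str],
) -> Tuple[List[str], List[str]]:
    """Illustrative stand-in for the documented behavior (assumed logic)."""
    data_flags = local_param_dict["data"]

    # Keep only the columns whose category flag is switched on.
    features: List[str] = []
    for category, marker in CATEGORY_MARKERS.items():
        if data_flags.get(category, False):
            features.extend(c for c in all_df_columns if marker in c)

    # Mark for dropping any column whose name contains a drop term.
    to_drop = [
        c for c in all_df_columns
        if any(term in c for term in drop_term_list)
    ]
    return features, to_drop


columns = ["age", "hb_blood", "crp_blood", "note_annotation", "patient_id"]
params = {"data": {"bloods": True, "annotations": False}}

features, to_drop = sketch_get_columns(columns, params, ["_id"])
print(features)  # ['hb_blood', 'crp_blood']
print(to_drop)   # ['patient_id']
```

Here the bloods flag is True, so both blood columns are selected. The annotations flag is False, so note_annotation is left out. patient_id contains the drop term "_id", so it appears in the second list.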