ml_grid.pipeline.column_names
Functions
get_pertubation_columns(all_df_columns, local_param_dict, drop_term_list) | Categorizes columns and selects features based on configuration.
Module Contents
- ml_grid.pipeline.column_names.get_pertubation_columns(all_df_columns: List[str], local_param_dict: Dict[str, Any], drop_term_list: List[str]) → Tuple[List[str], List[str]]
Categorizes columns and selects features based on configuration.
This function takes the full list of DataFrame column names and sorts them into category groups (e.g., bloods, annotations). It then selects which groups to include as features based on the boolean flags in local_param_dict['data']. Separately, it marks for dropping any column whose name contains one of the substrings in drop_term_list.
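The behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name select_columns_sketch, the CATEGORY_MARKERS rules, and the flag names "bloods" and "annotations" are all assumptions made for illustration, and the real module may categorize columns differently or treat dropped columns differently.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical category rules: category name -> substring marking membership.
# The real module's categories and matching rules may differ.
CATEGORY_MARKERS: Dict[str, str] = {
    "bloods": "_blood",
    "annotations": "_annotation",
}


def select_columns_sketch(
    all_df_columns: List[str],
    local_param_dict: Dict[str, Any],
    drop_term_list: List[str],
) -> Tuple[List[str], List[str]]:
    """Illustrative stand-in for the categorize/select/drop steps."""
    # Step 1: group columns by category.
    categories = {
        name: [col for col in all_df_columns if marker in col]
        for name, marker in CATEGORY_MARKERS.items()
    }

    # Step 2: keep only the groups whose boolean flag is enabled.
    selected: List[str] = []
    for name, cols in categories.items():
        if local_param_dict["data"].get(name, False):
            selected.extend(cols)

    # Step 3: mark any column containing a drop term (substring match).
    drop_list = [
        col
        for col in all_df_columns
        if any(term in col for term in drop_term_list)
    ]
    return selected, drop_list


columns = ["hb_blood", "crp_blood", "fever_annotation", "client_idcode"]
params = {"data": {"bloods": True, "annotations": False}}
features, to_drop = select_columns_sketch(columns, params, ["idcode"])
print(features)  # ['hb_blood', 'crp_blood']
print(to_drop)   # ['client_idcode']
```

In this sketch the two returned lists are computed independently, so a column could in principle appear in both; whether the real function excludes dropped columns from the feature list is not stated here.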
- Parameters:
all_df_columns (List[str]) – A list of all column names in the DataFrame.
local_param_dict (Dict[str, Any]) – A dictionary of parameters for the current run, containing a 'data' sub-dictionary with boolean flags for each feature category.
drop_term_list (List[str]) – A list of substrings. Any column name containing one of these substrings will be marked for dropping.
- Returns:
- A tuple containing two lists:
A list of column names selected as features.
A list of column names identified to be dropped.
- Return type: