summarize_results

Module for creating tabular summaries from ML results data.

Classes

ResultsSummarizer

Initializes the summarizer.

Module Contents

class summarize_results.ResultsSummarizer(data: pandas.DataFrame)

Initializes the summarizer.

Parameters:

data (pd.DataFrame) – Aggregated results DataFrame.

Raises:

ValueError – If the input data is not a pandas DataFrame or is an empty DataFrame.
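
The documented error condition can be illustrated with a small sketch. The helper name check_results_frame and the error message are hypothetical and not part of this module; the sketch only assumes that the constructor rejects non-DataFrame input and empty DataFrames, as stated above.

```python
import pandas as pd


def check_results_frame(data):
    """Hypothetical helper mirroring the documented ValueError condition."""
    # Reject anything that is not a DataFrame, and any DataFrame that is empty.
    if not isinstance(data, pd.DataFrame) or data.empty:
        raise ValueError("data must be a non-empty pandas DataFrame")
    return data


# A populated frame passes through unchanged.
check_results_frame(pd.DataFrame({"auc": [0.81]}))

# An empty frame and a plain list are both rejected.
for bad in (pd.DataFrame(), [0.81]):
    try:
        check_results_frame(bad)
    except ValueError as err:
        print(f"rejected: {err}")
```

Callers constructing ResultsSummarizer would presumably wrap the call in a similar try/except when the aggregated results may be missing.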

data
clean_data
get_best_model_per_outcome(metric: str = 'auc') → pandas.DataFrame

Finds the best model for each outcome and expands the feature list.

This method identifies the single best-performing model run for each outcome variable based on the specified metric. It then transforms the ‘decoded_features’ list into a set of boolean columns, where each new column represents a feature and its value indicates whether that feature was used in the best model run.

Parameters:

metric (str, optional) – The performance metric to use for determining the “best” model. Defaults to ‘auc’.

Returns:

A DataFrame containing the best model run for each outcome, with additional boolean columns for each feature.

Return type:

pd.DataFrame
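
The behaviour described for get_best_model_per_outcome can be sketched as follows. This is not the module's implementation: the function name best_model_per_outcome, the outcome column name outcome_variable, the assumption that a higher metric value is better, and the toy data are all illustrative. Only the decoded_features column and the default metric 'auc' come from the documentation above.

```python
import pandas as pd


def best_model_per_outcome(df, metric="auc", outcome_col="outcome_variable"):
    # Assumption: a higher metric value means a better run.
    # outcome_col is a hypothetical column name for the outcome variable.
    best_idx = df.groupby(outcome_col)[metric].idxmax()
    best = df.loc[best_idx].reset_index(drop=True)

    # Collect every feature name that appears in a winning run.
    all_features = sorted({f for feats in best["decoded_features"] for f in feats})

    # One boolean column per feature: True if the winning run used it.
    for feature in all_features:
        best[feature] = best["decoded_features"].apply(
            lambda feats, f=feature: f in feats
        )
    return best


# Made-up toy data, for illustration only.
results = pd.DataFrame({
    "outcome_variable": ["readmission", "readmission", "mortality"],
    "auc": [0.71, 0.78, 0.83],
    "decoded_features": [["age"], ["age", "bmi"], ["bmi", "sodium"]],
})
summary = best_model_per_outcome(results)
print(summary[["outcome_variable", "auc", "age", "bmi", "sodium"]])
```

The sketch returns one row per outcome. A feature name that collides with an existing column would overwrite that column, so a real implementation may want to prefix the generated columns.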