plot_feature_categories

Plotting module for feature category analysis of ML results. It visualizes how including different data source categories affects model performance.

Classes

FeatureCategoryPlotter – Initializes the FeatureCategoryPlotter.

Module Contents

class plot_feature_categories.FeatureCategoryPlotter(data: pandas.DataFrame)

Initializes the FeatureCategoryPlotter.

Parameters: data (pd.DataFrame) – Results DataFrame. Must contain boolean columns for the feature categories and columns for the performance metrics.
Raises: ValueError – If no feature category columns are found in the data.

data

clean_data

feature_categories = ['age', 'sex', 'bmi', 'ethnicity', 'bloods', 'diagnostic_order', 'drug_order', 'annotation_n',...

available_categories
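
To make the expected input shape concrete, here is a minimal sketch of a results DataFrame and of one way available_categories could be derived from it. The helper find_available_categories and the constant KNOWN_CATEGORIES are hypothetical names used only for illustration; the class's actual logic is not shown in this reference and may differ. Only the category names visible in the truncated feature_categories listing are used.

```python
import pandas as pd

# Hypothetical constant: only the names visible in the truncated
# feature_categories listing; the real list is longer.
KNOWN_CATEGORIES = [
    "age", "sex", "bmi", "ethnicity",
    "bloods", "diagnostic_order", "drug_order", "annotation_n",
]


def find_available_categories(data: pd.DataFrame, known=KNOWN_CATEGORIES) -> list:
    """Return the known category names that appear as columns in `data`.

    Assumption: availability means "the column exists". Raises ValueError
    when none are present, mirroring the documented constructor behavior.
    """
    available = [name for name in known if name in data.columns]
    if not available:
        raise ValueError("No feature category columns found in the data.")
    return available


# Example results table: one row per experiment run, boolean inclusion
# flags per feature category, plus a performance metric column.
results = pd.DataFrame(
    {
        "age": [True, False, True],
        "bloods": [True, True, False],
        "auc": [0.81, 0.78, 0.72],
    }
)

available = find_available_categories(results)
```

A DataFrame like results is the kind of input the constructor describes: boolean flags for categories alongside metric columns such as auc.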

plot_category_performance_boxplots(metric: str = 'auc', figsize: Tuple[int, int] | None = None) → None

Creates box plots comparing performance when a feature category is included.

Parameters:

metric (str, optional) – The performance metric to plot. Defaults to ‘auc’.
figsize (Optional[Tuple[int, int]], optional) – Figure size for the plot. Defaults to None.

Raises:

ValueError – If the specified metric is not found in the data.
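
The following sketch illustrates the general idea of such a box plot using matplotlib directly. It is not the method's implementation: the function name sketch_category_boxplots, the included-versus-excluded split, the one-panel-per-category layout, and the fallback figure size are all assumptions made for illustration. Unlike the documented method, the sketch returns the figure so it can be inspected.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def sketch_category_boxplots(data, categories, metric="auc", figsize=None):
    """Draw one panel per category: metric when excluded vs. included."""
    if metric not in data.columns:
        raise ValueError(f"Metric '{metric}' not found in data.")
    if figsize is None:
        # Assumed fallback: scale the width with the number of panels.
        figsize = (4 * len(categories), 4)

    fig, axes = plt.subplots(1, len(categories), figsize=figsize, squeeze=False)
    for ax, category in zip(axes[0], categories):
        included = data[category].astype(bool)
        groups = [
            data.loc[~included, metric].to_numpy(),  # runs without the category
            data.loc[included, metric].to_numpy(),   # runs with the category
        ]
        ax.boxplot(groups)
        ax.set_xticks([1, 2])
        ax.set_xticklabels(["excluded", "included"])
        ax.set_title(category)
        ax.set_ylabel(metric)
    fig.tight_layout()
    return fig


# Small illustrative results table (made-up values).
example = pd.DataFrame(
    {
        "bloods": [True, True, False, False],
        "bmi": [True, False, True, False],
        "auc": [0.80, 0.78, 0.70, 0.68],
    }
)
fig = sketch_category_boxplots(example, ["bloods", "bmi"])
```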

plot_category_impact_on_metric(metric: str = 'auc', figsize: Tuple[int, int] = (10, 8)) → None

Plots the impact of including each feature category on a metric.

Impact is calculated as: (Mean metric with category) - (Mean metric without category)

Parameters:

metric (str, optional) – The performance metric to evaluate. Defaults to ‘auc’.
figsize (Tuple[int, int], optional) – The figure size for the plot. Defaults to (10, 8).

Raises:

ValueError – If the specified metric is not found in the data.
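
The impact formula above can be sketched with pandas as follows. The helper category_impact is a hypothetical name, and the example data is made up; the sketch covers only the arithmetic (mean with the category minus mean without it), not the plotting, and the method itself may handle details such as missing values differently.

```python
import pandas as pd


def category_impact(data: pd.DataFrame, categories, metric: str = "auc") -> pd.Series:
    """Compute (mean metric with category) - (mean metric without category)."""
    if metric not in data.columns:
        raise ValueError(f"Metric '{metric}' not found in data.")

    impact = {}
    for category in categories:
        included = data[category].astype(bool)
        mean_with = data.loc[included, metric].mean()
        mean_without = data.loc[~included, metric].mean()
        impact[category] = mean_with - mean_without

    # Sort so the least helpful categories come first.
    return pd.Series(impact, name=f"{metric}_impact").sort_values()


# Made-up example: four runs covering every on/off combination.
results = pd.DataFrame(
    {
        "bloods": [True, True, False, False],
        "bmi": [True, False, True, False],
        "auc": [0.80, 0.78, 0.70, 0.68],
    }
)
impact = category_impact(results, ["bloods", "bmi"])
```

In this example, bloods has an impact of about 0.10 (0.79 with, 0.69 without) and bmi about 0.02 (0.75 with, 0.73 without). A positive value means runs that include the category score higher on average; it does not by itself show that the category caused the improvement, since other categories vary across the same runs.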