plot_feature_categories

Feature category analysis plotting module for ML results analysis. Focuses on visualizing the impact of including different data source categories on model performance.

Classes

FeatureCategoryPlotter

Initializes the FeatureCategoryPlotter.

Module Contents

class plot_feature_categories.FeatureCategoryPlotter(data: pandas.DataFrame)

Initializes the FeatureCategoryPlotter.

Parameters:

data (pd.DataFrame) – Results DataFrame. Must contain boolean columns for the feature categories, plus columns for the performance metrics.

Raises:

ValueError – If no feature category columns are found in the data.

data
clean_data
feature_categories = ['age', 'sex', 'bmi', 'ethnicity', 'bloods', 'diagnostic_order', 'drug_order', 'annotation_n',...
available_categories
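
How `available_categories` relates to `feature_categories` is not spelled out on this page. One plausible reading, given the documented `ValueError`, is that `available_categories` holds the known categories that actually appear as columns in `data`. The sketch below illustrates that reading on plain column names. `KNOWN_CATEGORIES` and `find_available_categories` are hypothetical names, and the list contains only the entries visible in the truncated default shown above.

```python
from typing import Iterable, List

# Only the entries visible in the documented (truncated) default list.
KNOWN_CATEGORIES = [
    "age", "sex", "bmi", "ethnicity",
    "bloods", "diagnostic_order", "drug_order", "annotation_n",
]


def find_available_categories(columns: Iterable[str]) -> List[str]:
    """Return the known categories present in `columns`, in list order.

    Hypothetical helper: it mirrors the documented ValueError, not the
    module's actual implementation.
    """
    present = set(columns)
    available = [cat for cat in KNOWN_CATEGORIES if cat in present]
    if not available:
        raise ValueError("No feature category columns found in the data.")
    return available


if __name__ == "__main__":
    # Metric columns such as "auc" are ignored; only category names match.
    print(find_available_categories(["auc", "f1", "bloods", "age"]))
```

Under this assumption, a results table with no recognised category columns fails at construction time rather than later during plotting.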
plot_category_performance_boxplots(metric: str = 'auc', figsize: Tuple[int, int] | None = None) → None

Creates box plots comparing performance when a feature category is included.

Parameters:
  • metric (str, optional) – The performance metric to plot. Defaults to 'auc'.

  • figsize (Optional[Tuple[int, int]], optional) – Figure size for the plot. Defaults to None.

Raises:

ValueError – If the specified metric is not found in the data.

plot_category_impact_on_metric(metric: str = 'auc', figsize: Tuple[int, int] = (10, 8)) → None

Plots the impact of including each feature category on a metric.

Impact is calculated as: (Mean metric with category) - (Mean metric without category)

Parameters:
  • metric (str, optional) – The performance metric to evaluate. Defaults to 'auc'.

  • figsize (Tuple[int, int], optional) – The figure size for the plot. Defaults to (10, 8).

Raises:

ValueError – If the specified metric is not found in the data.
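
The impact formula above can be sketched without any plotting. The example below works on a list of dict rows rather than a DataFrame so that it stays dependency-free. `category_impact` and the sample rows are hypothetical and only illustrate the arithmetic, not the module's implementation.

```python
from statistics import mean
from typing import Dict, Sequence


def category_impact(
    rows: Sequence[dict], categories: Sequence[str], metric: str = "auc"
) -> Dict[str, float]:
    """Mean metric with each category minus mean metric without it."""
    if any(metric not in row for row in rows):
        raise ValueError(f"Metric '{metric}' not found in the data.")
    impact = {}
    for cat in categories:
        with_cat = [row[metric] for row in rows if row[cat]]
        without_cat = [row[metric] for row in rows if not row[cat]]
        if not with_cat or not without_cat:
            continue  # impact is undefined unless both groups have runs
        impact[cat] = mean(with_cat) - mean(without_cat)
    return impact


# Made-up results: one row per model run, boolean flag per category.
runs = [
    {"bloods": True, "age": True, "auc": 0.80},
    {"bloods": True, "age": False, "auc": 0.78},
    {"bloods": False, "age": True, "auc": 0.70},
    {"bloods": False, "age": False, "auc": 0.68},
]
impact = category_impact(runs, ["bloods", "age"])
```

In this made-up data, `bloods` has an impact of about +0.10 and `age` about +0.02. Assuming the category column has boolean dtype, the same quantity on a DataFrame `df` is `df.loc[df[cat], metric].mean() - df.loc[~df[cat], metric].mean()`.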