plot_distributions
Distribution plotting module for ML results analysis. Focuses on metric distributions with outcome variable stratification.
Attributes
Classes
DistributionPlotter – Initializes the DistributionPlotter.
Functions
plot_metric_correlation_by_outcome – Plots correlation matrices of metrics, stratified by outcome variable.
Module Contents
- class plot_distributions.DistributionPlotter(data: pandas.DataFrame, style: str = 'default')
Initializes the DistributionPlotter.
- Parameters:
data (pd.DataFrame) – A DataFrame containing the experiment results.
style (str, optional) – The matplotlib style to use for plots. Defaults to ‘default’.
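A minimal sketch of constructing the plotter. The column names `outcome_variable` and `auc` appear in the method documentation for this class; the `f1` column, the values, and the top-level import path are assumptions made for illustration. The import is guarded so the snippet also runs where the module is not installed.

```python
import pandas as pd

# Hypothetical results table: one row per experiment run.
# 'outcome_variable' and 'auc' are named in this module's docs;
# 'f1' and all values are invented for illustration.
results = pd.DataFrame(
    {
        "outcome_variable": ["outcome_a", "outcome_a", "outcome_b", "outcome_b"],
        "auc": [0.81, 0.78, 0.66, 0.70],
        "f1": [0.72, 0.69, 0.58, 0.61],
    }
)

try:
    # Import path assumed from the module name; adjust to your package layout.
    from plot_distributions import DistributionPlotter
except ImportError:
    DistributionPlotter = None  # module not available in this environment

if DistributionPlotter is not None:
    plotter = DistributionPlotter(results, style="default")
```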
- plot_metric_distributions(metrics: List[str] | None = None, figsize: Tuple[int, int] = (15, 10), stratify_by_outcome: bool = False, outcomes_to_plot: List[str] | None = None)
Plots distributions of key performance metrics.
- Parameters:
metrics (Optional[List[str]], optional) – A list of metric columns to plot. If None, uses a default list. Defaults to None.
figsize (Tuple[int, int], optional) – The figure size. Defaults to (15, 10).
stratify_by_outcome (bool, optional) – If True, creates separate plots for each outcome. Defaults to False.
outcomes_to_plot (Optional[List[str]], optional) – A list of specific outcomes to plot. If None, all are used. Defaults to None.
- Raises:
ValueError – If no specified metrics are found in the data.
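The metric fallback and the error condition described above could be sketched as follows. This is not the module's implementation: `select_available_metrics` and `DEFAULT_METRICS` are hypothetical names, and the actual default metric list is not documented here.

```python
from typing import List, Optional

import pandas as pd

# Hypothetical default list; the module's real defaults are not documented.
DEFAULT_METRICS = ["auc", "f1", "accuracy"]


def select_available_metrics(
    data: pd.DataFrame, metrics: Optional[List[str]] = None
) -> List[str]:
    """Return the requested metrics that exist as columns in ``data``."""
    requested = DEFAULT_METRICS if metrics is None else metrics
    available = [m for m in requested if m in data.columns]
    if not available:
        # Mirrors the documented ValueError when no metric is found.
        raise ValueError(f"None of the metrics {requested} were found in the data.")
    return available


df = pd.DataFrame({"auc": [0.8, 0.7], "f1": [0.6, 0.5]})
print(select_available_metrics(df))           # ['auc', 'f1']
print(select_available_metrics(df, ["auc"]))  # ['auc']
```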
- plot_comparative_distributions(metric: str = 'auc', outcomes_to_compare: List[str] | None = None, plot_type: str = 'overlay', figsize: Tuple[int, int] = (12, 6))
Creates comparative distribution plots across different outcomes.
- Parameters:
metric (str, optional) – The metric to compare. Defaults to ‘auc’.
outcomes_to_compare (Optional[List[str]], optional) – A list of specific outcomes to compare. If None, all are used. Defaults to None.
plot_type (str, optional) – The type of plot to generate: ‘overlay’, ‘subplot’, or ‘violin’. Defaults to ‘overlay’.
figsize (Tuple[int, int], optional) – The figure size. Defaults to (12, 6).
- Raises:
ValueError – If ‘outcome_variable’ or the specified metric is not found, or if an invalid plot_type is provided.
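As a rough illustration of the checks and the 'overlay' case described above, the sketch below validates its inputs and then draws one semi-transparent histogram per outcome on shared axes. The helper name `overlay_metric_by_outcome`, the bin count, and the sample data are assumptions; the module may render its plots differently.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

VALID_PLOT_TYPES = ("overlay", "subplot", "violin")


def overlay_metric_by_outcome(data, metric="auc", plot_type="overlay", figsize=(12, 6)):
    """Hypothetical helper: validate inputs, then overlay one histogram per outcome."""
    if "outcome_variable" not in data.columns:
        raise ValueError("'outcome_variable' column not found.")
    if metric not in data.columns:
        raise ValueError(f"Metric '{metric}' not found.")
    if plot_type not in VALID_PLOT_TYPES:
        raise ValueError(f"plot_type must be one of {VALID_PLOT_TYPES}.")
    if plot_type != "overlay":
        raise NotImplementedError("Only the 'overlay' case is sketched here.")

    fig, ax = plt.subplots(figsize=figsize)
    for outcome, group in data.groupby("outcome_variable"):
        ax.hist(group[metric], bins=10, alpha=0.5, label=str(outcome))
    ax.set_xlabel(metric)
    ax.set_ylabel("count")
    ax.legend(title="outcome_variable")
    return fig


# Invented sample data for illustration.
df = pd.DataFrame(
    {
        "outcome_variable": ["outcome_a"] * 3 + ["outcome_b"] * 3,
        "auc": [0.80, 0.70, 0.90, 0.60, 0.65, 0.70],
    }
)
fig = overlay_metric_by_outcome(df, metric="auc")
```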
- plot_distribution_summary_table(metrics: List[str] = None, outcomes_to_include: List[str] | None = None) → pandas.DataFrame
Creates a summary table of distribution statistics by outcome.
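- Parameters:
metrics (List[str], optional) – A list of metrics to include. Defaults to None.
outcomes_to_include (Optional[List[str]], optional) – A list of specific outcomes to include. Defaults to None.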
- Returns:
A DataFrame containing the summary statistics.
- Return type:
pd.DataFrame
- Raises:
ValueError – If ‘outcome_variable’ column is not found.
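A per-outcome summary table of this kind can be approximated with a pandas group-by. The sketch below is an assumption about the general shape of the result, not the module's code: `summarize_by_outcome` is a hypothetical helper, and the set of statistics is borrowed from the options listed for the heatmap's `stat` parameter.

```python
import pandas as pd


def summarize_by_outcome(data, metrics):
    """Hypothetical helper: distribution statistics per outcome and metric."""
    if "outcome_variable" not in data.columns:
        raise ValueError("'outcome_variable' column not found.")
    # One row per outcome; columns form a (metric, statistic) MultiIndex.
    return data.groupby("outcome_variable")[metrics].agg(
        ["mean", "std", "median", "min", "max"]
    )


# Invented sample data for illustration.
df = pd.DataFrame(
    {
        "outcome_variable": ["outcome_a", "outcome_a", "outcome_b", "outcome_b"],
        "auc": [0.80, 0.70, 0.60, 0.70],
    }
)
summary = summarize_by_outcome(df, ["auc"])
print(summary)
```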
- plot_distribution_heatmap(metrics: List[str] = None, stat: str = 'mean', figsize: Tuple[int, int] = (10, 6))
Creates a heatmap of distribution statistics across outcomes and metrics.
- Parameters:
metrics (Optional[List[str]], optional) – A list of metrics to include. If None, uses a default list. Defaults to None.
stat (str, optional) – The statistic to display (‘mean’, ‘std’, ‘median’, ‘min’, ‘max’). Defaults to ‘mean’.
figsize (Tuple[int, int], optional) – The figure size. Defaults to (10, 6).
- Raises:
ValueError – If ‘outcome_variable’ column is not found.
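One way to picture this heatmap: aggregate each metric per outcome with the chosen statistic, then render the resulting outcome-by-metric table as an image. The sketch below uses plain matplotlib; `stat_heatmap`, the colormap, and the sample data are assumptions, and the module may use a different plotting backend.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def stat_heatmap(data, metrics, stat="mean", figsize=(10, 6)):
    """Hypothetical helper: heatmap of one statistic across outcomes and metrics."""
    if "outcome_variable" not in data.columns:
        raise ValueError("'outcome_variable' column not found.")
    # Rows are outcomes, columns are metrics, cells hold the chosen statistic.
    table = data.groupby("outcome_variable")[metrics].agg(stat)

    fig, ax = plt.subplots(figsize=figsize)
    image = ax.imshow(table.to_numpy(), aspect="auto", cmap="viridis")
    ax.set_xticks(range(len(table.columns)))
    ax.set_xticklabels(table.columns)
    ax.set_yticks(range(len(table.index)))
    ax.set_yticklabels(table.index)
    fig.colorbar(image, ax=ax, label=stat)
    return table, fig


# Invented sample data for illustration.
df = pd.DataFrame(
    {
        "outcome_variable": ["outcome_a", "outcome_a", "outcome_b", "outcome_b"],
        "auc": [0.80, 0.70, 0.60, 0.70],
        "f1": [0.60, 0.50, 0.40, 0.50],
    }
)
table, fig = stat_heatmap(df, ["auc", "f1"], stat="median")
```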
- plot_distributions.plot_metric_correlation_by_outcome(data: pandas.DataFrame, outcomes_to_plot: List[str] | None = None, figsize: Tuple[int, int] = (15, 10))
Plots correlation matrices of metrics, stratified by outcome variable.
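- Parameters:
data (pd.DataFrame) – A DataFrame containing the experiment results.
outcomes_to_plot (Optional[List[str]], optional) – A list of specific outcomes to plot. Defaults to None.
figsize (Tuple[int, int], optional) – The figure size. Defaults to (15, 10).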
- Raises:
ValueError – If ‘outcome_variable’ column is not found.
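The stratified correlation idea can be sketched by splitting the data on `outcome_variable` and computing one correlation matrix per subset. This is an illustrative assumption about the underlying computation, not the function's actual code; `correlation_by_outcome`, the fallback to all outcomes when `outcomes_to_plot` is None, and the sample data are hypothetical.

```python
import pandas as pd


def correlation_by_outcome(data, outcomes_to_plot=None):
    """Hypothetical helper: one metric correlation matrix per outcome."""
    if "outcome_variable" not in data.columns:
        raise ValueError("'outcome_variable' column not found.")
    if outcomes_to_plot is None:
        # Assumed fallback: use every outcome present in the data.
        outcomes_to_plot = list(data["outcome_variable"].unique())

    matrices = {}
    for outcome in outcomes_to_plot:
        subset = data[data["outcome_variable"] == outcome]
        # Correlate only numeric metric columns within this outcome.
        matrices[outcome] = subset.select_dtypes(include="number").corr()
    return matrices


# Invented sample data: metrics move together for outcome_a, oppositely for outcome_b.
df = pd.DataFrame(
    {
        "outcome_variable": ["outcome_a"] * 3 + ["outcome_b"] * 3,
        "auc": [0.80, 0.70, 0.90, 0.60, 0.65, 0.70],
        "f1": [0.60, 0.50, 0.70, 0.50, 0.45, 0.40],
    }
)
matrices = correlation_by_outcome(df)
```

Each matrix could then be drawn in its own subplot to reproduce the per-outcome layout.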