plot_distributions
==================

.. py:module:: plot_distributions

.. autoapi-nested-parse::

   Distribution plotting module for ML results analysis.
   Focuses on metric distributions with outcome variable stratification.


Attributes
----------

.. autoapisummary::

   plot_distributions.MAX_OUTCOMES_FOR_STRATIFIED_PLOT
   plot_distributions.MAX_OUTCOMES_FOR_HEATMAP


Classes
-------

.. autoapisummary::

   plot_distributions.DistributionPlotter


Functions
---------

.. autoapisummary::

   plot_distributions.plot_metric_correlation_by_outcome


Module Contents
---------------

.. py:data:: MAX_OUTCOMES_FOR_STRATIFIED_PLOT
   :value: 20


.. py:data:: MAX_OUTCOMES_FOR_HEATMAP
   :value: 25


.. py:class:: DistributionPlotter(data: pandas.DataFrame, style: str = 'default')

   Initializes the DistributionPlotter.

   :param data: A DataFrame containing the experiment results.
   :type data: pd.DataFrame
   :param style: The matplotlib style to use for plots. Defaults to 'default'.
   :type style: str, optional


   .. py:attribute:: data


   .. py:attribute:: clean_data


   .. py:attribute:: colors


   .. py:method:: plot_metric_distributions(metrics: Optional[List[str]] = None, figsize: Tuple[int, int] = (15, 10), stratify_by_outcome: bool = False, outcomes_to_plot: Optional[List[str]] = None)

      Plots distributions of key performance metrics.

      :param metrics: A list of metric columns to plot. If None, uses a default list.
                      Defaults to None.
      :type metrics: Optional[List[str]], optional
      :param figsize: The figure size. Defaults to (15, 10).
      :type figsize: Tuple[int, int], optional
      :param stratify_by_outcome: If True, creates separate plots for each outcome.
                                  Defaults to False.
      :type stratify_by_outcome: bool, optional
      :param outcomes_to_plot: A list of specific outcomes to plot. If None, all are used.
                               Defaults to None.
      :type outcomes_to_plot: Optional[List[str]], optional

      :raises ValueError: If no specified metrics are found in the data.


   .. py:method:: plot_comparative_distributions(metric: str = 'auc', outcomes_to_compare: Optional[List[str]] = None, plot_type: str = 'overlay', figsize: Tuple[int, int] = (12, 6))

      Creates comparative distribution plots across different outcomes.

      :param metric: The metric to compare. Defaults to 'auc'.
      :type metric: str, optional
      :param outcomes_to_compare: A list of specific outcomes to compare. If None, all are used.
                                  Defaults to None.
      :type outcomes_to_compare: Optional[List[str]], optional
      :param plot_type: The type of plot to generate: 'overlay', 'subplot', or 'violin'.
                        Defaults to 'overlay'.
      :type plot_type: str, optional
      :param figsize: The figure size. Defaults to (12, 6).
      :type figsize: Tuple[int, int], optional

      :raises ValueError: If 'outcome_variable' or the specified metric is not found,
          or if an invalid `plot_type` is provided.


   .. py:method:: plot_distribution_summary_table(metrics: List[str] = None, outcomes_to_include: Optional[List[str]] = None) -> pandas.DataFrame

      Creates a summary table of distribution statistics by outcome.

      :param metrics: A list of metrics to include in the summary. If None, uses a default list.
                      Defaults to None.
      :type metrics: Optional[List[str]], optional
      :param outcomes_to_include: A list of specific outcomes to include. If None, all are used.
                                  Defaults to None.
      :type outcomes_to_include: Optional[List[str]], optional

      :returns: A DataFrame containing the summary statistics.
      :rtype: pd.DataFrame

      :raises ValueError: If 'outcome_variable' column is not found.


   .. py:method:: plot_distribution_heatmap(metrics: List[str] = None, stat: str = 'mean', figsize: Tuple[int, int] = (10, 6))

      Creates a heatmap of distribution statistics across outcomes and metrics.

      :param metrics: A list of metrics to include. If None, uses a default list.
                      Defaults to None.
      :type metrics: Optional[List[str]], optional
      :param stat: The statistic to display ('mean', 'std', 'median', 'min', 'max').
                   Defaults to 'mean'.
      :type stat: str, optional
      :param figsize: The figure size. Defaults to (10, 6).
      :type figsize: Tuple[int, int], optional

      :raises ValueError: If 'outcome_variable' column is not found.


.. py:function:: plot_metric_correlation_by_outcome(data: pandas.DataFrame, outcomes_to_plot: Optional[List[str]] = None, figsize: Tuple[int, int] = (15, 10))

   Plots correlation matrices of metrics, stratified by outcome variable.

   :param data: The results DataFrame.
   :type data: pd.DataFrame
   :param outcomes_to_plot: A list of specific outcomes to plot. If None, all are used.
                            Defaults to None.
   :type outcomes_to_plot: Optional[List[str]], optional
   :param figsize: The figure size. Defaults to (15, 10).
   :type figsize: Tuple[int, int], optional

   :raises ValueError: If 'outcome_variable' column is not found.
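
The per-outcome statistics that the summary table and heatmap are documented to report ('mean', 'std', 'median', 'min', 'max') can be illustrated with a minimal sketch. This is not the module's implementation: it uses only the Python standard library in place of pandas, the names ``records``, ``SUPPORTED_STATS``, and ``summarize_by_outcome`` are hypothetical, the sample values are invented for illustration, and 'std' is assumed here to mean the sample standard deviation.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical results, one dict per experiment run. The keys
# 'outcome_variable' and 'auc' come from the documentation above;
# the values are made up for illustration only.
records = [
    {"outcome_variable": "outcome_a", "auc": 0.70},
    {"outcome_variable": "outcome_a", "auc": 0.80},
    {"outcome_variable": "outcome_a", "auc": 0.90},
    {"outcome_variable": "outcome_b", "auc": 0.60},
    {"outcome_variable": "outcome_b", "auc": 0.65},
]


def _sample_std(values):
    """Sample standard deviation; NaN when fewer than two values exist."""
    return stdev(values) if len(values) > 1 else float("nan")


# Statistic names mirror the documented options of the ``stat`` parameter.
SUPPORTED_STATS = {
    "mean": mean,
    "std": _sample_std,
    "median": median,
    "min": min,
    "max": max,
}


def summarize_by_outcome(rows, metric="auc", outcomes_to_include=None):
    """Group rows by outcome and compute each supported statistic.

    Assumed behavior (not taken from the module's source): a missing
    'outcome_variable' key raises ValueError, and ``None`` for
    ``outcomes_to_include`` keeps every outcome.
    """
    if any("outcome_variable" not in row for row in rows):
        raise ValueError("'outcome_variable' column not found")

    grouped = defaultdict(list)
    for row in rows:
        outcome = row["outcome_variable"]
        if outcomes_to_include is None or outcome in outcomes_to_include:
            grouped[outcome].append(row[metric])

    return {
        outcome: {name: fn(values) for name, fn in SUPPORTED_STATS.items()}
        for outcome, values in grouped.items()
    }


summary = summarize_by_outcome(records)
for outcome, stats in summary.items():
    print(outcome, {name: round(value, 3) for name, value in stats.items()})
```

With a real results DataFrame, rows in this shape could be obtained with ``DataFrame.to_dict("records")``; the documented methods accept the DataFrame directly.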