plot_features
=============

.. py:module:: plot_features

.. autoapi-nested-parse::

   Feature analysis and importance plotting for ML results.
   Covers how often features are used and how they relate to performance, with optional stratification by outcome.


Attributes
----------

.. autoapisummary::

   plot_features.MAX_OUTCOMES_FOR_STRATIFIED_PLOT
   plot_features.MAX_FEATURES_FOR_ANALYSIS
   plot_features.MAX_FEATURES_FOR_UPSET
   plot_features.UpSet


Classes
-------

.. autoapisummary::

   plot_features.FeatureAnalysisPlotter


Module Contents
---------------

.. py:data:: MAX_OUTCOMES_FOR_STRATIFIED_PLOT
   :value: 10


.. py:data:: MAX_FEATURES_FOR_ANALYSIS
   :value: 500


.. py:data:: MAX_FEATURES_FOR_UPSET
   :value: 40


.. py:data:: UpSet
   :value: None
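
   A documented value of ``None`` is what an optional-import guard would leave
   behind when the third-party ``upsetplot`` package is not installed. This page
   does not show the module's import logic, so the following is only a sketch of
   that common pattern, under that assumption:

   .. code-block:: python

      # Assumption: optional-import guard; the module's real import code is not shown here.
      try:
          from upsetplot import UpSet  # third-party package, may be absent
      except ImportError:
          UpSet = None  # callers can test for None before requesting an UpSet plot

      # True only when the optional dependency could be imported.
      upset_available = UpSet is not None

   Under this pattern, code that needs UpSet plots can check ``UpSet is None``
   first and skip or report the plot instead of failing at import time.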


.. py:class:: FeatureAnalysisPlotter(data: pandas.DataFrame)

   Initializes the feature analysis plotter.

   :param data: Results DataFrame, which must contain a
                'decoded_features' column.
   :type data: pandas.DataFrame

   :raises ValueError: If the 'decoded_features' column is not found in the data.


   .. py:attribute:: data


   .. py:attribute:: clean_data


   .. py:attribute:: feature_df


   .. py:method:: plot_feature_usage_frequency(top_n: int = 20, stratify_by_outcome: bool = False, outcomes_to_plot: Optional[List[str]] = None, figsize: Optional[Tuple[int, int]] = None)

      Plots the frequency of each feature's usage in successful runs.

      :param top_n: The number of most frequent features to plot.
                    Defaults to 20.
      :type top_n: int, optional
      :param stratify_by_outcome: If True, creates separate plots
                                  for each outcome. Defaults to False.
      :type stratify_by_outcome: bool, optional
      :param outcomes_to_plot: The specific outcomes to plot when
                               ``stratify_by_outcome`` is True. Defaults to None.
      :type outcomes_to_plot: Optional[List[str]], optional
      :param figsize: The figure size for
                      the plot. If None, a default is calculated. Defaults to None.
      :type figsize: Optional[Tuple[int, int]], optional
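
      The counting behind a usage-frequency plot can be sketched with the
      standard library alone. This is a minimal illustration, not the method's
      implementation: the ``runs`` data, the helper name ``feature_usage_counts``
      and the choice to count a feature once per run are all assumptions.

      .. code-block:: python

         from collections import Counter

         # Hypothetical data: each entry lists the feature names used by one run.
         runs = [
             ["age", "bmi", "glucose"],
             ["age", "glucose"],
             ["bmi"],
             ["age", "bmi", "heart_rate"],
         ]

         def feature_usage_counts(feature_lists, top_n=20):
             """Return the top_n (feature, count) pairs, counting once per run."""
             counts = Counter()
             for features in feature_lists:
                 counts.update(set(features))  # set() ignores duplicates within a run
             return counts.most_common(top_n)

         top_features = feature_usage_counts(runs, top_n=2)

      Here ``age`` and ``bmi`` each appear in three runs, so they are the two
      entries kept; the order between tied features is not guaranteed.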


   .. py:method:: plot_feature_performance_impact(metric: str = 'auc', outcomes: Optional[List[str]] = None, top_n: int = 20, min_usage: int = 5, top_n_features_to_consider: int = MAX_FEATURES_FOR_ANALYSIS, figsize_per_outcome: Tuple[int, int] = (10, 8))

      Plots the impact of features on a performance metric for each outcome.

      Impact is calculated as the mean metric of runs with the feature
      minus the mean metric of runs without it.

      :param metric: The performance metric to evaluate.
                     Defaults to 'auc'.
      :type metric: str, optional
      :param outcomes: A list of outcome variables
                       to plot. If None, all are plotted. Defaults to None.
      :type outcomes: Optional[List[str]], optional
      :param top_n: The number of features with the largest positive and
                    negative impact to show. Defaults to 20.
      :type top_n: int, optional
      :param min_usage: The minimum number of times a feature must
                        be used to be included. Defaults to 5.
      :type min_usage: int, optional
      :param top_n_features_to_consider: The maximum number of features, taken in
                                         order of usage frequency, to analyze for impact.
                                         Defaults to ``MAX_FEATURES_FOR_ANALYSIS`` (500).
      :type top_n_features_to_consider: int, optional
      :param figsize_per_outcome: The figure size for
                                  each individual outcome plot. Defaults to (10, 8).
      :type figsize_per_outcome: Tuple[int, int], optional
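
      The impact calculation described above (mean metric with the feature
      minus mean metric without it) can be sketched as follows. The ``runs``
      data and the helper ``feature_impact`` are hypothetical, and returning
      ``None`` for under-used features is an assumed way of applying
      ``min_usage``, not confirmed behavior.

      .. code-block:: python

         from statistics import mean

         # Hypothetical data: (set of features used, metric value such as AUC).
         runs = [
             ({"age", "bmi"}, 0.80),
             ({"age"}, 0.70),
             ({"bmi"}, 0.90),
             ({"glucose"}, 0.60),
         ]

         def feature_impact(runs, feature, min_usage=1):
             """Mean metric with the feature minus mean metric without it."""
             with_f = [m for feats, m in runs if feature in feats]
             without_f = [m for feats, m in runs if feature not in feats]
             # Skip features used too rarely, or with nothing to compare against.
             if not with_f or len(with_f) < min_usage or not without_f:
                 return None
             return mean(with_f) - mean(without_f)

         bmi_impact = feature_impact(runs, "bmi")

      For this toy data, runs with ``bmi`` average 0.85 and runs without it
      average 0.65, giving a positive impact of about 0.20.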


   .. py:method:: plot_feature_metric_correlation(metric: str = 'auc', outcomes: Optional[List[str]] = None, top_n: int = 20, min_usage: int = 5, top_n_features_to_consider: int = MAX_FEATURES_FOR_ANALYSIS, figsize_per_outcome: Tuple[int, int] = (10, 8))

      Plots the correlation between feature presence and a performance metric.

      This shows which features, when present, are most correlated with higher or lower
      performance for a given outcome.

      :param metric: The performance metric to correlate with.
                     Defaults to 'auc'.
      :type metric: str, optional
      :param outcomes: A list of outcome variables
                       to plot. If None, all are plotted. Defaults to None.
      :type outcomes: Optional[List[str]], optional
      :param top_n: The number of most positively and most negatively
                    correlated features to show. Defaults to 20.
      :type top_n: int, optional
      :param min_usage: The minimum number of times a feature must
                        be used to be included. Defaults to 5.
      :type min_usage: int, optional
      :param top_n_features_to_consider: The maximum number of features, taken in
                                         order of usage frequency, to analyze for correlation.
                                         Defaults to ``MAX_FEATURES_FOR_ANALYSIS`` (500).
      :type top_n_features_to_consider: int, optional
      :param figsize_per_outcome: The figure size for
                                  each individual outcome plot. Defaults to (10, 8).
      :type figsize_per_outcome: Tuple[int, int], optional
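
      One way to correlate feature presence with a metric is a Pearson
      correlation between a 0/1 presence indicator and the metric values. This
      page does not state which correlation measure the method uses, so the
      sketch below is an assumption; ``runs`` and ``presence_correlation`` are
      hypothetical names.

      .. code-block:: python

         from math import sqrt

         # Hypothetical data: (set of features used, metric value such as AUC).
         runs = [
             ({"age", "bmi"}, 0.80),
             ({"age"}, 0.70),
             ({"bmi"}, 0.90),
             ({"glucose"}, 0.60),
         ]

         def presence_correlation(runs, feature):
             """Pearson correlation between feature presence (0/1) and the metric."""
             n = len(runs)
             if n == 0:
                 return None
             x = [1.0 if feature in feats else 0.0 for feats, _ in runs]
             y = [metric for _, metric in runs]
             mean_x, mean_y = sum(x) / n, sum(y) / n
             cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             var_x = sum((a - mean_x) ** 2 for a in x)
             var_y = sum((b - mean_y) ** 2 for b in y)
             # Undefined if the feature is always or never used, or the metric is constant.
             if var_x == 0 or var_y == 0:
                 return None
             return cov / sqrt(var_x * var_y)

         bmi_corr = presence_correlation(runs, "bmi")

      A positive value means runs that include the feature tend to score
      higher on the metric; a negative value means they tend to score lower.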


   .. py:method:: plot_feature_set_intersections(top_n_sets: int = 10, min_subset_size: int = 5, stratify_by_outcome: bool = False, max_features_for_upset: int = MAX_FEATURES_FOR_UPSET, figsize: Tuple[int, int] = (12, 7))

      Plots the intersections of feature sets using an UpSet plot.

      This helps visualize which combinations of features are most frequently used together.

      :param top_n_sets: The number of most frequent feature set
                         intersections to plot. Defaults to 10.
      :type top_n_sets: int, optional
      :param min_subset_size: The minimum number of models a feature
                              set must appear in to be plotted. Defaults to 5.
      :type min_subset_size: int, optional
      :param stratify_by_outcome: If True, creates a separate plot
                                  for each outcome variable. Defaults to False.
      :type stratify_by_outcome: bool, optional
      :param max_features_for_upset: The maximum number of features, taken in order of
                                     usage frequency, to include in the UpSet plot matrix.
                                     Defaults to ``MAX_FEATURES_FOR_UPSET`` (40).
      :type max_features_for_upset: int, optional
      :param figsize: The figure size for the plot.
                      Defaults to (12, 7).
      :type figsize: Tuple[int, int], optional
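
      The tallying that an UpSet plot visualizes can be sketched without any
      plotting library. The sketch treats each intersection as an exact
      combination of features used by a run, which is an assumption about how
      the sets are formed; ``runs`` and ``frequent_feature_sets`` are
      hypothetical names.

      .. code-block:: python

         from collections import Counter

         # Hypothetical data: each entry lists the feature names used by one model.
         runs = [
             ["age", "bmi"],
             ["bmi", "age"],
             ["age", "bmi", "glucose"],
             ["glucose"],
             ["age", "bmi"],
         ]

         def frequent_feature_sets(feature_lists, top_n_sets=10, min_subset_size=1):
             """Count identical feature combinations across runs, ignoring order."""
             counts = Counter(frozenset(features) for features in feature_lists)
             # Keep combinations seen in at least min_subset_size runs, most common first.
             kept = [(s, c) for s, c in counts.most_common() if c >= min_subset_size]
             return kept[:top_n_sets]

         top_sets = frequent_feature_sets(runs, top_n_sets=2, min_subset_size=2)

      In this toy data only the combination of ``age`` and ``bmi`` appears in
      at least two runs, so it is the single set retained.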