plot_features
=============

.. py:module:: plot_features

.. autoapi-nested-parse::

   Feature analysis and importance plotting module for ML results analysis.
   Focuses on feature usage and impact on performance, with outcome stratification.


Attributes
----------

.. autoapisummary::

   plot_features.MAX_OUTCOMES_FOR_STRATIFIED_PLOT
   plot_features.MAX_FEATURES_FOR_ANALYSIS
   plot_features.MAX_FEATURES_FOR_UPSET
   plot_features.UpSet


Classes
-------

.. autoapisummary::

   plot_features.FeatureAnalysisPlotter


Module Contents
---------------

.. py:data:: MAX_OUTCOMES_FOR_STRATIFIED_PLOT
   :value: 10


.. py:data:: MAX_FEATURES_FOR_ANALYSIS
   :value: 500


.. py:data:: MAX_FEATURES_FOR_UPSET
   :value: 40


.. py:data:: UpSet
   :value: None


.. py:class:: FeatureAnalysisPlotter(data: pandas.DataFrame)

   Initializes the feature analysis plotter.

   :param data: Results DataFrame, which must contain a 'decoded_features' column.
   :type data: pd.DataFrame

   :raises ValueError: If the 'decoded_features' column is not found in the data.


   .. py:attribute:: data


   .. py:attribute:: clean_data


   .. py:attribute:: feature_df


   .. py:method:: plot_feature_usage_frequency(top_n: int = 20, stratify_by_outcome: bool = False, outcomes_to_plot: Optional[List[str]] = None, figsize: Optional[Tuple[int, int]] = None)

      Plots the frequency of each feature's usage in successful runs.

      :param top_n: The number of most frequent features to plot. Defaults to 20.
      :type top_n: int, optional
      :param stratify_by_outcome: If True, creates separate plots for each outcome. Defaults to False.
      :type stratify_by_outcome: bool, optional
      :param outcomes_to_plot: A list of specific outcomes to plot (if stratified). Defaults to None.
      :type outcomes_to_plot: Optional[List[str]], optional
      :param figsize: The figure size for the plot. If None, a default is calculated. Defaults to None.
      :type figsize: Optional[Tuple[int, int]], optional


   .. py:method:: plot_feature_performance_impact(metric: str = 'auc', outcomes: Optional[List[str]] = None, top_n: int = 20, min_usage: int = 5, top_n_features_to_consider: int = MAX_FEATURES_FOR_ANALYSIS, figsize_per_outcome: Tuple[int, int] = (10, 8))

      Plots the impact of features on a performance metric for each outcome.

      Impact is calculated as:
      (Mean metric of runs WITH the feature) - (Mean metric of runs WITHOUT the feature)

      :param metric: The performance metric to evaluate. Defaults to 'auc'.
      :type metric: str, optional
      :param outcomes: A list of outcome variables to plot. If None, all are plotted. Defaults to None.
      :type outcomes: Optional[List[str]], optional
      :param top_n: The number of top positive and negative impacting features to show. Defaults to 20.
      :type top_n: int, optional
      :param min_usage: The minimum number of times a feature must be used to be included. Defaults to 5.
      :type min_usage: int, optional
      :param top_n_features_to_consider: The max number of most frequent features to analyze for impact. Defaults to 500.
      :type top_n_features_to_consider: int, optional
      :param figsize_per_outcome: The figure size for each individual outcome plot. Defaults to (10, 8).
      :type figsize_per_outcome: Tuple[int, int], optional


   .. py:method:: plot_feature_metric_correlation(metric: str = 'auc', outcomes: Optional[List[str]] = None, top_n: int = 20, min_usage: int = 5, top_n_features_to_consider: int = MAX_FEATURES_FOR_ANALYSIS, figsize_per_outcome: Tuple[int, int] = (10, 8))

      Plots the correlation between feature presence and a performance metric.

      This shows which features, when present, are most correlated with higher
      or lower performance for a given outcome.

      :param metric: The performance metric to correlate with. Defaults to 'auc'.
      :type metric: str, optional
      :param outcomes: A list of outcome variables to plot. If None, all are plotted. Defaults to None.
      :type outcomes: Optional[List[str]], optional
      :param top_n: The number of top positive and negative correlated features to show. Defaults to 20.
      :type top_n: int, optional
      :param min_usage: The minimum number of times a feature must be used to be included. Defaults to 5.
      :type min_usage: int, optional
      :param top_n_features_to_consider: The max number of most frequent features to analyze for correlation. Defaults to 500.
      :type top_n_features_to_consider: int, optional
      :param figsize_per_outcome: The figure size for each individual outcome plot. Defaults to (10, 8).
      :type figsize_per_outcome: Tuple[int, int], optional


   .. py:method:: plot_feature_set_intersections(top_n_sets: int = 10, min_subset_size: int = 5, stratify_by_outcome: bool = False, max_features_for_upset: int = MAX_FEATURES_FOR_UPSET, figsize: Tuple[int, int] = (12, 7))

      Plots the intersections of feature sets using an UpSet plot.

      This helps visualize which combinations of features are most frequently
      used together.

      :param top_n_sets: The number of most frequent feature set intersections to plot. Defaults to 10.
      :type top_n_sets: int, optional
      :param min_subset_size: The minimum number of models a feature set must appear in to be plotted. Defaults to 5.
      :type min_subset_size: int, optional
      :param stratify_by_outcome: If True, creates a separate plot for each outcome variable. Defaults to False.
      :type stratify_by_outcome: bool, optional
      :param max_features_for_upset: The max number of most frequent features to include in the UpSet plot matrix. Defaults to 40.
      :type max_features_for_upset: int, optional
      :param figsize: The figure size for the plot. Defaults to (12, 7).
      :type figsize: Tuple[int, int], optional
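
The impact definition given for ``plot_feature_performance_impact`` (mean metric of runs with a feature minus mean metric of runs without it, subject to ``min_usage``) can be illustrated with a minimal standard-library sketch. This is not the module's implementation: the ``runs`` records, the ``feature_impact`` helper, and the rule of skipping features that appear in every run are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical toy results: each run lists the features it used and its AUC.
runs = [
    {"features": {"age", "bmi"}, "auc": 0.80},
    {"features": {"age"}, "auc": 0.70},
    {"features": {"bmi", "hr"}, "auc": 0.90},
    {"features": {"hr"}, "auc": 0.60},
]


def feature_impact(runs, metric="auc", min_usage=1):
    """Return {feature: mean(metric with feature) - mean(metric without)}.

    Illustrative helper, not part of plot_features.
    """
    all_features = set().union(*(r["features"] for r in runs))
    impact = {}
    for feat in sorted(all_features):
        with_feat = [r[metric] for r in runs if feat in r["features"]]
        without_feat = [r[metric] for r in runs if feat not in r["features"]]
        # Skip rarely used features, and (an assumption of this sketch)
        # features used in every run, which have no comparison group.
        if len(with_feat) < min_usage or not without_feat:
            continue
        impact[feat] = mean(with_feat) - mean(without_feat)
    return impact


impact = feature_impact(runs, min_usage=2)
for feat, value in sorted(impact.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feat}: {value:+.3f}")
```

In this toy data only ``bmi`` shows a positive impact; ``age`` and ``hr`` have equal means with and without the feature. Raising ``min_usage`` above a feature's usage count drops it from the result, which mirrors the documented role of that parameter.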