plot_features
Feature analysis and importance plotting module for ML results analysis. Focuses on feature usage and impact on performance, with outcome stratification.
Attributes
Classes
FeatureAnalysisPlotter – Initializes the feature analysis plotter.
Module Contents
- class plot_features.FeatureAnalysisPlotter(data: pandas.DataFrame)[source]
Initializes the feature analysis plotter.
- Parameters:
data (pd.DataFrame) – Results DataFrame, which must contain a ‘decoded_features’ column.
- Raises:
ValueError – If the ‘decoded_features’ column is not found in the data.
- plot_feature_usage_frequency(top_n: int = 20, stratify_by_outcome: bool = False, outcomes_to_plot: List[str] | None = None, figsize: Tuple[int, int] | None = None)[source]
Plots the frequency of each feature’s usage in successful runs.
- Parameters:
top_n (int, optional) – The number of most frequent features to plot. Defaults to 20.
stratify_by_outcome (bool, optional) – If True, creates separate plots for each outcome. Defaults to False.
outcomes_to_plot (Optional[List[str]], optional) – A list of specific outcomes to plot (if stratified). Defaults to None.
figsize (Optional[Tuple[int, int]], optional) – The figure size for the plot. If None, a default is calculated. Defaults to None.
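The counting behind a usage-frequency plot can be sketched in plain Python. This is only an illustration, not the module's implementation: it assumes each row's 'decoded_features' value is a list of feature names, and `feature_usage_counts` is a hypothetical helper. It does not filter for successful runs, since the criterion for success is not documented here.

```python
from collections import Counter


def feature_usage_counts(feature_lists, top_n=20):
    """Hypothetical helper: count the runs that used each feature."""
    counts = Counter()
    for features in feature_lists:
        # set() so a feature listed twice in one run is counted once
        counts.update(set(features))
    # Sort by descending count, then by name, so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Made-up decoded feature lists, one per run
runs = [["age", "bmi", "sex"], ["age", "bmi"], ["age", "hba1c"]]
top = feature_usage_counts(runs, top_n=2)
print(top)
```

With `stratify_by_outcome=True`, the same count would presumably be repeated once per outcome on the matching subset of rows.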
- plot_feature_performance_impact(metric: str = 'auc', outcomes: List[str] | None = None, top_n: int = 20, min_usage: int = 5, top_n_features_to_consider: int = MAX_FEATURES_FOR_ANALYSIS, figsize_per_outcome: Tuple[int, int] = (10, 8))[source]
Plots the impact of features on a performance metric for each outcome.
Impact is calculated as: (Mean metric of runs WITH the feature) - (Mean metric of runs WITHOUT the feature)
- Parameters:
metric (str, optional) – The performance metric to evaluate. Defaults to ‘auc’.
outcomes (Optional[List[str]], optional) – A list of outcome variables to plot. If None, all are plotted. Defaults to None.
top_n (int, optional) – The number of top positive and negative impacting features to show. Defaults to 20.
min_usage (int, optional) – The minimum number of times a feature must be used to be included. Defaults to 5.
top_n_features_to_consider (int, optional) – The maximum number of most frequent features to analyze for impact. Defaults to MAX_FEATURES_FOR_ANALYSIS (500).
figsize_per_outcome (Tuple[int, int], optional) – The figure size for each individual outcome plot. Defaults to (10, 8).
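The impact formula above can be worked through on a small example. The sketch below is illustrative only: it assumes each run is a dict with a 'decoded_features' list and a metric value, and `feature_impact` is a hypothetical helper, not part of this module. How the module handles a feature that appears in every run is not documented; the sketch returns None in that case.

```python
from statistics import mean


def feature_impact(runs, feature, metric="auc", min_usage=5):
    """Hypothetical helper: mean metric WITH the feature minus mean WITHOUT."""
    with_f = [r[metric] for r in runs if feature in r["decoded_features"]]
    without_f = [r[metric] for r in runs if feature not in r["decoded_features"]]
    # Skip features used too rarely, or used in every run (nothing to compare)
    if len(with_f) < min_usage or not without_f:
        return None
    return mean(with_f) - mean(without_f)


# Made-up results: four runs with their feature lists and AUC values
runs = [
    {"decoded_features": ["age", "bmi"], "auc": 0.80},
    {"decoded_features": ["age"], "auc": 0.70},
    {"decoded_features": ["bmi"], "auc": 0.90},
    {"decoded_features": ["sex"], "auc": 0.60},
]
impact_bmi = feature_impact(runs, "bmi", min_usage=2)
print(round(impact_bmi, 3))
```

Here runs with 'bmi' average 0.85 and runs without it average 0.65, so the impact is about +0.20. A positive value means the feature co-occurs with better scores; it does not by itself show that the feature causes them.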
- plot_feature_metric_correlation(metric: str = 'auc', outcomes: List[str] | None = None, top_n: int = 20, min_usage: int = 5, top_n_features_to_consider: int = MAX_FEATURES_FOR_ANALYSIS, figsize_per_outcome: Tuple[int, int] = (10, 8))[source]
Plots the correlation between feature presence and a performance metric.
This shows which features, when present, are most correlated with higher or lower performance for a given outcome.
- Parameters:
metric (str, optional) – The performance metric to correlate with. Defaults to ‘auc’.
outcomes (Optional[List[str]], optional) – A list of outcome variables to plot. If None, all are plotted. Defaults to None.
top_n (int, optional) – The number of top positive and negative correlated features to show. Defaults to 20.
min_usage (int, optional) – The minimum number of times a feature must be used to be included. Defaults to 5.
top_n_features_to_consider (int, optional) – The maximum number of most frequent features to analyze for correlation. Defaults to MAX_FEATURES_FOR_ANALYSIS (500).
figsize_per_outcome (Tuple[int, int], optional) – The figure size for each individual outcome plot. Defaults to (10, 8).
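The correlation type is not stated here. As one plausible reading, the sketch below computes the Pearson correlation between a 0/1 presence indicator and the metric (the point-biserial correlation). The choice of Pearson, the dict-based run format, and the helper name `presence_metric_correlation` are all assumptions made for illustration.

```python
from math import sqrt


def presence_metric_correlation(runs, feature, metric="auc"):
    """Hypothetical helper: Pearson r between feature presence (0/1) and metric."""
    n = len(runs)
    if n == 0:
        return None
    x = [1.0 if feature in r["decoded_features"] else 0.0 for r in runs]
    y = [r[metric] for r in runs]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined when the feature is always/never present or the metric is constant
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Made-up results: four runs with their feature lists and AUC values
runs = [
    {"decoded_features": ["age", "bmi"], "auc": 0.80},
    {"decoded_features": ["age"], "auc": 0.70},
    {"decoded_features": ["bmi"], "auc": 0.90},
    {"decoded_features": ["sex"], "auc": 0.60},
]
r_bmi = presence_metric_correlation(runs, "bmi")
print(round(r_bmi, 3))
```

Unlike the raw difference in means, a correlation is scaled by the spread of the metric, so features can be compared across outcomes whose scores vary by different amounts.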
- plot_feature_set_intersections(top_n_sets: int = 10, min_subset_size: int = 5, stratify_by_outcome: bool = False, max_features_for_upset: int = MAX_FEATURES_FOR_UPSET, figsize: Tuple[int, int] = (12, 7))[source]
Plots the intersections of feature sets using an UpSet plot.
This helps visualize which combinations of features are most frequently used together.
- Parameters:
top_n_sets (int, optional) – The number of most frequent feature set intersections to plot. Defaults to 10.
min_subset_size (int, optional) – The minimum number of models a feature set must appear in to be plotted. Defaults to 5.
stratify_by_outcome (bool, optional) – If True, creates a separate plot for each outcome variable. Defaults to False.
max_features_for_upset (int, optional) – The maximum number of most frequent features to include in the UpSet plot matrix. Defaults to MAX_FEATURES_FOR_UPSET (40).
figsize (Tuple[int, int], optional) – The figure size for the plot. Defaults to (12, 7).
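An UpSet plot draws one bar per exact combination of features, so the underlying data is a count of how many models used each distinct feature set. The sketch below shows that counting step in plain Python, with no plotting. It assumes 'decoded_features' holds a list of names per model, and `top_feature_sets` is a hypothetical helper. The module's own filtering may differ.

```python
from collections import Counter


def top_feature_sets(feature_lists, top_n_sets=10, min_subset_size=5):
    """Hypothetical helper: most frequent exact feature combinations."""
    # frozenset makes the combination hashable and ignores feature order
    counts = Counter(frozenset(features) for features in feature_lists)
    kept = [
        (sorted(feature_set), n)
        for feature_set, n in counts.items()
        if n >= min_subset_size  # drop combinations seen in too few models
    ]
    # Most frequent first; ties broken by the sorted feature names
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:top_n_sets]


# Made-up decoded feature lists, one per model
models = [["age", "bmi"], ["bmi", "age"], ["age"], ["age", "bmi", "sex"]]
result = top_feature_sets(models, top_n_sets=2, min_subset_size=1)
print(result)
```

Because combinations are matched exactly, ['age', 'bmi'] and ['age', 'bmi', 'sex'] are counted as different sets even though one contains the other.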