plot_features

Feature analysis and importance plotting module for analysing ML results. It focuses on how often features are used and how they affect performance, with optional stratification by outcome.

Attributes

`MAX_OUTCOMES_FOR_STRATIFIED_PLOT`
`MAX_FEATURES_FOR_ANALYSIS`
`MAX_FEATURES_FOR_UPSET`
`UpSet`

Classes

FeatureAnalysisPlotter

Initializes the feature analysis plotter.

Module Contents

plot_features.MAX_OUTCOMES_FOR_STRATIFIED_PLOT = 10

plot_features.MAX_FEATURES_FOR_ANALYSIS = 500

plot_features.MAX_FEATURES_FOR_UPSET = 40

plot_features.UpSet = None
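The value `UpSet = None` suggests that `UpSet` is an optional import that falls back to None when its package is unavailable, as it evidently was when this page was generated. The following is a minimal sketch of that pattern. It assumes the class comes from the third-party upsetplot package, and `upset_available` is a hypothetical helper that is not part of this module.

```python
# Sketch of an optional-dependency guard (assumption: UpSet comes from upsetplot).
try:
    from upsetplot import UpSet  # third-party, may not be installed
except ImportError:
    UpSet = None  # UpSet plots cannot be drawn without upsetplot


def upset_available() -> bool:
    """Hypothetical helper: True if the optional UpSet class was imported."""
    return UpSet is not None


print("UpSet plots available:", upset_available())
```

With a guard like this, the module still imports cleanly without upsetplot. In this sketch, only code that actually draws an UpSet plot needs the extra dependency.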

class plot_features.FeatureAnalysisPlotter(data: pandas.DataFrame)

Initializes the feature analysis plotter.

Parameters: data (pd.DataFrame) – Results DataFrame, which must contain a ‘decoded_features’ column.
Raises: ValueError – If the ‘decoded_features’ column is not found in the data.
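The constructor's documented failure mode can be illustrated with a small stand-alone check. This is a sketch only. `validate_results` is a hypothetical helper, and the list-of-feature-names layout used for ‘decoded_features’ is an assumption, since the source does not specify the column's contents.

```python
import pandas as pd


def validate_results(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented constructor check."""
    if "decoded_features" not in data.columns:
        raise ValueError("'decoded_features' column not found in the data.")
    return data


# A frame with the required column passes through unchanged.
good = validate_results(pd.DataFrame({"decoded_features": [["age", "bmi"], ["age"]]}))

# A frame without it raises ValueError, as documented above.
try:
    validate_results(pd.DataFrame({"auc": [0.81, 0.77]}))
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Failing at construction time means a results file without decoded features is rejected before any plotting method runs.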

data

clean_data

feature_df

plot_feature_usage_frequency(top_n: int = 20, stratify_by_outcome: bool = False, outcomes_to_plot: List[str] | None = None, figsize: Tuple[int, int] | None = None)

Plots the frequency of each feature’s usage in successful runs.

Parameters:

top_n (int, optional) – The number of most frequent features to plot. Defaults to 20.
stratify_by_outcome (bool, optional) – If True, creates separate plots for each outcome. Defaults to False.
outcomes_to_plot (Optional[List[str]], optional) – Specific outcomes to plot when stratify_by_outcome is True. Defaults to None.
figsize (Optional[Tuple[int, int]], optional) – The figure size for the plot. If None, a default is calculated. Defaults to None.
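The counting behind a usage-frequency plot can be sketched as follows. The sketch assumes that each ‘decoded_features’ cell holds a list of feature-name strings. `outcome` is a hypothetical column name used only for stratification, and `feature_usage_counts` is a hypothetical helper rather than this class's API. Filtering down to successful runs is assumed to have happened beforehand.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

import pandas as pd


def feature_usage_counts(
    df: pd.DataFrame,
    top_n: int = 20,
    outcome_col: Optional[str] = None,
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the top_n most frequently used features, overall or per outcome."""
    # Without stratification, treat all runs as a single group.
    if outcome_col is None:
        groups = {"all": df}
    else:
        groups = {str(key): group for key, group in df.groupby(outcome_col)}
    result = {}
    for name, group in groups.items():
        # Flatten every run's feature list and count occurrences.
        counts = Counter(f for feats in group["decoded_features"] for f in feats)
        result[name] = counts.most_common(top_n)
    return result


runs = pd.DataFrame({
    "outcome": ["death", "death", "readmission"],
    "decoded_features": [["age", "bmi"], ["age"], ["age", "sbp", "bmi"]],
})
overall = feature_usage_counts(runs, top_n=2)
per_outcome = feature_usage_counts(runs, top_n=2, outcome_col="outcome")
print(overall)  # {'all': [('age', 3), ('bmi', 2)]}
print(per_outcome["death"])
```

Each stratified group yields its own ranked list, which corresponds to one plot per outcome.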

plot_feature_performance_impact(metric: str = 'auc', outcomes: List[str] | None = None, top_n: int = 20, min_usage: int = 5, top_n_features_to_consider: int = MAX_FEATURES_FOR_ANALYSIS, figsize_per_outcome: Tuple[int, int] = (10, 8))

Plots the impact of features on a performance metric for each outcome.

Impact is calculated as (mean metric of runs WITH the feature) - (mean metric of runs WITHOUT the feature). A positive impact therefore means that runs using the feature had a higher mean metric.

Parameters:

metric (str, optional) – The performance metric to evaluate. Defaults to ‘auc’.
outcomes (Optional[List[str]], optional) – A list of outcome variables to plot. If None, all are plotted. Defaults to None.
top_n (int, optional) – The number of most positively and most negatively impacting features to show. Defaults to 20.
min_usage (int, optional) – The minimum number of times a feature must be used to be included. Defaults to 5.
top_n_features_to_consider (int, optional) – The maximum number of most frequently used features to analyze for impact. Defaults to 500 (MAX_FEATURES_FOR_ANALYSIS).
figsize_per_outcome (Tuple[int, int], optional) – The figure size for each individual outcome plot. Defaults to (10, 8).
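The impact formula above can be sketched for a single outcome's runs as shown below. The sketch assumes the following: ‘decoded_features’ holds lists of feature names, the metric is a numeric column named like the `metric` argument (here `auc`), and features used by every run are skipped because they have no "without" group. `feature_impact` and `presence_matrix` are hypothetical helpers, not the class's internals.

```python
import pandas as pd


def presence_matrix(features: pd.Series) -> pd.DataFrame:
    """Boolean matrix: one row per run, one column per feature name."""
    exploded = features.explode()  # one row per (run, feature); the run's index repeats
    dummies = pd.get_dummies(exploded)  # empty lists become all-False rows
    return dummies.groupby(level=0).max().astype(bool)


def feature_impact(
    df: pd.DataFrame,
    metric: str = "auc",
    min_usage: int = 5,
    top_n: int = 20,
    top_n_features_to_consider: int = 500,
) -> pd.Series:
    """Impact = mean(metric | feature used) - mean(metric | feature not used)."""
    df = df.reset_index(drop=True)  # unique index so matrix rows line up with runs
    presence = presence_matrix(df["decoded_features"])
    # Analyse only the most frequently used features.
    presence = presence[presence.sum().nlargest(top_n_features_to_consider).index]
    values = df[metric].to_numpy(dtype=float)
    impacts = {}
    for feature in presence.columns:
        used = presence[feature].to_numpy()
        n_used = int(used.sum())
        # Skip rare features and features used by every run (no "without" group).
        if n_used < min_usage or n_used == len(used):
            continue
        impacts[feature] = values[used].mean() - values[~used].mean()
    impact = pd.Series(impacts, dtype=float).sort_values(ascending=False)
    if len(impact) <= 2 * top_n:
        return impact
    # Keep the top_n most positive and the top_n most negative impacts.
    return pd.concat([impact.head(top_n), impact.tail(top_n)])


runs = pd.DataFrame({
    "decoded_features": [["age", "bmi"], ["age"], ["bmi"], ["sbp"]],
    "auc": [0.90, 0.80, 0.60, 0.50],
})
print(feature_impact(runs, min_usage=2))  # age ≈ 0.30, bmi ≈ 0.10; sbp is too rare
```

To mirror the per-outcome behaviour described above, the helper would be called once on each outcome's subset of rows.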

plot_feature_metric_correlation(metric: str = 'auc', outcomes: List[str] | None = None, top_n: int = 20, min_usage: int = 5, top_n_features_to_consider: int = MAX_FEATURES_FOR_ANALYSIS, figsize_per_outcome: Tuple[int, int] = (10, 8))

Plots the correlation between feature presence and a performance metric.

This shows which features, when present, are most correlated with higher or lower performance for a given outcome.

Parameters:

metric (str, optional) – The performance metric to correlate with. Defaults to ‘auc’.
outcomes (Optional[List[str]], optional) – A list of outcome variables to plot. If None, all are plotted. Defaults to None.
top_n (int, optional) – The number of most positively and most negatively correlated features to show. Defaults to 20.
min_usage (int, optional) – The minimum number of times a feature must be used to be included. Defaults to 5.
top_n_features_to_consider (int, optional) – The maximum number of most frequently used features to analyze for correlation. Defaults to 500 (MAX_FEATURES_FOR_ANALYSIS).
figsize_per_outcome (Tuple[int, int], optional) – The figure size for each individual outcome plot. Defaults to (10, 8).
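The source does not say which correlation coefficient is used. The sketch below assumes Pearson correlation between a 0/1 presence indicator and the metric, which is equivalent to the point-biserial correlation. It also assumes list-valued ‘decoded_features’ and a numeric `auc` column. `feature_metric_correlation` is a hypothetical helper.

```python
import pandas as pd


def feature_metric_correlation(
    df: pd.DataFrame,
    metric: str = "auc",
    min_usage: int = 5,
    top_n: int = 20,
    top_n_features_to_consider: int = 500,
) -> pd.Series:
    """Pearson correlation between each feature's 0/1 presence and the metric."""
    df = df.reset_index(drop=True)  # unique index so matrix rows line up with runs
    # Boolean presence matrix: one row per run, one column per feature.
    presence = (
        pd.get_dummies(df["decoded_features"].explode())
        .groupby(level=0).max().astype(bool)
    )
    presence = presence[presence.sum().nlargest(top_n_features_to_consider).index]
    usage = presence.sum()
    # Drop rare features and always-present ones (a constant column has no correlation).
    presence = presence.loc[:, (usage >= min_usage) & (usage < len(presence))]
    # Pearson r with a binary variable is the point-biserial correlation.
    corr = presence.astype(float).corrwith(df[metric].astype(float))
    corr = corr.sort_values(ascending=False)
    if len(corr) <= 2 * top_n:
        return corr
    # Keep the top_n most positive and the top_n most negative correlations.
    return pd.concat([corr.head(top_n), corr.tail(top_n)])


runs = pd.DataFrame({
    "decoded_features": [["age", "bmi"], ["age"], ["bmi"], ["sbp"]],
    "auc": [0.90, 0.80, 0.60, 0.50],
})
print(feature_metric_correlation(runs, min_usage=2))
```

Impact and correlation are related but distinct measures. Impact is a difference of means in metric units, whereas correlation is scaled to [-1, 1] and also reflects how evenly a feature is split across runs.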

plot_feature_set_intersections(top_n_sets: int = 10, min_subset_size: int = 5, stratify_by_outcome: bool = False, max_features_for_upset: int = MAX_FEATURES_FOR_UPSET, figsize: Tuple[int, int] = (12, 7))

Plots the intersections of feature sets using an UpSet plot.

This helps visualize which combinations of features are most frequently used together.

Parameters:

top_n_sets (int, optional) – The number of most frequent feature set intersections to plot. Defaults to 10.
min_subset_size (int, optional) – The minimum number of models a feature set must appear in to be plotted. Defaults to 5.
stratify_by_outcome (bool, optional) – If True, creates a separate plot for each outcome variable. Defaults to False.
max_features_for_upset (int, optional) – The maximum number of most frequently used features to include in the UpSet plot matrix. Defaults to 40 (MAX_FEATURES_FOR_UPSET).
figsize (Tuple[int, int], optional) – The figure size for the plot. Defaults to (12, 7).
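The counting behind an intersection plot can be sketched without any plotting library. The sketch assumes each run's ‘decoded_features’ is a list of feature names, and it treats each run's exact (order-insensitive) combination of retained features as one intersection. `top_feature_sets` is a hypothetical helper, not the method's implementation.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

import pandas as pd


def top_feature_sets(
    df: pd.DataFrame,
    top_n_sets: int = 10,
    min_subset_size: int = 5,
    max_features_for_upset: int = 40,
) -> List[Tuple[FrozenSet[str], int]]:
    """Count exact feature combinations per run and return the most frequent ones."""
    # Restrict to the most frequently used features overall.
    usage = Counter(f for feats in df["decoded_features"] for f in feats)
    keep = {f for f, _ in usage.most_common(max_features_for_upset)}
    # Each run contributes one order-insensitive set of its retained features.
    combos = Counter(
        frozenset(f for f in feats if f in keep) for feats in df["decoded_features"]
    )
    # Drop rare combinations, then keep the top_n_sets most common.
    frequent = [(s, n) for s, n in combos.most_common() if n >= min_subset_size]
    return frequent[:top_n_sets]


runs = pd.DataFrame({"decoded_features": [
    ["age", "bmi"], ["bmi", "age"], ["age"], ["age", "bmi"], ["sbp"],
]})
# Only the {age, bmi} combination (3 runs) meets min_subset_size=2.
print(top_feature_sets(runs, min_subset_size=2))
```

These counts correspond to the intersection-size bars of an UpSet plot. If the optional upsetplot package is installed, its `from_memberships` helper builds UpSet input from lists of category names; consult its documentation for the exact signature before relying on it.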