plot_global_importance

Global importance plotting module for ML results analysis. This module trains a meta-model on the experimental parameters to determine which settings have the greatest impact on the target metric.

Classes

GlobalImportancePlotter

Initialize the plotter.

Module Contents

class plot_global_importance.GlobalImportancePlotter(data: pandas.DataFrame)

Initialize the plotter.

Parameters: data – Results DataFrame. Must contain columns for the experimental parameters and the performance metrics.
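As a rough illustration of the expected input, the sketch below builds a tiny results DataFrame and checks which expected columns it contains. The helper `split_available_columns` and all values in the frame are hypothetical and not part of this module; only the column names are taken from the attributes documented on this page.

```python
import pandas as pd


def split_available_columns(results, expected):
    """Return (present, missing) lists for the expected column names."""
    # Hypothetical helper for illustration; not part of GlobalImportancePlotter.
    present = [c for c in expected if c in results.columns]
    missing = [c for c in expected if c not in results.columns]
    return present, missing


# Made-up rows: one per experiment, with parameter columns and a metric column.
results = pd.DataFrame(
    {
        "method_name": ["model_a", "model_b"],
        "resample": ["None", "oversample"],
        "scale": [True, False],
        "n_fits": [10, 25],
        "auc": [0.71, 0.68],
    }
)

expected = ["method_name", "resample", "scale", "param_space_size", "n_fits", "auc"]
present, missing = split_available_columns(results, expected)
```

A check like this can help confirm that the metric and parameter columns are in place before the frame is passed to the plotter.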

data

clean_data

logger

feature_categories = ['age', 'sex', 'bmi', 'ethnicity', 'bloods', 'diagnostic_order', 'drug_order', 'annotation_n',...

pipeline_categorical_params = ['resample', 'scale', 'param_space_size', 'percent_missing']

pipeline_continuous_params = ['nb_size', 'X_train_size', 'X_test_orig_size', 'X_test_size', 'n_fits', 't_fits', 'run_time']

algorithm_col = 'method_name'

plot_global_importance(metric: str = 'auc', top_n: int = 30, figsize: Tuple[int, int] = (12, 10)) → None

Trains a model to predict a metric from experimental parameters and plots importances.

This method trains a RandomForestRegressor on the various pipeline and algorithm parameters to predict a given performance metric, then plots the resulting feature importances.

Parameters:

metric (str, optional) – The target metric to predict. Defaults to 'auc'.
top_n (int, optional) – The number of top important features to plot. Defaults to 30.
figsize (Tuple[int, int], optional) – The figure size for the plot. Defaults to (12, 10).
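The meta-model idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the function `rank_parameter_importance`, the one-hot encoding step, the forest settings, and the synthetic data are all assumptions made for the example. Only `RandomForestRegressor`, the `metric` and `top_n` arguments, and the column names come from this page.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_parameter_importance(results, metric="auc", categorical=(),
                              continuous=(), top_n=30, random_state=0):
    """Fit a random-forest meta-model and return the top_n feature importances."""
    # Keep only rows where the target metric is available.
    rows = results.dropna(subset=[metric])
    # One-hot encode categorical settings (assumed encoding, for illustration).
    cat_part = pd.get_dummies(rows[list(categorical)].astype(str), dtype=float)
    # Coerce continuous settings to numbers; unparseable or missing values become 0.
    num_part = rows[list(continuous)].apply(pd.to_numeric, errors="coerce").fillna(0)
    X = pd.concat([cat_part, num_part], axis=1)
    y = rows[metric]
    model = RandomForestRegressor(n_estimators=200, random_state=random_state)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False).head(top_n)


# Synthetic results in which only 'scale' actually drives the metric.
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame(
    {
        "resample": rng.choice(["None", "oversample"], n),
        "scale": rng.choice(["True", "False"], n),
        "n_fits": rng.integers(1, 50, n),
    }
)
demo["auc"] = 0.6 + 0.2 * (demo["scale"] == "True") + rng.normal(0, 0.01, n)

top = rank_parameter_importance(
    demo, metric="auc", categorical=["resample", "scale"],
    continuous=["n_fits"], top_n=3,
)
```

The returned Series is sorted from most to least important, so it can be drawn directly as a horizontal bar chart. In this synthetic case the one-hot columns derived from `scale` should dominate, since the metric was constructed to depend on that setting alone.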