ml_grid.pipeline.data_feature_methods

Classes

feature_methods

Feature selection helpers that return the top n features, either ranked by ANOVA F-value or taken from a Markov Blanket.

Module Contents

class ml_grid.pipeline.data_feature_methods.feature_methods

Feature selection helpers that return the top n features, either ranked by ANOVA F-value or taken from a Markov Blanket.

getNfeaturesANOVAF(n: int, X_train: pandas.DataFrame | numpy.ndarray, y_train: pandas.Series) → List[str]

Gets the top n features based on the ANOVA F-value.

This method is intended for classification problems. It computes the ANOVA F-value of each feature in X_train against y_train, ranks the features by F-value in descending order, and returns the n highest-ranked features.

Parameters:

n (int) – The number of top features to return.
X_train (Union[pd.DataFrame, np.ndarray]) – Training data.
y_train (pd.Series) – Target variable.

Raises:

ValueError – If X_train is not a pandas DataFrame or numpy array, or if no features can be returned (e.g., all have NaN F-values).

Returns:

A list of column names for the top n features.

Return type:

List[str]
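The ranking described above can be illustrated with a small, dependency-free sketch. It is not the ml_grid implementation: the helper names `anova_f` and `top_n_features`, the dict-of-columns input, and the toy data are all assumptions made for illustration. It computes a one-way ANOVA F-value per feature, drops features whose F-value is NaN, and returns the n best names.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-value of a single feature against class labels."""
    groups: Dict[int, List[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    grand_mean = fmean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own class mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if df_between <= 0 or df_within <= 0:
        return math.nan
    if ss_within == 0:
        # Constant feature: 0/0 is undefined; otherwise perfect separation.
        return math.inf if ss_between > 0 else math.nan
    return (ss_between / df_between) / (ss_within / df_within)


def top_n_features(
    n: int, columns: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[str]:
    """Hypothetical helper: names of the n features with the highest F-value."""
    scores = {name: anova_f(vals, labels) for name, vals in columns.items()}
    ranked = sorted(
        (name for name, f in scores.items() if not math.isnan(f)),
        key=lambda name: scores[name],
        reverse=True,
    )
    if not ranked:
        raise ValueError("No features with a valid F-value.")
    return ranked[:n]


# Toy data (made up): two classes, four candidate features.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "noise": [1, 5, 3, 2, 4, 3],     # identical class means, F = 0
    "bmi": [2, 3, 4, 4, 5, 6],       # moderate separation
    "constant": [4, 4, 4, 4, 4, 4],  # F is NaN, so it is skipped
    "age": [1, 2, 3, 7, 8, 9],       # strong separation
}
selected = top_n_features(2, columns, labels)
```

Here `selected` holds `"age"` followed by `"bmi"`. The constant column never appears in the result, which mirrors the documented case where features with NaN F-values cannot be returned.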

getNFeaturesMarkovBlanket(n: int, X_train: pandas.DataFrame, y_train: pandas.Series, classifier=None, num_simul: int = 30, cv: int = 5, svc_kernel: str = 'rbf', suppress_print: bool = True) → List[str]

Gets the top n features from the Markov Blanket (MB) using PyImpetus.

Parameters:

n (int) – The number of top features to retrieve.
X_train (pd.DataFrame) – The training input samples.
y_train (pd.Series) – The target values.
classifier – The classifier to use for feature selection. If None, defaults to SVC.
num_simul (int) – Number of simulations for stability selection in PyImpetus. Defaults to 30.
cv (int) – Number of cross-validation folds. Defaults to 5.
svc_kernel (str) – The kernel to be used by the SVC model. Defaults to 'rbf'.
suppress_print (bool) – If True, suppresses stdout from the fit method. Defaults to True.

Raises:

TypeError – If X_train is not a pandas DataFrame.

Returns:

A list containing the names of the top n features from the Markov Blanket.

Return type:

List[str]
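The suppress_print behaviour and the top-n truncation can be sketched with the standard library alone. Redirecting stdout with `contextlib.redirect_stdout` is one common way to silence a verbose fit step; whether ml_grid does it this way is an assumption. `run_quietly` and `noisy_fit` are hypothetical stand-ins, and the feature names are made up.

```python
import contextlib
import io
from typing import Callable, List


def run_quietly(
    fit: Callable[[], List[str]], suppress_print: bool = True
) -> List[str]:
    """Call fit(), discarding anything it prints when suppress_print is True."""
    if not suppress_print:
        return fit()
    # Send stdout to a throwaway buffer for the duration of the call.
    with contextlib.redirect_stdout(io.StringIO()):
        return fit()


def noisy_fit() -> List[str]:
    # Hypothetical stand-in for a verbose feature selector's fit step.
    print("simulation 1/30 ...")
    return ["age", "bmi", "glucose"]


n = 2
# Keep only the first n features of the returned list.
top_features = run_quietly(noisy_fit)[:n]
```

With suppress_print left at True, the progress line is discarded and `top_features` holds the first two names. Passing `suppress_print=False` lets the selector's output through unchanged.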