ml_grid.pipeline.data_feature_methods

Classes

feature_methods

Provides feature selection methods: ANOVA F-value ranking and Markov Blanket selection via PyImpetus.

Module Contents

class ml_grid.pipeline.data_feature_methods.feature_methods

Provides feature selection methods: ANOVA F-value ranking and Markov Blanket selection via PyImpetus.

getNfeaturesANOVAF(n: int, X_train: pandas.DataFrame | numpy.ndarray, y_train: pandas.Series) → List[str]

Gets the top n features based on the ANOVA F-value.

This method is intended for classification problems. It computes the ANOVA F-value of each feature in X_train against y_train, ranks the features by F-value in descending order, and returns the n highest-ranked features.

Parameters:
  • n (int) – The number of top features to return.

  • X_train (Union[pd.DataFrame, np.ndarray]) – Training data.

  • y_train (pd.Series) – Target variable.

Raises:

ValueError – If X_train is not a pandas DataFrame or numpy array, or if no features can be returned (e.g., all have NaN F-values).

Returns:

A list of column names for the top n features.

Return type:

List[str]
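The ranking described above can be illustrated with a minimal, standard-library sketch. It is not the implementation behind getNfeaturesANOVAF (which may well delegate to a library routine such as scikit-learn's f_classif; that is an assumption). The helper names anova_f and top_n_features, and the use of a plain dict in place of a DataFrame, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import isnan
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-statistic of one feature, grouped by class label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return float("nan")  # F is undefined without 2+ classes and spare samples
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand_mean) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        # Constant feature -> NaN; zero within-class spread with separation -> inf
        return float("inf") if ss_between > 0 else float("nan")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_n_features(n: int, columns: Dict[str, List[float]], labels: Sequence[int]) -> List[str]:
    """Return the names of the n features with the highest F-values."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    valid = [(name, f) for name, f in scores.items() if not isnan(f)]
    if not valid:
        raise ValueError("No features with a valid F-value.")
    valid.sort(key=lambda item: item[1], reverse=True)  # descending F-value
    return [name for name, _ in valid[:n]]


# Hypothetical toy data: two classes, three samples each.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "age": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],       # well separated, F = 54.0
    "noise": [1.0, 5.0, 3.0, 2.0, 4.0, 3.0],     # equal class means, F = 0.0
    "constant": [4.0, 4.0, 4.0, 4.0, 4.0, 4.0],  # NaN F-value, excluded
    "weak": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],      # F = 1.5
}
top = top_n_features(2, columns, labels)
print(top)  # ['age', 'weak']
```

Features whose F-value is NaN are dropped before ranking, which mirrors the documented ValueError case: if every feature is dropped, nothing can be returned.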

getNFeaturesMarkovBlanket(n: int, X_train: pandas.DataFrame, y_train: pandas.Series, num_simul: int = 30, cv: int = 5, svc_kernel: str = 'rbf') → List[str]

Gets the top n features from the Markov Blanket (MB) using PyImpetus.

Parameters:
  • n (int) – The number of top features to retrieve.

  • X_train (pd.DataFrame) – The training input samples.

  • y_train (pd.Series) – The target values.

  • num_simul (int) – Number of simulations for stability selection in PyImpetus. Defaults to 30.

  • cv (int) – Number of cross-validation folds. Defaults to 5.

  • svc_kernel (str) – The kernel to be used by the SVC model. Defaults to 'rbf'.

Raises:

TypeError – If X_train is not a pandas DataFrame.

Returns:

A list containing the names of the top n features from the Markov Blanket.

Return type:

List[str]