ml_grid.pipeline.data_feature_methods
Classes
feature_methods – Initializes the feature_methods class.
Module Contents
- class ml_grid.pipeline.data_feature_methods.feature_methods
Initializes the feature_methods class.
- getNfeaturesANOVAF(n: int, X_train: pandas.DataFrame | numpy.ndarray, y_train: pandas.Series) → List[str]
Gets the top n features based on the ANOVA F-value.
This method is for classification problems. The ANOVA F-value is calculated for each feature in X_train, and the resulting F-values are sorted in descending order. The top n features with the highest F-values are returned.
- Parameters:
n (int) – The number of top features to return.
X_train (Union[pd.DataFrame, np.ndarray]) – Training data.
y_train (pd.Series) – Target variable.
- Raises:
ValueError – If X_train is not a pandas DataFrame or numpy array, or if no features can be returned (e.g., all have NaN F-values).
- Returns:
A list of column names for the top n features.
- Return type:
List[str]
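The ranking described for getNfeaturesANOVAF can be sketched with plain NumPy. The one-way ANOVA F-value of a feature is its between-class mean square divided by its within-class mean square, so features whose class means are far apart relative to their spread inside each class score highest. This is not ml_grid's implementation. A library routine such as scikit-learn's f_classif computes the same statistic. The helper names anova_f_values and top_n_anova_features, and the "0", "1", ... naming of NumPy array columns, are assumptions made only for this example.

```python
from typing import List, Union

import numpy as np
import pandas as pd


def anova_f_values(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """One-way ANOVA F-value of each column of X against class labels y."""
    labels = np.asarray(y)
    classes = np.unique(labels)
    n_samples, n_classes = len(labels), len(classes)
    values = X.to_numpy(dtype=float)
    grand_mean = values.mean(axis=0)

    ss_between = np.zeros(values.shape[1])  # spread of the class means
    ss_within = np.zeros(values.shape[1])   # spread inside each class
    for c in classes:
        group = values[labels == c]
        group_mean = group.mean(axis=0)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += ((group - group_mean) ** 2).sum(axis=0)

    # Constant columns give 0/0 -> NaN; silence numpy's warnings for that case.
    with np.errstate(divide="ignore", invalid="ignore"):
        f = (ss_between / (n_classes - 1)) / (ss_within / (n_samples - n_classes))
    return pd.Series(f, index=X.columns)


def top_n_anova_features(
    n: int, X_train: Union[pd.DataFrame, np.ndarray], y_train: pd.Series
) -> List[str]:
    """Hypothetical helper: names of the n columns with the highest F-values."""
    if isinstance(X_train, np.ndarray):
        # Assumption: unnamed array columns are labelled "0", "1", ...
        X_train = pd.DataFrame(
            X_train, columns=[str(i) for i in range(X_train.shape[1])]
        )
    elif not isinstance(X_train, pd.DataFrame):
        raise ValueError("X_train must be a pandas DataFrame or numpy array")

    scores = anova_f_values(X_train, y_train).dropna()  # drop undefined F-values
    if scores.empty:
        raise ValueError("No feature has a valid ANOVA F-value")
    # Sort descending and keep at most n column names.
    return [str(c) for c in scores.sort_values(ascending=False).head(n).index]
```

In this sketch a constant column has zero between-class and zero within-class variation, so its F-value is NaN and it is dropped before ranking. If every column is dropped, the function raises the ValueError described under Raises.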
- getNFeaturesMarkovBlanket(n: int, X_train: pandas.DataFrame, y_train: pandas.Series, num_simul: int = 30, cv: int = 5, svc_kernel: str = 'rbf') → List[str]
Gets the top n features from the Markov Blanket (MB) using PyImpetus.
- Parameters:
n (int) – The number of top features to retrieve.
X_train (pd.DataFrame) – The training input samples.
y_train (pd.Series) – The target values.
num_simul (int) – Number of simulations for stability selection in PyImpetus. Defaults to 30.
cv (int) – Number of cross-validation folds. Defaults to 5.
svc_kernel (str) – The kernel to be used by the SVC model. Defaults to “rbf”.
- Raises:
TypeError – If X_train is not a pandas DataFrame.
- Returns:
A list containing the names of the top n features from the Markov Blanket.
- Return type:
List[str]
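This sketch does not reproduce PyImpetus. It only illustrates the contract documented for getNFeaturesMarkovBlanket: a DataFrame type check that raises TypeError, then truncation of the Markov Blanket to at most n names. The ranking step is passed in as a callable. The names rank_markov_blanket, MarkovBlanketRanker and correlation_ranker are hypothetical. So is the assumption that the blanket comes back ordered from most to least important. In the real method, this step is the PyImpetus run configured with num_simul, cv, and an SVC using svc_kernel.

```python
from typing import Callable, List

import pandas as pd

# Hypothetical stand-in for the PyImpetus step: returns Markov Blanket
# feature names ordered from most to least important (assumed ordering).
MarkovBlanketRanker = Callable[[pd.DataFrame, pd.Series], List[str]]


def top_n_markov_blanket(
    n: int,
    X_train: pd.DataFrame,
    y_train: pd.Series,
    rank_markov_blanket: MarkovBlanketRanker,
) -> List[str]:
    """Return at most n feature names from a ranked Markov Blanket."""
    if not isinstance(X_train, pd.DataFrame):
        # Mirrors the documented TypeError for non-DataFrame input.
        raise TypeError("X_train must be a pandas DataFrame")
    ranked = rank_markov_blanket(X_train, y_train)
    # The blanket may hold fewer than n features; slicing never pads the list.
    return list(ranked[:n])


def correlation_ranker(X: pd.DataFrame, y: pd.Series) -> List[str]:
    """Placeholder ranker for demos only, NOT a Markov Blanket:
    orders columns by absolute Pearson correlation with y."""
    scores = X.apply(lambda col: abs(col.corr(y)))
    return scores.sort_values(ascending=False).index.tolist()
```

Injecting the ranker keeps the type check and top-n truncation testable without PyImpetus installed. For example, `top_n_markov_blanket(2, X, y, correlation_ranker)` returns the two columns most correlated with y. A real caller would replace correlation_ranker with the PyImpetus-based selection.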