ml_grid.pipeline.read_in
Classes
- read: Initializes the read class and loads the data.
- read_sample: Initializes the read_sample class and loads a data sample.
Module Contents
- class ml_grid.pipeline.read_in.read(input_filename: str, use_polars: bool = False)[source]
Initializes the read class and loads the data.
- class ml_grid.pipeline.read_in.read_sample(input_filename: str, test_sample_n: int, column_sample_n: int)[source]
Initializes the read_sample class and loads a data sample.
This class reads a random sample of rows and/or columns from a CSV file. It ensures that certain necessary_columns are always included if they exist in the source file.
Note
The column sampling logic (max_additional_columns) appears to be based on the number of rows to sample (test_sample_n) rather than the number of columns (column_sample_n), which may be unintended. The functionality has been preserved as is.
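- Parameters:
input_filename (str) – Name of the CSV file to read.
test_sample_n (int) – Number of rows to sample.
column_sample_n (int) – Number of columns to sample.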
- Raises:
ValueError – If the ‘outcome_var_1’ column does not contain at least two unique classes after sampling.
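The behaviour described for read_sample (random row and column sampling, always-kept columns, and the two-class check on 'outcome_var_1') can be illustrated with a minimal standard-library sketch. This is not the ml_grid implementation: the function name sample_csv, its parameters, and the seed argument are hypothetical, and the column budget here is derived from the column count, which is the behaviour the note above suggests was intended rather than what the class reportedly does.

```python
import csv
import io
import random


def sample_csv(source, n_rows, n_cols, necessary_columns,
               outcome="outcome_var_1", seed=0):
    """Hypothetical helper: sample rows and columns from CSV lines.

    `source` is any iterable of CSV lines, e.g. an open file.
    Returns (columns, rows), where rows is a list of dicts.
    """
    rng = random.Random(seed)
    reader = csv.DictReader(source)
    header = reader.fieldnames or []
    all_rows = list(reader)

    # Always keep the necessary columns that actually exist in the file.
    keep = {c for c in necessary_columns if c in header}

    # Spend the remaining column budget on a random choice of the others.
    # Assumption: the budget is based on n_cols, not on n_rows.
    others = [c for c in header if c not in keep]
    budget = max(0, n_cols - len(keep))
    extra = set(rng.sample(others, min(budget, len(others))))
    columns = [c for c in header if c in keep or c in extra]  # file order

    # Draw the row sample without replacement.
    picked = rng.sample(all_rows, min(n_rows, len(all_rows)))
    rows = [{c: r[c] for c in columns} for r in picked]

    # Mirror the documented error: the outcome needs at least two classes.
    if outcome in columns and len({r[outcome] for r in rows}) < 2:
        raise ValueError(
            f"'{outcome}' has fewer than two unique classes after sampling"
        )
    return columns, rows


# Usage with an in-memory CSV; a real file handle works the same way.
demo = io.StringIO(
    "id,age,bmi,outcome_var_1\n1,30,22,0\n2,41,27,1\n3,55,31,0\n4,62,25,1\n"
)
cols, rows = sample_csv(demo, n_rows=4, n_cols=3,
                        necessary_columns=["outcome_var_1", "not_in_file"])
```

Necessary columns that are absent from the file are skipped silently, matching the "if they exist in the source file" wording above. With a small row sample, a single-class outcome becomes more likely, so callers may want to catch ValueError and resample.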