ml_grid.pipeline.read_in
========================

.. py:module:: ml_grid.pipeline.read_in


Classes
-------

.. autoapisummary::

   ml_grid.pipeline.read_in.read
   ml_grid.pipeline.read_in.read_sample


Module Contents
---------------

.. py:class:: read(input_filename: str, use_polars: bool = False)

   Initializes the read class and loads the data.

   :param input_filename: The path to the input CSV file.
   :type input_filename: str
   :param use_polars: If True, attempts to read the CSV using the Polars
                      library and converts it to a pandas DataFrame. Falls back
                      to pandas if Polars fails. Defaults to False.
   :type use_polars: bool, optional


.. py:class:: read_sample(input_filename: str, test_sample_n: int, column_sample_n: int)

   Initializes the read_sample class and loads a data sample.

   This class reads a random sample of rows and/or columns from a CSV file.
   It ensures that certain `necessary_columns` are always included if they
   exist in the source file.

   .. note::

      The column sampling logic (`max_additional_columns`) appears to be
      based on the number of rows to sample (`test_sample_n`) rather than
      the number of columns (`column_sample_n`), which may be unintended.
      The functionality has been preserved as is.

   :param input_filename: The path to the input CSV file.
   :type input_filename: str
   :param test_sample_n: The number of rows to randomly sample. If 0, all rows are read.
   :type test_sample_n: int
   :param column_sample_n: The number of columns to randomly sample, in addition
                           to the `necessary_columns`.
   :type column_sample_n: int
   :raises ValueError: If the 'outcome_var_1' column does not contain at least
                       two unique classes after sampling.


   .. py:attribute:: filename
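
The sampling rules documented for ``read_sample`` can be illustrated with a small standalone sketch. This is not the ``ml_grid`` implementation: it uses only the Python standard library instead of pandas or Polars, the function name ``sample_csv`` and its ``seed`` argument are hypothetical, and the contents of ``NECESSARY_COLUMNS`` are placeholders because the real list is not shown in this reference. It follows the documented meaning of the parameters (``column_sample_n`` bounds the number of extra columns) rather than the behaviour flagged in the note above.

```python
import csv
import io
import random

# Placeholder for the module's `necessary_columns`; the real list is an assumption.
NECESSARY_COLUMNS = ["outcome_var_1", "id"]


def sample_csv(text, test_sample_n, column_sample_n, seed=0):
    """Return (columns, rows) sampled from CSV text, following the documented rules."""
    rng = random.Random(seed)
    reader = csv.DictReader(io.StringIO(text))
    header = list(reader.fieldnames or [])
    rows = list(reader)

    # Necessary columns are always kept when they exist in the source.
    kept = [c for c in NECESSARY_COLUMNS if c in header]
    others = [c for c in header if c not in kept]

    # Draw up to `column_sample_n` additional columns at random.
    extra = rng.sample(others, min(column_sample_n, len(others)))
    columns = kept + extra

    # test_sample_n == 0 means "read all rows".
    if test_sample_n and test_sample_n < len(rows):
        rows = rng.sample(rows, test_sample_n)

    sampled = [{c: row[c] for c in columns} for row in rows]

    # The outcome column must still contain at least two classes after sampling.
    if "outcome_var_1" in columns:
        classes = {row["outcome_var_1"] for row in sampled}
        if len(classes) < 2:
            raise ValueError("outcome_var_1 has fewer than two unique classes")

    return columns, sampled
```

Calling ``sample_csv(text, 0, 2)`` on CSV text with header ``id,outcome_var_1,a,b,c`` keeps every row, both necessary columns, and two of the three remaining columns. Requesting a small row sample from imbalanced data can raise ``ValueError``, so callers may want to retry with a larger ``test_sample_n``.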