ml_grid.pipeline.read_in
========================

.. py:module:: ml_grid.pipeline.read_in


Classes
-------

.. autoapisummary::

   ml_grid.pipeline.read_in.read
   ml_grid.pipeline.read_in.read_sample


Module Contents
---------------

.. py:class:: read(input_filename: str, use_polars: bool = False)

   Initializes the ``read`` class and loads the data from ``input_filename``.

   :param input_filename: The path to the input CSV file.
   :type input_filename: str
   :param use_polars: If ``True``, attempts to read the CSV with the Polars
                      library and converts the result to a pandas DataFrame,
                      falling back to pandas if the Polars read fails.
                      Defaults to ``False``.
   :type use_polars: bool, optional
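
   The fallback behaviour described for ``use_polars`` can be sketched as
   follows. This is a minimal illustration, not the class's actual code:
   ``load_csv`` is a hypothetical helper name, and catching every exception
   from the Polars path is an assumption about what "fails" covers.

   .. code-block:: python

      import pandas as pd


      def load_csv(input_filename: str, use_polars: bool = False) -> pd.DataFrame:
          """Hypothetical helper: read a CSV, optionally via Polars first."""
          if use_polars:
              try:
                  import polars as pl  # optional dependency

                  # to_pandas() may itself fail (for example, without pyarrow);
                  # any error on this path triggers the pandas fallback below.
                  return pl.read_csv(input_filename).to_pandas()
              except Exception:
                  pass  # Polars missing or failed: fall through to pandas
          return pd.read_csv(input_filename)

   Either path returns a pandas DataFrame, so calling code does not need to
   know which reader succeeded.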


.. py:class:: read_sample(input_filename: str, test_sample_n: int, column_sample_n: int)

   Initializes the read_sample class and loads a data sample.

   This class reads a random sample of rows and/or columns from a CSV file.
   Any ``necessary_columns`` that exist in the source file are always
   included.

   .. note::

      The column sampling logic (``max_additional_columns``) appears to be
      based on the number of rows to sample (``test_sample_n``) rather than
      the number of columns (``column_sample_n``), which may be unintended.
      This behavior has been preserved as is.

   :param input_filename: The path to the input CSV file.
   :type input_filename: str
   :param test_sample_n: The number of rows to randomly sample. If 0,
                         all rows are read.
   :type test_sample_n: int
   :param column_sample_n: The number of columns to randomly sample,
                           in addition to the ``necessary_columns``.
   :type column_sample_n: int

   :raises ValueError: If the ``outcome_var_1`` column does not contain at least
       two unique classes after sampling.
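
   The column selection and the class check can be sketched as below. This
   is a hedged illustration rather than the class's implementation:
   ``choose_columns`` and ``check_outcome_classes`` are hypothetical helper
   names, and the sketch caps the sample by ``column_sample_n`` as the
   parameter description states, which differs from the behaviour flagged
   in the note above.

   .. code-block:: python

      import random


      def choose_columns(header, necessary_columns, column_sample_n, seed=None):
          """Hypothetical helper: necessary columns plus a random sample of the rest."""
          rng = random.Random(seed)
          # Keep only the necessary columns that are present in the file.
          keep = [c for c in necessary_columns if c in header]
          # Every other column is a candidate for random sampling.
          remaining = [c for c in header if c not in keep]
          # Never request more columns than are available.
          extra = rng.sample(remaining, min(column_sample_n, len(remaining)))
          return keep + extra


      def check_outcome_classes(values, column="outcome_var_1"):
          """Raise ValueError unless at least two unique classes are present."""
          if len(set(values)) < 2:
              raise ValueError(
                  f"'{column}' must contain at least two unique classes after sampling."
              )


      # Illustrative header; "missing_col" is absent, so it is skipped.
      header = ["id", "age", "bmi", "outcome_var_1"]
      cols = choose_columns(header, ["outcome_var_1", "missing_col"], 2, seed=0)

   A list such as ``cols`` could then be passed to
   ``pandas.read_csv(..., usecols=cols)`` so that only the selected columns
   are parsed.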


   .. py:attribute:: filename