core
====

.. py:module:: core

.. autoapi-nested-parse::

   Core module for ML results aggregation and management.
   Handles loading, aggregating, and basic processing of results data.


Classes
-------

.. autoapisummary::

   core.ResultsAggregator
   core.DataValidator


Functions
---------

.. autoapisummary::

   core.get_clean_data
   core.stratify_by_outcome


Module Contents
---------------

.. py:class:: ResultsAggregator(root_folder: str, feature_names_csv: Optional[str] = None)

   Initializes the ResultsAggregator.

   :param root_folder: The path to the root folder containing the experiment run subfolders.
   :type root_folder: str
   :param feature_names_csv: The path to a CSV file whose column headers are the original feature names. Required for decoding feature lists. Defaults to None.
   :type feature_names_csv: Optional[str], optional


   .. py:attribute:: root_folder


   .. py:attribute:: feature_names
      :type: Optional[List[str]]
      :value: None


   .. py:attribute:: aggregated_data
      :type: Optional[pandas.DataFrame]
      :value: None


   .. py:method:: load_feature_names(feature_names_csv: str) -> None

      Loads feature names from the column headers of a CSV file.

      :param feature_names_csv: The path to the CSV file.
      :type feature_names_csv: str


   .. py:method:: get_available_runs() -> List[str]

      Gets a list of the available timestamped run folders.

      :returns: A sorted list of valid run folder names.
      :rtype: List[str]
      :raises ValueError: If the root folder does not exist.


   .. py:method:: load_single_run(timestamp_folder: str) -> pandas.DataFrame

      Loads results from a specific timestamped run folder.

      :param timestamp_folder: The name of the run folder.
      :type timestamp_folder: str
      :returns: A DataFrame containing the results for that run.
      :rtype: pd.DataFrame
      :raises FileNotFoundError: If the log file does not exist in the folder.


   .. py:method:: aggregate_all_runs() -> pandas.DataFrame

      Aggregates results from all available runs in the root folder.

      :returns: A single DataFrame containing all aggregated results.
      :rtype: pd.DataFrame
      :raises ValueError: If no valid runs are found.


   .. py:method:: aggregate_specific_runs(run_names: List[str]) -> pandas.DataFrame

      Aggregates results from a specified list of run folders.

      :param run_names: A list of run folder names to aggregate.
      :type run_names: List[str]
      :returns: A single DataFrame containing the aggregated results.
      :rtype: pd.DataFrame
      :raises ValueError: If no data could be loaded from the specified runs.


   .. py:method:: get_summary_stats(data: Optional[pandas.DataFrame] = None) -> pandas.DataFrame

      Gets summary statistics for the aggregated results.

      :param data: The DataFrame to summarize. If None, uses the internally stored aggregated data. Defaults to None.
      :type data: Optional[pd.DataFrame], optional
      :returns: A DataFrame containing descriptive statistics.
      :rtype: pd.DataFrame
      :raises ValueError: If no data is available.


   .. py:method:: get_outcome_variables(data: Optional[pandas.DataFrame] = None) -> List[str]

      Gets a list of unique outcome variables from the data.

      :param data: The DataFrame to inspect. If None, uses the internally stored aggregated data. Defaults to None.
      :type data: Optional[pd.DataFrame], optional
      :returns: A sorted list of unique outcome variable names.
      :rtype: List[str]
      :raises ValueError: If no data is available or the 'outcome_variable' column is missing.


   .. py:method:: get_data_by_outcome(outcome_variable: str, data: Optional[pandas.DataFrame] = None) -> pandas.DataFrame

      Filters the data for a specific outcome variable.

      :param outcome_variable: The outcome variable to filter by.
      :type outcome_variable: str
      :param data: The DataFrame to filter. If None, uses the internally stored aggregated data. Defaults to None.
      :type data: Optional[pd.DataFrame], optional
      :returns: A new DataFrame containing only the data for the specified outcome.
      :rtype: pd.DataFrame
      :raises ValueError: If no data is available, the 'outcome_variable' column is missing, or no data is found for the specified outcome.


   .. py:method:: get_outcome_summary(data: Optional[pandas.DataFrame] = None) -> pandas.DataFrame

      Gets summary statistics stratified by outcome variable.

      :param data: The DataFrame to summarize. If None, uses the internally stored aggregated data. Defaults to None.
      :type data: Optional[pd.DataFrame], optional
      :returns: A DataFrame with a MultiIndex, holding summary statistics for each outcome variable.
      :rtype: pd.DataFrame
      :raises ValueError: If no data is available or the 'outcome_variable' column is missing.


.. py:class:: DataValidator

   A utility class for validating and checking the quality of results data.


   .. py:method:: validate_data_structure(df: pandas.DataFrame) -> Dict[str, Any]
      :staticmethod:

      Validates the structure and quality of a results DataFrame.

      :param df: The DataFrame to validate.
      :type df: pd.DataFrame
      :returns: A dictionary containing the validation report.
      :rtype: Dict[str, Any]


   .. py:method:: print_validation_report(validation_report: Dict[str, Any]) -> None
      :staticmethod:

      Prints a formatted validation report to the console.

      :param validation_report: The validation report dictionary generated by :py:meth:`DataValidator.validate_data_structure`.
      :type validation_report: Dict[str, Any]


.. py:function:: get_clean_data(df: pandas.DataFrame, remove_failed: bool = True) -> pandas.DataFrame

   Cleans a results DataFrame for analysis.

   :param df: The input DataFrame.
   :type df: pd.DataFrame
   :param remove_failed: If True, removes rows where the 'failed' column is 1. Defaults to True.
   :type remove_failed: bool, optional
   :returns: The cleaned DataFrame.
   :rtype: pd.DataFrame


.. py:function:: stratify_by_outcome(df: pandas.DataFrame, func: callable, *args: Any, **kwargs: Any) -> Dict[str, Any]

   Applies a function to the data subset of each outcome variable.

   :param df: DataFrame with an 'outcome_variable' column.
   :type df: pd.DataFrame
   :param func: The function to apply to each outcome's data subset.
   :type func: callable
   :param \*args: Positional arguments to pass to ``func``.
   :type \*args: Any
   :param \*\*kwargs: Keyword arguments to pass to ``func``.
   :type \*\*kwargs: Any
   :returns: A dictionary mapping each outcome variable to the result of applying ``func`` to that outcome's subset.
   :rtype: Dict[str, Any]
   :raises ValueError: If the 'outcome_variable' column is not found.
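
   The documented behavior can be illustrated with a small sketch. It is a hypothetical
   stand-in named ``stratify_by_outcome_sketch``, not this module's actual code, and it
   assumes that the subsets come from ``pandas.DataFrame.groupby`` on ``'outcome_variable'``
   and that each subset is passed to ``func`` as its first positional argument, followed
   by ``*args`` and ``**kwargs``. The ``'auc'`` column and all sample values are made up
   for illustration.

   .. code-block:: python

      from typing import Any, Callable, Dict

      import pandas as pd


      def stratify_by_outcome_sketch(
          df: pd.DataFrame, func: Callable[..., Any], *args: Any, **kwargs: Any
      ) -> Dict[str, Any]:
          """Hypothetical stand-in for core.stratify_by_outcome (not the real code)."""
          if "outcome_variable" not in df.columns:
              raise ValueError("'outcome_variable' column not found")
          results: Dict[str, Any] = {}
          # Grouping by a single column name yields one scalar key per outcome.
          for outcome, subset in df.groupby("outcome_variable"):
              # Assumption: the subset is the first positional argument to func.
              results[outcome] = func(subset, *args, **kwargs)
          return results


      # Made-up results table; 'auc' is a hypothetical metric column.
      df = pd.DataFrame(
          {
              "outcome_variable": ["mortality", "mortality", "readmission"],
              "auc": [0.81, 0.79, 0.70],
              "failed": [0, 0, 1],
          }
      )

      # Drop failed runs, as get_clean_data documents for remove_failed=True.
      clean = df[df["failed"] != 1]

      # Mean AUC per outcome; "auc" reaches the lambda through *args.
      mean_auc = stratify_by_outcome_sketch(
          clean, lambda sub, col: float(sub[col].mean()), "auc"
      )
      print(mean_auc)

   In this sketch, an outcome whose rows were all removed during cleaning (here
   ``'readmission'``) simply does not appear as a key in the returned dictionary.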