pat2vec.util.evaluation_methods

Functions

compare_ipw_annotation_rows(dataframes[, ...])

Compares and prints differing rows from multiple annotation DataFrames.

Classes

CsvProfiler()

A class to encapsulate functionality for profiling CSV files.

pat2vec.util.evaluation_methods.compare_ipw_annotation_rows(dataframes, columns_to_print=None)

Compares and prints differing rows from multiple annotation DataFrames.

This function identifies rows with the same ‘client_idcode’ across a list of DataFrames. If the ‘text_sample’ for that client differs between any of the DataFrames, it prints the specified columns for each version of the row, allowing for a side-by-side comparison.

This is useful for evaluating the effect of filtering steps, for example, comparing an annotation DataFrame before and after applying a meta-annotation filter.
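The comparison described above can be sketched with plain pandas. This is a hypothetical reimplementation for illustration, not the pat2vec code. The helper name find_differing_clients is invented. The sketch also assumes two things: only client_idcode values present in every DataFrame are compared, and a client with several rows is compared on the sorted tuple of its text_sample values.

```python
import pandas as pd


def find_differing_clients(dataframes, key="client_idcode", text_col="text_sample"):
    """Hypothetical sketch: return key values whose text_sample differs between frames."""
    # Assumption: only IDs that appear in every DataFrame are compared
    common = set(dataframes[0][key])
    for df in dataframes[1:]:
        common &= set(df[key])

    differing = []
    for client in sorted(common):
        # One "version" per DataFrame: the sorted text samples for this client
        versions = [
            tuple(sorted(df.loc[df[key] == client, text_col].astype(str)))
            for df in dataframes
        ]
        # More than one distinct version means the frames disagree for this client
        if len(set(versions)) > 1:
            differing.append(client)
    return differing


before = pd.DataFrame({"client_idcode": ["A", "B", "C"], "text_sample": ["x1", "y1", "z1"]})
after = pd.DataFrame({"client_idcode": ["A", "B"], "text_sample": ["x1", "y2"]})
print(find_differing_clients([before, after]))  # ['B']
```

Client C is skipped because it is missing from the second frame. Under this sketch's assumption, a client that disappears after filtering is not reported as a difference; the library may treat partial overlap differently.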

Parameters:
  • dataframes (List[DataFrame]) – A list of pandas DataFrames to compare. Each DataFrame should have a name attribute set so that the printed output identifies which DataFrame each row comes from.

  • columns_to_print (Optional[List[str]]) – A list of column names to print when differences are found. If None, a default set of annotation-related columns is used.

Return type:

None
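Because compare_ipw_annotation_rows relies on a name attribute on each DataFrame, the inputs can be labelled by assigning that attribute directly. The sketch below uses made-up data. The confidence column is hypothetical and stands in for a meta-annotation filter. The final call is left as a comment because it needs pat2vec installed.

```python
import pandas as pd

annotations = pd.DataFrame({
    "client_idcode": ["A", "B"],
    "text_sample": ["x1", "y1"],
    "confidence": [0.9, 0.4],  # hypothetical column used only for this example
})
annotations.name = "before_filter"

# Boolean indexing returns a new DataFrame; the ad hoc `name` attribute is not carried over
filtered = annotations[annotations["confidence"] > 0.5]
print(hasattr(filtered, "name"))  # False
filtered.name = "after_filter"

# compare_ipw_annotation_rows([annotations, filtered])
```

Two pandas behaviours are worth keeping in mind here. First, name is an ad hoc attribute, so filtering, copying or merging returns a DataFrame without it, and it must be set again afterwards. Second, if a DataFrame already has a column called name, the assignment overwrites that column instead of setting an attribute.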

class pat2vec.util.evaluation_methods.CsvProfiler

Bases: object

A class to encapsulate functionality for profiling CSV files.

static create_profile_reports(epr_batchs_fp, prefix=None, cols=None, icd10_opc4s=False)

Generates profiling reports for CSV files in a directory.

This method iterates over every CSV file in the directory given by epr_batchs_fp, generates a ydata-profiling report for each, and saves it as an HTML file in a ‘profile_reports’ subdirectory.
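The directory walk described above might look roughly like the following. This is a sketch under stated assumptions, not the library's implementation. The function name, the pluggable render callback, placing profile_reports inside epr_batchs_fp, and the <prefix>_<stem>.html naming scheme are all assumptions. Passing the renderer in keeps the loop testable without ydata-profiling installed. The ydata_render function shows how ydata-profiling's ProfileReport would plug in.

```python
from pathlib import Path
from typing import Callable, List, Optional


def create_profile_reports_sketch(
    epr_batchs_fp: str,
    render: Callable[[Path, Path], None],
    prefix: Optional[str] = None,
) -> List[Path]:
    """Hypothetical sketch of the CSV-to-HTML loop; not the pat2vec code."""
    src = Path(epr_batchs_fp)
    # Assumption: reports go in a profile_reports folder inside the input directory
    out_dir = src / "profile_reports"
    out_dir.mkdir(exist_ok=True)

    written = []
    for csv_path in sorted(src.glob("*.csv")):  # non-recursive, CSV files only
        # Assumed naming scheme: "<prefix>_<stem>.html" when a prefix is given
        stem = f"{prefix}_{csv_path.stem}" if prefix else csv_path.stem
        html_path = out_dir / f"{stem}.html"
        render(csv_path, html_path)
        written.append(html_path)
    return written


def ydata_render(csv_path: Path, html_path: Path) -> None:
    """Renderer backed by ydata-profiling; imports are deferred so the sketch loads without it."""
    import pandas as pd
    from ydata_profiling import ProfileReport

    ProfileReport(pd.read_csv(csv_path), title=csv_path.name).to_file(html_path)
```

Usage would be along the lines of create_profile_reports_sketch("batches/", ydata_render, prefix="run1").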

Parameters:
  • epr_batchs_fp (str) – Path to the directory containing the CSV files.

  • prefix (Optional[str]) – An optional prefix to add to the generated report filenames.

  • cols (Optional[List[str]]) – A specific list of columns to include in the profile. If None, a default set of columns is used.

  • icd10_opc4s (bool) – If True, filters the DataFrame to only include rows where the ‘targetId’ column is not empty before generating the report. Defaults to False.
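The cols and icd10_opc4s options amount to a column selection and a row filter applied before profiling. The following is a minimal pandas sketch. It assumes that "empty" means missing or whitespace-only. It also assumes that cols=None keeps every column, whereas the documented behaviour substitutes a default set that is not listed here. prepare_for_profile is a hypothetical name.

```python
from typing import List, Optional

import pandas as pd


def prepare_for_profile(
    df: pd.DataFrame,
    cols: Optional[List[str]] = None,
    icd10_opc4s: bool = False,
) -> pd.DataFrame:
    """Hypothetical sketch of the pre-profiling filter; not the pat2vec code."""
    if icd10_opc4s:
        target = df["targetId"]
        # Assumption: NaN/None and blank or whitespace-only strings count as "empty"
        non_empty = target.notna() & (target.astype(str).str.strip() != "")
        df = df[non_empty]
    if cols is not None:
        # Keep only the requested columns that exist in this batch
        df = df[[c for c in cols if c in df.columns]]
    return df


batch = pd.DataFrame({
    "client_idcode": ["A", "B", "C", "D"],
    "targetId": ["I10", None, "", "K35"],
    "other": [1, 2, 3, 4],
})
print(prepare_for_profile(batch, cols=["client_idcode", "other"], icd10_opc4s=True))
```

The row filter runs before the column selection, so it still works when targetId is not among the columns chosen for the report.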

Return type:

None