pat2vec.util.evaluation_methods
Functions
compare_ipw_annotation_rows(dataframes[, columns_to_print]) | Compares and prints differing rows from multiple annotation DataFrames.
Classes
CsvProfiler() | A class to encapsulate functionality for profiling CSV files.
- pat2vec.util.evaluation_methods.compare_ipw_annotation_rows(dataframes, columns_to_print=None)
Compares and prints differing rows from multiple annotation DataFrames.
This function identifies rows with the same ‘client_idcode’ across a list of DataFrames. If the ‘text_sample’ for that client differs between any of the DataFrames, it prints the specified columns for each version of the row, allowing for a side-by-side comparison.
This is useful for evaluating the effect of filtering steps, for example, comparing an annotation DataFrame before and after applying a meta-annotation filter.
- Parameters:
dataframes (List[DataFrame]) – A list of pandas DataFrames to compare. Each DataFrame should have a name attribute for clear output.
columns_to_print (Optional[List[str]]) – A list of column names to print when differences are found. If None, a default set of annotation-related columns is used.
- Return type:
None
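The comparison described above can be sketched independently of pat2vec. The following is a minimal illustration, not the library's implementation: the helper name `find_differing_text_samples` is hypothetical, and it assumes each DataFrame holds at most one row per ‘client_idcode’. It collects the ‘text_sample’ seen for each client in every DataFrame and keeps only the clients whose text differs.

```python
import pandas as pd


def find_differing_text_samples(dataframes, id_col="client_idcode", text_col="text_sample"):
    """Hypothetical helper: map each client whose text differs to {frame label: text}."""
    versions_by_client = {}
    for position, df in enumerate(dataframes):
        # Fall back to a positional label when no name attribute was set.
        label = getattr(df, "name", f"df_{position}")
        for client_id, text in zip(df[id_col], df[text_col]):
            versions_by_client.setdefault(client_id, {})[label] = text
    # Keep only clients that have more than one distinct text across the frames.
    return {
        client_id: versions
        for client_id, versions in versions_by_client.items()
        if len(set(versions.values())) > 1
    }


# Illustrative data only: the same two clients before and after a filtering step.
before = pd.DataFrame(
    {"client_idcode": ["A1", "B2"], "text_sample": ["chest pain noted", "no fever"]}
)
before.name = "before_filter"
after = pd.DataFrame(
    {"client_idcode": ["A1", "B2"], "text_sample": ["chest pain denied", "no fever"]}
)
after.name = "after_filter"

diffs = find_differing_text_samples([before, after])
for client_id, versions in diffs.items():
    print(client_id)
    for label, text in versions.items():
        print(f"  {label}: {text}")
```

With pat2vec installed, the documented signature suggests the equivalent call would be `compare_ipw_annotation_rows([before, after], columns_to_print=["client_idcode", "text_sample"])`, which prints its comparison and returns None.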
- class pat2vec.util.evaluation_methods.CsvProfiler
Bases: object
A class to encapsulate functionality for profiling CSV files.
- static create_profile_reports(epr_batchs_fp, prefix=None, cols=None, icd10_opc4s=False)
Generates profiling reports for CSV files in a directory.
This method iterates through all CSV files in a specified directory, generates a ydata-profiling report for each, and saves it as an HTML file in a ‘profile_reports’ subdirectory.
- Parameters:
epr_batchs_fp (str) – Path to the directory containing the CSV files.
prefix (Optional[str]) – An optional prefix to add to the generated report filenames.
cols (Optional[List[str]]) – A specific list of columns to include in the profile. If None, a default set of columns is used.
icd10_opc4s (bool) – If True, filters the DataFrame to only include rows where the ‘targetId’ column is not empty before generating the report. Defaults to False.
- Return type:
None
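The workflow described for create_profile_reports can be sketched as follows. This is an illustration under stated assumptions rather than the method's actual code: the helper names `plan_report_paths` and `write_reports` are hypothetical, the `<prefix>_<csv name>.html` naming scheme is a guess, and "not empty" is interpreted here as non-null. The report step uses `ProfileReport` from ydata-profiling, imported lazily so the path planning runs without that package.

```python
from pathlib import Path


def plan_report_paths(csv_dir, prefix=None):
    """Hypothetical helper: pair each CSV in csv_dir with its HTML report path."""
    csv_dir = Path(csv_dir)
    out_dir = csv_dir / "profile_reports"
    plan = []
    for csv_path in sorted(csv_dir.glob("*.csv")):
        # Assumed naming scheme: optional prefix, underscore, CSV file stem.
        stem = f"{prefix}_{csv_path.stem}" if prefix else csv_path.stem
        plan.append((csv_path, out_dir / f"{stem}.html"))
    return plan


def write_reports(csv_dir, prefix=None, cols=None, icd10_opc4s=False):
    """Hypothetical helper: write one ydata-profiling HTML report per CSV file."""
    # Imported here so plan_report_paths works without these packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    for csv_path, html_path in plan_report_paths(csv_dir, prefix):
        # usecols=None loads every column; the real method uses its own default set.
        df = pd.read_csv(csv_path, usecols=cols)
        if icd10_opc4s:
            # Assumption: "not empty" means non-null, and targetId was loaded.
            df = df[df["targetId"].notna()]
        html_path.parent.mkdir(exist_ok=True)
        ProfileReport(df, title=csv_path.stem).to_file(html_path)
```

Calling `plan_report_paths("batches", prefix="run1")` on a directory containing `a.csv` would pair it with `batches/profile_reports/run1_a.html`; the directory name `batches` is illustrative only.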