pat2vec.util.post_processing_build_methods

Functions

`build_merged_bloods`(all_pat_list, config_obj)	Builds a merged CSV file of bloods data from patient batch files.
`build_merged_epr_mct_annot_df`(all_pat_list, ...)	Builds a merged DataFrame of annotations from EPR and MCT sources.
`build_merged_epr_mct_doc_df`(all_pat_list, ...)	Builds a merged CSV of documents from EPR and MCT sources.
`filter_annot_dataframe`(df, ...)	Filters a DataFrame based on the given inclusion criteria.
`get_annots_joined_to_docs`(config_obj, ...)	Builds and merges document and annotation dataframes, then joins them.
`join_docs_to_annots`(annots_df, docs_temp[, ...])	Merge two DataFrames based on the 'document_guid' column.
`merge_appointments_csv`(all_pat_list, config_obj)	Merge all appointments CSV files that match the patient list.
`merge_bmi_csv`(all_pat_list, config_obj[, ...])	Merge all BMI CSV files that match the patient list.
`merge_demographics_csv`(all_pat_list, config_obj)	Merge all demographics CSV files that match the patient list.
`merge_diagnostics_csv`(all_pat_list, config_obj)	Merge all diagnostics CSV files that match the patient list.
`merge_drugs_csv`(all_pat_list, config_obj[, ...])	Merge all drugs CSV files that match the patient list.
`merge_news_csv`(all_pat_list, config_obj[, ...])	Merge all NEWS CSV files that match the patient list.
`retrieve_pat_bloods`(client_idcode, config_obj)	Retrieve bloods data for the given client_idcode.
`retrieve_pat_docs_mct_epr`(client_idcode, ...)	Retrieves and merges document data for a patient from multiple sources.

pat2vec.util.post_processing_build_methods.filter_annot_dataframe(df, annot_filter_arguments)

Filters a DataFrame based on the given inclusion criteria.

Parameters:

df (DataFrame) – DataFrame to be filtered.
annot_filter_arguments (Dict[str, Any]) – Dictionary containing inclusion criteria.

Return type:

DataFrame

Returns:

The filtered DataFrame.
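
The layout of annot_filter_arguments is not described here, so the following is only a minimal sketch of dictionary-driven inclusion filtering. It uses a list of dicts as a stand-in for a DataFrame, and the function name filter_rows, the column names (acc, types, pretty_name), and the rule that a list means "one of" while a number means "at least" are all assumptions made for illustration, not the pat2vec behaviour.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_rows(rows: List[Row], criteria: Dict[str, Any]) -> List[Row]:
    """Keep only rows that satisfy every inclusion criterion.

    Hypothetical criteria format (not taken from pat2vec):
      - a list/set/tuple value means "column value must be one of these"
      - any other value is treated as a minimum threshold
    """

    def keep(row: Row) -> bool:
        for column, wanted in criteria.items():
            value = row.get(column)
            if isinstance(wanted, (list, set, tuple)):
                if value not in wanted:
                    return False
            elif value is None or value < wanted:
                return False
        return True

    return [row for row in rows if keep(row)]


# Made-up annotation rows, purely for demonstration.
annotations = [
    {"pretty_name": "Asthma", "acc": 0.95, "types": "disorder"},
    {"pretty_name": "Cough", "acc": 0.40, "types": "finding"},
    {"pretty_name": "Aspirin", "acc": 0.99, "types": "substance"},
]
criteria = {"acc": 0.8, "types": ["disorder", "finding"]}
filtered = filter_rows(annotations, criteria)
```

Here only the first row passes both criteria; an empty criteria dictionary leaves the input unchanged.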

pat2vec.util.post_processing_build_methods.build_merged_epr_mct_annot_df(all_pat_list, config_obj, overwrite=False)

Builds a merged DataFrame of annotations from EPR and MCT sources.

This function iterates through a list of patient IDs, retrieves their respective annotation data from both EPR and MCT sources using the retrieve_pat_annots_mct_epr function, and then concatenates it into a single large DataFrame. The final merged DataFrame is saved to a CSV file.

Parameters:

all_pat_list (List[str]) – A list of patient client ID codes to process.
config_obj (Any) – A configuration object containing project settings.
overwrite (bool) – If True, any existing merged file will be overwritten. If False, the function will skip the process if the file already exists. Defaults to False.

Returns:

The file path to the merged annotations CSV file, or None if no annotation data is found for any patient.

Return type:

Optional[str]
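
The control flow described above (skip when the file exists, gather per-patient rows, return None when nothing was found) can be sketched with the standard library alone. This is an illustration under stated assumptions: build_merged_csv and fetch_rows are hypothetical names, fetch_rows stands in for retrieve_pat_annots_mct_epr, and lists of dicts stand in for DataFrames.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def build_merged_csv(
    patient_ids: List[str],
    fetch_rows: Callable[[str], List[Row]],
    output_path: str,
    overwrite: bool = False,
) -> Optional[str]:
    """Collect rows for every patient and write them to one CSV."""
    # Skip the rebuild when a merged file already exists and overwrite is off.
    if os.path.exists(output_path) and not overwrite:
        return output_path

    merged: List[Row] = []
    for patient_id in patient_ids:
        merged.extend(fetch_rows(patient_id))

    # Nothing found for any patient: signal that with None.
    if not merged:
        return None

    # Use the union of all keys so rows with differing columns still fit.
    fieldnames: List[str] = []
    for row in merged:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(merged)
    return output_path


# Made-up data: one patient has an annotation, the others have none.
fake_annots = {"P001": [{"client_idcode": "P001", "cui": "X1"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "merged_annots.csv")
    first = build_merged_csv(
        ["P001", "P002"], lambda pid: fake_annots.get(pid, []), target
    )
    # A second call with overwrite=False returns the existing path untouched.
    second = build_merged_csv(["P001"], lambda pid: [], target)
    with open(target, newline="", encoding="utf-8") as handle:
        line_count = sum(1 for _ in handle)
    # No data for any patient yields None and no file.
    nothing = build_merged_csv(
        ["P003"], lambda pid: [], os.path.join(tmp_dir, "empty.csv")
    )
```

Because the return value may be None, callers of a function with this contract should check the result before passing it to a CSV reader.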

pat2vec.util.post_processing_build_methods.build_merged_bloods(all_pat_list, config_obj, overwrite=False)

Builds a merged CSV file of bloods data from patient batch files.

This function iterates through a list of patient IDs, reads the corresponding bloods batch CSV for each patient, and appends the data to a single merged CSV file. It handles file existence and overwriting logic.

Parameters:

all_pat_list (List[str]) – A list of patient client ID codes to process.
config_obj (Any) – A configuration object containing project settings.
overwrite (bool) – If True, any existing merged file will be overwritten. If False, data will be appended to the existing file. Defaults to False.

Returns:

The file path to the merged bloods CSV file.

Return type:

str
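
To make the append-versus-overwrite distinction concrete, here is a small standard-library sketch. The helper name append_batches, the file names, and the assumption that every batch file shares the same header are illustrative only and are not taken from pat2vec.

```python
import csv
import os
import tempfile
from typing import List


def append_batches(
    batch_paths: List[str], merged_path: str, overwrite: bool = False
) -> str:
    """Append each per-patient CSV to merged_path, writing the header once."""
    if overwrite and os.path.exists(merged_path):
        os.remove(merged_path)

    for batch_path in batch_paths:
        if not os.path.exists(batch_path):
            continue  # no batch file for this patient
        with open(batch_path, newline="", encoding="utf-8") as source:
            rows = list(csv.reader(source))
        if not rows:
            continue
        # Only emit the header when the merged file is new or empty.
        needs_header = (
            not os.path.exists(merged_path) or os.path.getsize(merged_path) == 0
        )
        with open(merged_path, "a", newline="", encoding="utf-8") as target:
            csv.writer(target).writerows(rows if needs_header else rows[1:])
    return merged_path


with tempfile.TemporaryDirectory() as tmp_dir:
    # Two made-up per-patient batch files.
    paths = []
    for pid, value in [("P001", "4.1"), ("P002", "5.3")]:
        path = os.path.join(tmp_dir, f"{pid}.csv")
        with open(path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows([["client_idcode", "value"], [pid, value]])
        paths.append(path)
    merged = os.path.join(tmp_dir, "merged_bloods.csv")

    append_batches(paths, merged, overwrite=True)
    with open(merged, newline="", encoding="utf-8") as handle:
        fresh_rows = list(csv.reader(handle))

    # Running again in append mode adds the same rows a second time.
    append_batches(paths, merged, overwrite=False)
    with open(merged, newline="", encoding="utf-8") as handle:
        appended_rows = list(csv.reader(handle))
```

In this sketch, rerunning in append mode over the same patient list duplicates rows, which is one reason to pass overwrite=True when rebuilding a merged file from scratch.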

pat2vec.util.post_processing_build_methods.build_merged_epr_mct_doc_df(all_pat_list, config_obj, overwrite=False)

Builds a merged CSV of documents from EPR and MCT sources.

This function iterates through a list of patient IDs, retrieves their respective document data from both EPR and MCT sources using the retrieve_pat_docs_mct_epr function, and appends the data to a single merged CSV file.

Parameters:

all_pat_list (List[str]) – A list of patient client ID codes to process.
config_obj (Any) – A configuration object containing project settings.
overwrite (bool) – If True, any existing merged file will be overwritten. If False, data will be appended to the existing file. Defaults to False.

Returns:

The file path to the merged documents CSV file.

Return type:

str

pat2vec.util.post_processing_build_methods.retrieve_pat_bloods(client_idcode, config_obj)

Retrieve bloods data for the given client_idcode.

Parameters:

client_idcode (str) – Unique identifier for the patient.
config_obj (Any) – Configuration object containing necessary paths and parameters.

Return type:

DataFrame

Returns:

Bloods data for the given client_idcode, or an empty DataFrame if not found.

pat2vec.util.post_processing_build_methods.retrieve_pat_docs_mct_epr(client_idcode, config_obj, columns_epr=None, columns_mct=None, columns_to=None, columns_report=None, merge_columns=True)

Retrieves and merges document data for a patient from multiple sources.

This function reads document data for a specified patient from four potential sources: EPR documents, MCT documents, textual observations, and reports. It loads the corresponding CSV files, optionally selecting specific columns, and concatenates them into a single DataFrame. It can also merge related columns (like timestamps and content) to create a more unified dataset.

Parameters:

client_idcode (str) – The unique identifier for the patient.
config_obj (Any) – A configuration object containing paths to document batches.
columns_epr (Optional[List[str]]) – A list of columns to load from the EPR documents CSV.
columns_mct (Optional[List[str]]) – A list of columns to load from the MCT documents CSV.
columns_to (Optional[List[str]]) – A list of columns to load from the textual observations CSV.
columns_report (Optional[List[str]]) – A list of columns to load from the reports CSV.
merge_columns (bool) – If True, attempts to merge corresponding columns (e.g., timestamps, content) from the different sources into a unified set of columns.

Return type:

DataFrame

Returns:

A DataFrame containing the concatenated and, optionally, merged document data for the patient. Returns an empty DataFrame if no data is found for the patient in any of the sources.
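
The idea behind merge_columns, folding source-specific columns into one shared set, can be illustrated as follows. The column names (epr_time, mct_text, and so on), the helper names, and the "first non-empty value wins" rule are all hypothetical; the real column names and merge rules live in the pat2vec source and are not reproduced here.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def coalesce(row: Row, candidates: List[str]) -> Optional[str]:
    """Return the first non-empty value among the candidate columns."""
    for column in candidates:
        value = row.get(column)
        if value not in (None, ""):
            return value
    return None


def unify_documents(
    sources: Dict[str, List[Row]], column_map: Dict[str, List[str]]
) -> List[Row]:
    """Concatenate rows from all sources and map them onto shared columns."""
    unified: List[Row] = []
    for source_name, rows in sources.items():
        for row in rows:
            merged_row: Row = {"source": source_name}
            for target, candidates in column_map.items():
                merged_row[target] = coalesce(row, candidates)
            unified.append(merged_row)
    return unified


# Made-up rows with source-specific (hypothetical) column names.
sources = {
    "epr": [{"epr_time": "2023-01-05", "epr_text": "Clinic letter text"}],
    "mct": [{"mct_time": "2023-02-10", "mct_text": "Observation note text"}],
}
column_map = {
    "timestamp": ["epr_time", "mct_time"],
    "content": ["epr_text", "mct_text"],
}
docs = unify_documents(sources, column_map)
```

Each output row carries the same timestamp and content keys regardless of origin, and an empty set of sources yields an empty list, mirroring the empty-DataFrame case.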

pat2vec.util.post_processing_build_methods.join_docs_to_annots(annots_df, docs_temp, drop_duplicates=True)

Merge two DataFrames based on the 'document_guid' column.

Parameters:

annots_df (DataFrame) – The DataFrame containing annotations.
docs_temp (DataFrame) – The DataFrame containing documents.
drop_duplicates (bool) – If True, drops duplicated columns from docs_temp before merging.

Return type:

DataFrame

Returns:

A merged DataFrame.
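
A dependency-free sketch of this kind of join is shown below. It assumes a left join that keeps every annotation, at most one document per document_guid, and a "_doc" suffix for clashing columns when drop_duplicates is False; none of these details are confirmed by the description above, and the function name is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
KEY = "document_guid"


def join_on_document_guid(
    annots: List[Row], docs: List[Row], drop_duplicates: bool = True
) -> List[Row]:
    """Left-join documents onto annotations using the shared KEY column."""
    annot_columns = {column for row in annots for column in row}

    docs_by_guid: Dict[Any, Row] = {}
    for doc in docs:
        if drop_duplicates:
            # Drop document columns that the annotations already provide.
            doc = {
                column: value
                for column, value in doc.items()
                if column == KEY or column not in annot_columns
            }
        docs_by_guid.setdefault(doc[KEY], doc)  # keep the first doc per guid

    joined: List[Row] = []
    for annot in annots:
        row = dict(annot)
        for column, value in docs_by_guid.get(annot.get(KEY), {}).items():
            if column == KEY:
                continue
            # Columns present on both sides get a suffix instead of clobbering.
            name = f"{column}_doc" if column in row else column
            row[name] = value
        joined.append(row)
    return joined


# Made-up data: the second annotation has no matching document.
annots = [
    {"document_guid": "d1", "client_idcode": "P001", "cui": "X1"},
    {"document_guid": "d2", "client_idcode": "P001", "cui": "X2"},
]
docs = [{"document_guid": "d1", "client_idcode": "P001", "body": "Text one"}]

joined = join_on_document_guid(annots, docs)
joined_keep = join_on_document_guid(annots, docs, drop_duplicates=False)
```

Dropping the overlapping columns first keeps the joined result free of suffixed near-duplicates such as a second client_idcode column.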

pat2vec.util.post_processing_build_methods.get_annots_joined_to_docs(config_obj, pat2vec_obj)

Builds and merges document and annotation dataframes, then joins them.

This function orchestrates the process of creating comprehensive, patient-level data by first building merged dataframes for both documents (from EPR and MCT sources) and their corresponding annotations. It then joins these two dataframes based on a common document identifier.

Parameters:

config_obj (Any) – A configuration object containing project settings, including proj_name and paths to data batches.
pat2vec_obj (Any) – The main pat2vec object, which contains the all_patient_list and other necessary components.

Return type:

DataFrame

Returns:

A DataFrame containing the annotations joined with their corresponding document information.

pat2vec.util.post_processing_build_methods.merge_demographics_csv(all_pat_list, config_obj, overwrite=False)

Merge all demographics CSV files that match the patient list.

Parameters:

all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str
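
How files are matched to the patient list is not stated here. The sketch below assumes one CSV per patient named after the patient ID, a shared header across files, and that an existing output is reused when overwrite is False; the function name and directory layout are likewise hypothetical. The same pattern would apply to the other merge_*_csv functions documented in this module.

```python
import csv
import os
import tempfile
from typing import List, Optional


def merge_matching_csvs(
    input_dir: str, patient_ids: List[str], output_path: str, overwrite: bool = False
) -> str:
    """Merge <patient_id>.csv files from input_dir for the listed patients."""
    # Assumption: an existing output is reused unless overwrite is requested.
    if os.path.exists(output_path) and not overwrite:
        return output_path

    wanted = set(patient_ids)
    header: Optional[List[str]] = None
    with open(output_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.writer(target)
        for filename in sorted(os.listdir(input_dir)):
            stem, extension = os.path.splitext(filename)
            if extension != ".csv" or stem not in wanted:
                continue  # not a CSV, or not in the patient list
            path = os.path.join(input_dir, filename)
            with open(path, newline="", encoding="utf-8") as source:
                rows = list(csv.reader(source))
            if not rows:
                continue
            if header is None:
                header = rows[0]
                writer.writerow(header)
            writer.writerows(rows[1:])
    return output_path


with tempfile.TemporaryDirectory() as tmp_dir:
    batch_dir = os.path.join(tmp_dir, "batches")
    os.mkdir(batch_dir)
    # Made-up demographics files for three patients.
    for pid, age in [("P001", "54"), ("P002", "38"), ("P003", "71")]:
        with open(
            os.path.join(batch_dir, f"{pid}.csv"), "w", newline="", encoding="utf-8"
        ) as handle:
            csv.writer(handle).writerows([["client_idcode", "age"], [pid, age]])

    merged_path = merge_matching_csvs(
        batch_dir, ["P001", "P003"], os.path.join(tmp_dir, "merged.csv")
    )
    with open(merged_path, newline="", encoding="utf-8") as handle:
        merged_rows = list(csv.reader(handle))
```

Only the files for P001 and P003 contribute rows; P002 is skipped because it is not in the patient list.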

pat2vec.util.post_processing_build_methods.merge_bmi_csv(all_pat_list, config_obj, overwrite=False)

Merge all BMI CSV files that match the patient list.

Parameters:

all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_news_csv(all_pat_list, config_obj, overwrite=False)

Merge all NEWS CSV files that match the patient list.

Parameters:

all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_diagnostics_csv(all_pat_list, config_obj, overwrite=False)

Merge all diagnostics CSV files that match the patient list.

Parameters:

all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_drugs_csv(all_pat_list, config_obj, overwrite=False)

Merge all drugs CSV files that match the patient list.

Parameters:

all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_appointments_csv(all_pat_list, config_obj, overwrite=False)

Merge all appointments CSV files that match the patient list.

Parameters:

all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str