pat2vec.util.post_processing_build_methods

Functions

build_merged_bloods(all_pat_list, config_obj)

Builds a merged CSV file of bloods data from patient batch files.

build_merged_epr_mct_annot_df(all_pat_list, ...)

Builds a merged DataFrame of annotations from EPR and MCT sources.

build_merged_epr_mct_doc_df(all_pat_list, ...)

Builds a merged CSV of documents from EPR and MCT sources.

filter_annot_dataframe(df, ...)

Filters a DataFrame based on the given inclusion criteria.

get_annots_joined_to_docs(config_obj, ...)

Builds and merges document and annotation dataframes, then joins them.

join_docs_to_annots(annots_df, docs_temp[, ...])

Merge two DataFrames based on the 'document_guid' column.

merge_appointments_csv(all_pat_list, config_obj)

Merge all appointments CSV files that match the patient list.

merge_bmi_csv(all_pat_list, config_obj[, ...])

Merge all BMI CSV files that match the patient list.

merge_demographics_csv(all_pat_list, config_obj)

Merge all demographics CSV files that match the patient list.

merge_diagnostics_csv(all_pat_list, config_obj)

Merge all diagnostics CSV files that match the patient list.

merge_drugs_csv(all_pat_list, config_obj[, ...])

Merge all drugs CSV files that match the patient list.

merge_news_csv(all_pat_list, config_obj[, ...])

Merge all NEWS CSV files that match the patient list.

retrieve_pat_bloods(client_idcode, config_obj)

Retrieve bloods data for the given client_idcode.

retrieve_pat_docs_mct_epr(client_idcode, ...)

Retrieves and merges document data for a patient from multiple sources.

pat2vec.util.post_processing_build_methods.filter_annot_dataframe(df, annot_filter_arguments)

Filters a DataFrame based on the given inclusion criteria.

Parameters:
  • df (DataFrame) – DataFrame to be filtered.

  • annot_filter_arguments (Dict[str, Any]) – Dictionary containing inclusion criteria.

Return type:

DataFrame

Returns:

The filtered DataFrame.
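The keys accepted in annot_filter_arguments are not listed here, so the following is only a minimal sketch of an inclusion-criteria filter, not pat2vec's implementation. The helper name filter_by_inclusion_sketch, the column names acc and types, and the convention that a list means "value must be in the list" while a number means "value must be at least this threshold" are all assumptions made for illustration.

```python
from typing import Any, Dict

import pandas as pd


def filter_by_inclusion_sketch(df: pd.DataFrame, criteria: Dict[str, Any]) -> pd.DataFrame:
    """Keep rows that satisfy every criterion (illustrative only).

    Assumed convention: a list value means the column must contain one of
    the listed values; a numeric value is treated as a minimum threshold.
    """
    mask = pd.Series(True, index=df.index)
    for column, wanted in criteria.items():
        if column not in df.columns:
            continue  # ignore criteria for columns the frame does not have
        if isinstance(wanted, (list, tuple, set)):
            mask &= df[column].isin(list(wanted))
        else:
            mask &= df[column] >= wanted
    return df[mask]


# Hypothetical annotation rows; the column names are placeholders.
annots = pd.DataFrame(
    {
        "acc": [0.95, 0.40, 0.85],
        "types": ["disorder", "disorder", "finding"],
    }
)
filtered = filter_by_inclusion_sketch(annots, {"acc": 0.8, "types": ["disorder"]})
```

Only the first row passes both criteria here; the second fails the threshold and the third fails the list check.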

pat2vec.util.post_processing_build_methods.build_merged_epr_mct_annot_df(all_pat_list, config_obj, overwrite=False)

Builds a merged DataFrame of annotations from EPR and MCT sources.

This function iterates through a list of patient IDs, retrieves their respective annotation data from both EPR and MCT sources using the retrieve_pat_annots_mct_epr function, and then concatenates it into a single large DataFrame. The final merged DataFrame is saved to a CSV file.

Parameters:
  • all_pat_list (List[str]) – A list of patient client ID codes to process.

  • config_obj (Any) – A configuration object containing project settings.

  • overwrite (bool) – If True, any existing merged file will be overwritten. If False, the function will skip the process if the file already exists. Defaults to False.

Returns:

The file path to the merged annotations CSV file.

Returns None if no annotation data is found for any patient.

Return type:

Optional[str]
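The control flow described above (skip when the file exists, collect per-patient frames, return None when nothing is found) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: build_merged_frame_sketch and fake_loader are hypothetical names, fake_loader stands in for retrieve_pat_annots_mct_epr, and returning the existing path when the build is skipped is an assumption.

```python
import os
import tempfile
from typing import Callable, List, Optional

import pandas as pd


def build_merged_frame_sketch(
    pat_ids: List[str],
    load_one: Callable[[str], pd.DataFrame],
    out_path: str,
    overwrite: bool = False,
) -> Optional[str]:
    # Skip the rebuild when a merged file already exists (assumption: the
    # existing path is returned in that case).
    if os.path.exists(out_path) and not overwrite:
        return out_path
    # Load one frame per patient and discard patients with no data.
    frames = [load_one(pid) for pid in pat_ids]
    frames = [frame for frame in frames if not frame.empty]
    if not frames:
        return None  # no annotation data for any patient
    pd.concat(frames, ignore_index=True).to_csv(out_path, index=False)
    return out_path


# Hypothetical in-memory stand-in for the per-patient annotation loader.
fake_store = {
    "P001": pd.DataFrame({"client_idcode": ["P001", "P001"], "cui": ["C1", "C2"]}),
    "P002": pd.DataFrame({"client_idcode": ["P002"], "cui": ["C3"]}),
}


def fake_loader(pid: str) -> pd.DataFrame:
    return fake_store.get(pid, pd.DataFrame())


out_dir = tempfile.mkdtemp()
merged_path = build_merged_frame_sketch(
    ["P001", "P002", "P404"], fake_loader, os.path.join(out_dir, "merged_annots.csv")
)
no_data_path = build_merged_frame_sketch(
    ["P404"], fake_loader, os.path.join(out_dir, "empty.csv")
)
```

Because the return type is Optional[str], callers should check for None before reading the file.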

pat2vec.util.post_processing_build_methods.build_merged_bloods(all_pat_list, config_obj, overwrite=False)

Builds a merged CSV file of bloods data from patient batch files.

This function iterates through a list of patient IDs, reads the corresponding bloods batch CSV for each patient, and appends the data to a single merged CSV file. It handles file existence and overwriting logic.

Parameters:
  • all_pat_list (List[str]) – A list of patient client ID codes to process.

  • config_obj (Any) – A configuration object containing project settings.

  • overwrite (bool) – If True, any existing merged file will be overwritten. If False, data will be appended to the existing file. Defaults to False.

Returns:

The file path to the merged bloods CSV file.

Return type:

str
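The append-versus-overwrite behaviour can be illustrated with the standard library alone. This is a sketch under assumptions, not the function's actual code: append_batches_sketch and count_rows are hypothetical helpers, the batch file names are placeholders, and all batch files are assumed to share the same columns.

```python
import csv
import os
import tempfile
from typing import List


def append_batches_sketch(batch_paths: List[str], merged_path: str, overwrite: bool = False) -> str:
    # With overwrite=True start from a clean file; otherwise keep appending.
    if overwrite and os.path.exists(merged_path):
        os.remove(merged_path)
    for batch_path in batch_paths:
        if not os.path.exists(batch_path):
            continue  # patient has no batch file
        with open(batch_path, newline="") as src:
            rows = list(csv.DictReader(src))
        if not rows:
            continue
        # Write the header only when the merged file is new or empty.
        write_header = not os.path.exists(merged_path) or os.path.getsize(merged_path) == 0
        with open(merged_path, "a", newline="") as dst:
            writer = csv.DictWriter(dst, fieldnames=list(rows[0].keys()))
            if write_header:
                writer.writeheader()
            writer.writerows(rows)
    return merged_path


def count_rows(path: str) -> int:
    with open(path, newline="") as fh:
        return len(list(csv.DictReader(fh)))


# Build two tiny per-patient batch files in a temporary directory.
workdir = tempfile.mkdtemp()
batch_paths = []
for pid, value in [("P001", "4.1"), ("P002", "5.3")]:
    path = os.path.join(workdir, f"{pid}.csv")
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["client_idcode", "result"])
        writer.writerow([pid, value])
    batch_paths.append(path)

merged_path = os.path.join(workdir, "merged_bloods.csv")
append_batches_sketch(batch_paths, merged_path, overwrite=True)
rows_after_first_run = count_rows(merged_path)   # 2
append_batches_sketch(batch_paths, merged_path, overwrite=False)
rows_after_second_run = count_rows(merged_path)  # 4: the same rows were appended again
```

The second run shows why overwrite=True is usually wanted when re-running on the same patient list: appending to an existing file duplicates rows.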

pat2vec.util.post_processing_build_methods.build_merged_epr_mct_doc_df(all_pat_list, config_obj, overwrite=False)

Builds a merged CSV of documents from EPR and MCT sources.

This function iterates through a list of patient IDs, retrieves their respective document data from both EPR and MCT sources using the retrieve_pat_docs_mct_epr function, and appends the data to a single merged CSV file.

Parameters:
  • all_pat_list (List[str]) – A list of patient client ID codes to process.

  • config_obj (Any) – A configuration object containing project settings.

  • overwrite (bool) – If True, any existing merged file will be overwritten. If False, data will be appended to the existing file. Defaults to False.

Returns:

The file path to the merged documents CSV file.

Return type:

str

pat2vec.util.post_processing_build_methods.retrieve_pat_bloods(client_idcode, config_obj)

Retrieve bloods data for the given client_idcode.

Parameters:
  • client_idcode (str) – Unique identifier for the patient.

  • config_obj (Any) – Configuration object containing necessary paths and parameters.

Return type:

DataFrame

Returns:

Bloods data for the given client_idcode, or an empty DataFrame if not found.

pat2vec.util.post_processing_build_methods.retrieve_pat_docs_mct_epr(client_idcode, config_obj, columns_epr=None, columns_mct=None, columns_to=None, columns_report=None, merge_columns=True)

Retrieves and merges document data for a patient from multiple sources.

This function reads document data for a specified patient from four potential sources: EPR documents, MCT documents, textual observations, and reports. It loads the corresponding CSV files, optionally selecting specific columns, and concatenates them into a single DataFrame. It can also merge related columns (like timestamps and content) to create a more unified dataset.

Parameters:
  • client_idcode (str) – The unique identifier for the patient.

  • config_obj (Any) – A configuration object containing paths to document batches.

  • columns_epr (Optional[List[str]]) – A list of columns to load from the EPR documents CSV.

  • columns_mct (Optional[List[str]]) – A list of columns to load from the MCT documents CSV.

  • columns_to (Optional[List[str]]) – A list of columns to load from the textual observations CSV.

  • columns_report (Optional[List[str]]) – A list of columns to load from the reports CSV.

  • merge_columns (bool) – If True, attempts to merge corresponding columns (e.g., timestamps, content) from the different sources into a unified set of columns.

Return type:

DataFrame

Returns:

A DataFrame containing the concatenated and optionally merged document data for the patient. Returns an empty DataFrame if no data is found for the patient in any of the sources.
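To make the effect of merge_columns concrete, here is a minimal sketch of coalescing source-specific columns after concatenation. It is not the function's implementation: coalesce_columns_sketch is a hypothetical helper and the column names epr_time, mct_time, epr_text, mct_text, timestamp, and content are placeholders rather than the real pat2vec column names.

```python
from typing import List

import pandas as pd


def coalesce_columns_sketch(df: pd.DataFrame, sources: List[str], target: str) -> pd.DataFrame:
    """Fill `target` with the first non-null value found across `sources`."""
    result = df.copy()
    merged = result[sources[0]]
    for name in sources[1:]:
        merged = merged.combine_first(result[name])
    result[target] = merged
    return result.drop(columns=sources)


# Two hypothetical sources with different column names for the same concepts.
epr = pd.DataFrame(
    {"document_guid": ["d1"], "epr_time": ["2023-01-05"], "epr_text": ["clinic letter"]}
)
mct = pd.DataFrame(
    {"document_guid": ["d2"], "mct_time": ["2023-02-10"], "mct_text": ["ward note"]}
)

# Concatenation leaves NaN where a source lacks the other source's columns.
combined = pd.concat([epr, mct], ignore_index=True)
unified = coalesce_columns_sketch(combined, ["epr_time", "mct_time"], "timestamp")
unified = coalesce_columns_sketch(unified, ["epr_text", "mct_text"], "content")
```

After coalescing, each row carries a single timestamp and content value regardless of which source it came from.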

pat2vec.util.post_processing_build_methods.join_docs_to_annots(annots_df, docs_temp, drop_duplicates=True)

Merge two DataFrames based on the 'document_guid' column.

Parameters:
  • annots_df (DataFrame) – The DataFrame containing annotations.

  • docs_temp (DataFrame) – The DataFrame containing documents.

  • drop_duplicates (bool) – If True, drops duplicated columns from docs_temp before merging.

Return type:

DataFrame

Returns:

A merged DataFrame.
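A join of this kind can be sketched with pandas.DataFrame.merge. The following is an illustration, not the library's code: join_on_document_guid_sketch is a hypothetical name, "duplicated columns" is read here as columns of the documents frame that also exist in the annotations frame, and the left join is an assumption because the join type is not documented above.

```python
import pandas as pd


def join_on_document_guid_sketch(
    annots_df: pd.DataFrame, docs_df: pd.DataFrame, drop_duplicates: bool = True
) -> pd.DataFrame:
    if drop_duplicates:
        # Drop document columns that already exist on the annotations side,
        # so the result has no *_x / *_y suffixed pairs.
        shared = [
            col
            for col in docs_df.columns
            if col in annots_df.columns and col != "document_guid"
        ]
        docs_df = docs_df.drop(columns=shared)
    # Assumed join type: keep every annotation, attach its document if present.
    return annots_df.merge(docs_df, on="document_guid", how="left")


# Hypothetical frames; only 'document_guid' is taken from the documentation.
annots = pd.DataFrame(
    {
        "document_guid": ["d1", "d1", "d2"],
        "client_idcode": ["P001", "P001", "P002"],
        "pretty_name": ["Hypertension", "Asthma", "Diabetes"],
    }
)
docs = pd.DataFrame(
    {
        "document_guid": ["d1", "d2"],
        "client_idcode": ["P001", "P002"],
        "body": ["clinic letter text", "ward note text"],
    }
)
joined = join_on_document_guid_sketch(annots, docs)
```

Each annotation row keeps its own columns and gains the document's body; client_idcode appears once because the copy on the documents side was dropped before merging.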

pat2vec.util.post_processing_build_methods.get_annots_joined_to_docs(config_obj, pat2vec_obj)

Builds and merges document and annotation dataframes, then joins them.

This function orchestrates the process of creating comprehensive, patient-level data by first building merged dataframes for both documents (from EPR and MCT sources) and their corresponding annotations. It then joins these two dataframes based on a common document identifier.

Parameters:
  • config_obj (Any) – A configuration object containing project settings, including proj_name and paths to data batches.

  • pat2vec_obj (Any) – The main pat2vec object, which contains the all_patient_list and other necessary components.

Return type:

DataFrame

Returns:

A DataFrame containing the annotations joined with their corresponding document information.

pat2vec.util.post_processing_build_methods.merge_demographics_csv(all_pat_list, config_obj, overwrite=False)

Merge all demographics CSV files that match the patient list.

Parameters:
  • all_pat_list (List[str]) – List of patient IDs to include.

  • config_obj (Any) – Configuration object containing project settings.

  • overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_bmi_csv(all_pat_list, config_obj, overwrite=False)

Merge all BMI CSV files that match the patient list.

Parameters:
  • all_pat_list (List[str]) – List of patient IDs to include.

  • config_obj (Any) – Configuration object containing project settings.

  • overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_news_csv(all_pat_list, config_obj, overwrite=False)

Merge all NEWS CSV files that match the patient list.

Parameters:
  • all_pat_list (List[str]) – List of patient IDs to include.

  • config_obj (Any) – Configuration object containing project settings.

  • overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_diagnostics_csv(all_pat_list, config_obj, overwrite=False)

Merge all diagnostics CSV files that match the patient list.

Parameters:
  • all_pat_list (List[str]) – List of patient IDs to include.

  • config_obj (Any) – Configuration object containing project settings.

  • overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_drugs_csv(all_pat_list, config_obj, overwrite=False)

Merge all drugs CSV files that match the patient list.

Parameters:
  • all_pat_list (List[str]) – List of patient IDs to include.

  • config_obj (Any) – Configuration object containing project settings.

  • overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str

pat2vec.util.post_processing_build_methods.merge_appointments_csv(all_pat_list, config_obj, overwrite=False)

Merge all appointments CSV files that match the patient list.

Parameters:
  • all_pat_list (List[str]) – List of patient IDs to include.

  • config_obj (Any) – Configuration object containing project settings.

  • overwrite (bool) – If True, overwrite the existing output file.

Returns:

File path to the merged output CSV.

Return type:

str
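The merge_*_csv functions above all share one shape: select the per-patient CSV files that match the patient list and write them to a single output file. A minimal standard-library sketch of that pattern follows. It is not pat2vec's implementation: merge_matching_csvs_sketch is a hypothetical helper, the <client_idcode>.csv naming convention is an assumption, and leaving an existing output untouched when overwrite is False is also an assumption, since that case is not documented above.

```python
import csv
import os
import tempfile
from typing import List


def merge_matching_csvs_sketch(
    pat_ids: List[str], batch_dir: str, out_path: str, overwrite: bool = False
) -> str:
    # Assumption: an existing output is left untouched unless overwrite=True.
    if os.path.exists(out_path) and not overwrite:
        return out_path
    wanted = set(pat_ids)
    header = None
    rows = []
    for name in sorted(os.listdir(batch_dir)):
        stem, ext = os.path.splitext(name)
        # Assumed naming convention: one file per patient, "<client_idcode>.csv".
        if ext != ".csv" or stem not in wanted:
            continue
        with open(os.path.join(batch_dir, name), newline="") as fh:
            reader = csv.reader(fh)
            file_header = next(reader, None)
            if file_header is None:
                continue  # empty file
            if header is None:
                header = file_header  # files are assumed to share one header
            rows.extend(reader)
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return out_path


# Three hypothetical per-patient files; only two are in the patient list.
batch_dir = tempfile.mkdtemp()
for pid, age in [("P001", "54"), ("P002", "61"), ("P003", "47")]:
    with open(os.path.join(batch_dir, f"{pid}.csv"), "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["client_idcode", "age"])
        writer.writerow([pid, age])

out_path = os.path.join(tempfile.mkdtemp(), "merged_demographics.csv")
merged_path = merge_matching_csvs_sketch(["P001", "P003"], batch_dir, out_path, overwrite=True)
with open(merged_path, newline="") as fh:
    merged_rows = list(csv.DictReader(fh))
```

Writing the output outside the batch directory keeps the merged file from being picked up as an input on a later run.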