pat2vec.util.post_processing_build_methods

Functions

build_merged_bloods(all_pat_list, config_obj)

Builds a merged CSV file of bloods data from patient batch files or database.

build_merged_epr_mct_annot_df(all_pat_list, ...)

Builds a merged DataFrame of annotations from EPR and MCT sources (file or DB).

build_merged_epr_mct_doc_df(all_pat_list, ...)

Builds a merged CSV of documents from EPR and MCT sources (file or DB).

get_annots_joined_to_docs(config_obj, ...[, ...])

Builds and merges document and annotation dataframes, then joins them.

join_docs_to_annots(annots_df, docs_temp[, ...])

Merge two DataFrames based on the 'document_guid' column.

load_merged_epr_mct_annots(config_obj, ...)

Loads merged EPR and MCT annotations.

merge_appointments_csv(all_pat_list, config_obj)

Merge all appointments data (files or DB) that match the patient list.

merge_bmi_csv(all_pat_list, config_obj[, ...])

Merge all BMI data (files or DB) that match the patient list.

merge_demographics_csv(all_pat_list, config_obj)

Merge all demographics data (files or DB) that match the patient list.

merge_diagnostics_csv(all_pat_list, config_obj)

Merge all diagnostics data (files or DB) that match the patient list.

merge_drugs_csv(all_pat_list, config_obj[, ...])

Merge all drugs data (files or DB) that match the patient list.

merge_news_csv(all_pat_list, config_obj[, ...])

Merge all NEWS data (files or DB) that match the patient list.

optimize_dtypes(df)

Downcasts numeric columns to save memory.

retrieve_pat_bloods(client_idcode, config_obj)

Retrieve bloods data for the given client_idcode (from file or DB).

retrieve_pat_docs_mct_epr(client_idcode, ...)

Retrieves and merges document data for a patient from multiple sources (file or DB).

retrieve_pat_epr_docs(client_idcode, config_obj)

Retrieve EPR documents data for the given client_idcode (from file or DB).

pat2vec.util.post_processing_build_methods.optimize_dtypes(df)

Downcasts numeric columns to save memory.

Return type:

DataFrame

Parameters:

df (DataFrame)
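The description above does not say how the downcasting is done. The following is a minimal sketch of one common approach, assuming pandas' `pd.to_numeric(..., downcast=...)` is an acceptable stand-in. The function name `downcast_numeric_sketch` and the example columns are hypothetical and are not part of pat2vec.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def downcast_numeric_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for optimize_dtypes; not the library's own code."""
    out = df.copy()  # leave the caller's frame untouched
    for col in out.columns:
        if is_integer_dtype(out[col]):
            # Pick the smallest signed integer type that holds every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif is_float_dtype(out[col]):
            # float32 is the smallest type pandas will pick; precision is reduced.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Hypothetical example data.
example = pd.DataFrame(
    {"age": [34, 71, 58], "bmi": [22.5, 30.1, 27.8], "code": ["a", "b", "c"]}
)
compact = downcast_numeric_sketch(example)
```

Non-numeric columns pass through unchanged. Because float downcasting trades precision for memory, it may not suit columns that need exact values.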

pat2vec.util.post_processing_build_methods.build_merged_epr_mct_annot_df(all_pat_list, config_obj, overwrite=False)

Builds a merged DataFrame of annotations from EPR and MCT sources (file or DB).

Return type:

Optional[str]

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)

pat2vec.util.post_processing_build_methods.load_merged_epr_mct_annots(config_obj, all_pat_list, nrows=None)

Loads merged EPR and MCT annotations.

If nrows is specified, returns a DataFrame containing that many rows for inspection. Otherwise, returns the path to the CSV file.

DO NOT load the entire resulting file into a single DataFrame if it is large.

Return type:

Union[str, DataFrame]

Parameters:
  • config_obj (Any)

  • all_pat_list (List[str])

  • nrows (int | None)
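When the function returns a path rather than a DataFrame, the file can be read in pieces instead of all at once. The sketch below shows one way to do that with `pd.read_csv(..., chunksize=...)`. The helper `count_rows_per_patient`, the demo file, and the assumption that the CSV has a `client_idcode` column are all illustrative and not taken from pat2vec.

```python
import os
import tempfile

import pandas as pd


def count_rows_per_patient(csv_path: str, chunksize: int = 2) -> dict:
    """Hypothetical helper: aggregate over a large CSV one chunk at a time."""
    counts: dict = {}
    # Only `chunksize` rows are held in memory at any moment.
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        for pat, n in chunk["client_idcode"].value_counts().items():
            counts[pat] = counts.get(pat, 0) + int(n)
    return counts


# A tiny stand-in for the merged annotations file (assumed column names).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "annots_demo.csv")
    pd.DataFrame(
        {
            "client_idcode": ["P1", "P1", "P2", "P3", "P3"],
            "document_guid": ["d1", "d1", "d2", "d3", "d4"],
        }
    ).to_csv(demo_path, index=False)
    per_patient = count_rows_per_patient(demo_path)
```

In practice the chunk size would be far larger than 2; it is kept small here so that the demo file spans several chunks.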

pat2vec.util.post_processing_build_methods.build_merged_bloods(all_pat_list, config_obj, overwrite=False)

Builds a merged CSV file of bloods data from patient batch files or database.

Return type:

str

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)

pat2vec.util.post_processing_build_methods.build_merged_epr_mct_doc_df(all_pat_list, config_obj, overwrite=False)

Builds a merged CSV of documents from EPR and MCT sources (file or DB).

Return type:

str

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)

pat2vec.util.post_processing_build_methods.retrieve_pat_bloods(client_idcode, config_obj)

Retrieve bloods data for the given client_idcode (from file or DB).

Parameters:
  • client_idcode (str) – Unique identifier for the patient.

  • config_obj (Any) – Configuration object containing storage backend settings.

Return type:

DataFrame

Returns:

Bloods data for the given client_idcode, or an empty DataFrame if not found.

pat2vec.util.post_processing_build_methods.retrieve_pat_epr_docs(client_idcode, config_obj)

Retrieve EPR documents data for the given client_idcode (from file or DB).

Parameters:
  • client_idcode (str) – Unique identifier for the patient.

  • config_obj (Any) – Configuration object containing storage backend settings.

Return type:

DataFrame

Returns:

EPR documents data for the given client_idcode, or an empty DataFrame if not found.

pat2vec.util.post_processing_build_methods.retrieve_pat_docs_mct_epr(client_idcode, config_obj, columns_epr=None, columns_mct=None, columns_to=None, columns_report=None, merge_columns=True)

Retrieves and merges document data for a patient from multiple sources (file or DB).

This function reads document data for a specified patient from four potential sources: EPR documents, MCT documents, textual observations, and reports. It loads the corresponding data, optionally selecting specific columns, and concatenates them into a single DataFrame. It can also merge related columns (like timestamps and content) to create a more unified dataset.

Parameters:
  • client_idcode (str) – The unique identifier for the patient.

  • config_obj (Any) – A configuration object containing paths to document batches.

  • columns_epr (Optional[List[str]]) – A list of columns to load from the EPR documents CSV.

  • columns_mct (Optional[List[str]]) – A list of columns to load from the MCT documents CSV.

  • columns_to (Optional[List[str]]) – A list of columns to load from the textual observations CSV.

  • columns_report (Optional[List[str]]) – A list of columns to load from the reports CSV.

  • merge_columns (bool) – If True, attempts to merge corresponding columns (e.g., timestamps, content) from the different sources into a unified set of columns.

Return type:

DataFrame

Returns:

A DataFrame containing the concatenated and optionally merged document data for the patient. Returns an empty DataFrame if no data is found for the patient in any of the sources.
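The concatenate-then-unify behaviour described above can be illustrated with a short sketch. It assumes that "merging" related columns means taking the first non-null value across source-specific columns. The helper names and every column name below (`epr_time`, `mct_text`, and so on) are hypothetical and do not reflect the library's actual schema.

```python
import pandas as pd


def coalesce(df: pd.DataFrame, candidates: list) -> pd.Series:
    """Return the first non-null value per row across the candidate columns."""
    present = [c for c in candidates if c in df.columns]
    if not present:
        return pd.Series(pd.NA, index=df.index, dtype="object")
    result = df[present[0]]
    for col in present[1:]:
        result = result.combine_first(df[col])
    return result


def concat_and_unify(frames: list, time_cols: list, text_cols: list) -> pd.DataFrame:
    """Hypothetical sketch: stack per-source frames, then add unified columns."""
    non_empty = [f for f in frames if not f.empty]
    if not non_empty:
        return pd.DataFrame()  # mirrors the documented empty-DataFrame fallback
    combined = pd.concat(non_empty, ignore_index=True, sort=False)
    combined["timestamp"] = coalesce(combined, time_cols)
    combined["content"] = coalesce(combined, text_cols)
    return combined


# Hypothetical per-source frames with differently named columns.
epr = pd.DataFrame(
    {"document_guid": ["d1"], "epr_time": ["2021-01-05"], "epr_text": ["clinic note"]}
)
mct = pd.DataFrame(
    {"document_guid": ["d2"], "mct_time": ["2021-02-10"], "mct_text": ["obs text"]}
)
unified = concat_and_unify(
    [epr, mct], ["epr_time", "mct_time"], ["epr_text", "mct_text"]
)
```

The source-specific columns are kept alongside the unified ones in this sketch. Whether the library keeps or drops them is not stated in the description above.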

pat2vec.util.post_processing_build_methods.join_docs_to_annots(annots_df, docs_temp, drop_duplicates=True)

Merge two DataFrames based on the ‘document_guid’ column.

Parameters:
  • annots_df (DataFrame) – The DataFrame containing annotations.

  • docs_temp (DataFrame) – The DataFrame containing documents.

  • drop_duplicates (bool) – If True, drops duplicated columns from docs_temp before merging.

Return type:

DataFrame

Returns:

A merged DataFrame.
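As an illustration of the join described above, here is a minimal sketch using `DataFrame.merge`. It makes two assumptions that the description does not confirm: that "duplicated columns" means columns present in both frames other than the key, and that a left join is used. The function name and the example columns are hypothetical.

```python
import pandas as pd


def join_on_document_guid(
    annots: pd.DataFrame, docs: pd.DataFrame, drop_duplicates: bool = True
) -> pd.DataFrame:
    """Hypothetical sketch of joining annotations to documents."""
    if drop_duplicates:
        # Drop non-key columns that already exist in annots so the merge
        # does not produce "_x"/"_y" suffixed pairs.
        overlap = [
            c for c in docs.columns if c in annots.columns and c != "document_guid"
        ]
        docs = docs.drop(columns=overlap)
    # how="left" is an assumption: every annotation row is kept.
    return annots.merge(docs, on="document_guid", how="left")


# Hypothetical example frames.
annots = pd.DataFrame(
    {
        "document_guid": ["d1", "d1", "d2"],
        "client_idcode": ["P1", "P1", "P1"],
        "pretty_name": ["Asthma", "Cough", "Fever"],
    }
)
docs = pd.DataFrame(
    {
        "document_guid": ["d1", "d2"],
        "client_idcode": ["P1", "P1"],
        "body": ["first note", "second note"],
    }
)
joined = join_on_document_guid(annots, docs)
```

With `drop_duplicates=False`, pandas would instead keep both copies of `client_idcode` under suffixed names.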

pat2vec.util.post_processing_build_methods.get_annots_joined_to_docs(config_obj, pat2vec_obj, nrows=None)

Builds and merges document and annotation dataframes, then joins them.

Returns the path to the joined CSV file, or a sampled DataFrame if nrows is set. This function processes data in small patient-level batches to avoid RAM spikes.

Return type:

Union[str, DataFrame]

Parameters:
  • config_obj (Any)

  • pat2vec_obj (Any)

  • nrows (int | None)
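The batching idea mentioned above (process a few patients at a time and write as you go) can be sketched as follows. This is not the library's implementation: the helper `write_in_batches`, the batch size, and the example frames are hypothetical. It only shows the general pattern of appending each batch to one CSV so that the full result never sits in memory.

```python
import os
import tempfile

import pandas as pd


def write_in_batches(frames_by_patient: dict, out_path: str, batch_size: int = 2) -> str:
    """Hypothetical sketch: append patient batches to a single CSV file."""
    patients = list(frames_by_patient)
    for start in range(0, len(patients), batch_size):
        batch = patients[start : start + batch_size]
        part = pd.concat([frames_by_patient[p] for p in batch], ignore_index=True)
        first = start == 0
        # Write the header once, then append later batches without it.
        part.to_csv(out_path, mode="w" if first else "a", header=first, index=False)
    return out_path


# Hypothetical per-patient joined frames.
per_patient_frames = {
    "P1": pd.DataFrame({"client_idcode": ["P1"], "document_guid": ["d1"]}),
    "P2": pd.DataFrame({"client_idcode": ["P2"], "document_guid": ["d2"]}),
    "P3": pd.DataFrame({"client_idcode": ["P3"], "document_guid": ["d3"]}),
}

with tempfile.TemporaryDirectory() as tmp:
    joined_path = write_in_batches(per_patient_frames, os.path.join(tmp, "joined.csv"))
    # Small preview read, analogous in spirit to passing nrows.
    preview = pd.read_csv(joined_path, nrows=2)
    total_rows = len(pd.read_csv(joined_path))
```

The sketch assumes every batch has the same columns in the same order; otherwise the appended rows would not line up with the header.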

pat2vec.util.post_processing_build_methods.merge_demographics_csv(all_pat_list, config_obj, overwrite=False)

Merge all demographics data (files or DB) that match the patient list.

Return type:

str

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)

pat2vec.util.post_processing_build_methods.merge_bmi_csv(all_pat_list, config_obj, overwrite=False)

Merge all BMI data (files or DB) that match the patient list.

Return type:

str

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)

pat2vec.util.post_processing_build_methods.merge_news_csv(all_pat_list, config_obj, overwrite=False)

Merge all NEWS data (files or DB) that match the patient list.

Return type:

str

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)

pat2vec.util.post_processing_build_methods.merge_diagnostics_csv(all_pat_list, config_obj, overwrite=False)

Merge all diagnostics data (files or DB) that match the patient list.

Return type:

str

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)

pat2vec.util.post_processing_build_methods.merge_drugs_csv(all_pat_list, config_obj, overwrite=False)

Merge all drugs data (files or DB) that match the patient list.

Return type:

str

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)

pat2vec.util.post_processing_build_methods.merge_appointments_csv(all_pat_list, config_obj, overwrite=False)

Merge all appointments data (files or DB) that match the patient list.

Return type:

str

Parameters:
  • all_pat_list (List[str])

  • config_obj (Any)

  • overwrite (bool)