pat2vec.util.post_processing_build_methods
Functions
- build_merged_bloods: Builds a merged CSV file of bloods data from patient batch files or database.
- build_merged_epr_mct_annot_df: Builds a merged DataFrame of annotations from EPR and MCT sources (file or DB).
- build_merged_epr_mct_doc_df: Builds a merged CSV of documents from EPR and MCT sources (file or DB).
- get_annots_joined_to_docs: Builds and merges document and annotation dataframes, then joins them.
- join_docs_to_annots: Merge two DataFrames based on the 'document_guid' column.
- load_merged_epr_mct_annots: Loads merged EPR and MCT annotations.
- Merge all appointments data (files or DB) that match the patient list.
- merge_bmi_csv: Merge all BMI data (files or DB) that match the patient list.
- merge_demographics_csv: Merge all demographics data (files or DB) that match the patient list.
- merge_diagnostics_csv: Merge all diagnostics data (files or DB) that match the patient list.
- Merge all drugs data (files or DB) that match the patient list.
- merge_news_csv: Merge all NEWS data (files or DB) that match the patient list.
- optimize_dtypes: Downcasts numeric columns to save memory.
- retrieve_pat_bloods: Retrieve bloods data for the given client_idcode (from file or DB).
- retrieve_pat_docs_mct_epr: Retrieves and merges document data for a patient from multiple sources (file or DB).
- retrieve_pat_epr_docs: Retrieve EPR documents data for the given client_idcode (from file or DB).
- pat2vec.util.post_processing_build_methods.optimize_dtypes(df)
Downcasts numeric columns to save memory.
- Parameters:
df (DataFrame)
- Return type:
DataFrame
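The idea of downcasting can be illustrated with plain pandas. The following is a minimal sketch, not the implementation of optimize_dtypes: the helper name downcast_numeric and the sample columns are hypothetical, and it assumes that "downcasting" means converting each numeric column to the smallest dtype that still holds its values.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns converted to smaller dtypes."""
    out = df.copy()
    # Integer columns: shrink to the smallest integer dtype that fits the values.
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    # Float columns: shrink to float32 when the values allow it.
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Hypothetical sample data; non-numeric columns are left untouched.
df = pd.DataFrame(
    {
        "count": np.array([1, 2, 3], dtype="int64"),
        "value": np.array([0.5, 1.5, 2.5], dtype="float64"),
        "label": ["a", "b", "c"],
    }
)
small = downcast_numeric(df)
print(small.dtypes)
```

Comparing df.memory_usage(deep=True) before and after shows the saving; on wide numeric tables the reduction can be substantial.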
- pat2vec.util.post_processing_build_methods.build_merged_epr_mct_annot_df(all_pat_list, config_obj, overwrite=False)
Builds a merged DataFrame of annotations from EPR and MCT sources (file or DB).
- Parameters:
all_pat_list (List[str])
config_obj (Any)
overwrite (bool)
- Return type:
Optional[str]
- pat2vec.util.post_processing_build_methods.load_merged_epr_mct_annots(config_obj, all_pat_list, nrows=None)
Loads merged EPR and MCT annotations.
If nrows is specified, returns a DataFrame containing that many rows for inspection. Otherwise, returns the path to the CSV file.
DO NOT load the entire resulting file into a single DataFrame if it is large.
- Parameters:
config_obj (Any)
all_pat_list (List[str])
nrows (int | None)
- Return type:
Union[str, DataFrame]
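When only a file path is returned, the file can be inspected or processed without reading it whole. The sketch below uses a small temporary CSV as a stand-in for the merged annotations file; the file name and the acc column are hypothetical, and only standard pandas read_csv options (nrows, chunksize) are used.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the merged annotations CSV.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "merged_annots.csv")
pd.DataFrame(
    {
        "client_idcode": [f"P{i % 3}" for i in range(10)],
        "acc": [i / 10 for i in range(10)],
    }
).to_csv(csv_path, index=False)

# Inspect only the first few rows.
preview = pd.read_csv(csv_path, nrows=3)

# Aggregate over the full file in chunks so it is never held in memory at once.
rows_per_patient = {}
for chunk in pd.read_csv(csv_path, chunksize=4):
    for pat, n in chunk["client_idcode"].value_counts().items():
        rows_per_patient[pat] = rows_per_patient.get(pat, 0) + int(n)

print(preview)
print(rows_per_patient)
```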
- pat2vec.util.post_processing_build_methods.build_merged_bloods(all_pat_list, config_obj, overwrite=False)
Builds a merged CSV file of bloods data from patient batch files or database.
- Parameters:
all_pat_list (List[str])
config_obj (Any)
overwrite (bool)
- Return type:
str
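The general pattern of merging per-patient files into one CSV, with an overwrite flag, might look like the sketch below. It is an illustration under stated assumptions, not the library's code: the helper merge_patient_csvs, the one-file-per-patient layout (<batch_dir>/<id>.csv), and the behaviour of reusing an existing output when overwrite is False are all hypothetical.

```python
import os
import tempfile
from typing import List

import pandas as pd


def merge_patient_csvs(
    pat_ids: List[str], batch_dir: str, out_path: str, overwrite: bool = False
) -> str:
    """Append each patient's CSV to out_path and return out_path."""
    if os.path.exists(out_path) and not overwrite:
        return out_path  # assumed behaviour: reuse the existing merged file
    if os.path.exists(out_path):
        os.remove(out_path)
    header_written = False
    for pat_id in pat_ids:
        src = os.path.join(batch_dir, f"{pat_id}.csv")
        if not os.path.exists(src):
            continue  # no data for this patient
        # One patient at a time keeps memory use bounded.
        # Assumes every patient file has the same columns in the same order.
        df = pd.read_csv(src)
        df.to_csv(out_path, mode="a", header=not header_written, index=False)
        header_written = True
    return out_path


# Hypothetical batch files for two patients; "P3" has no file.
batch_dir = tempfile.mkdtemp()
pd.DataFrame({"client_idcode": ["P1", "P1"], "value": [4.2, 4.5]}).to_csv(
    os.path.join(batch_dir, "P1.csv"), index=False
)
pd.DataFrame({"client_idcode": ["P2"], "value": [5.1]}).to_csv(
    os.path.join(batch_dir, "P2.csv"), index=False
)

out_path = os.path.join(batch_dir, "merged.csv")
merged_path = merge_patient_csvs(["P1", "P2", "P3"], batch_dir, out_path, overwrite=True)
merged = pd.read_csv(merged_path)
print(merged)
```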
- pat2vec.util.post_processing_build_methods.build_merged_epr_mct_doc_df(all_pat_list, config_obj, overwrite=False)
Builds a merged CSV of documents from EPR and MCT sources (file or DB).
- Parameters:
all_pat_list (List[str])
config_obj (Any)
overwrite (bool)
- Return type:
str
- pat2vec.util.post_processing_build_methods.retrieve_pat_bloods(client_idcode, config_obj)
Retrieve bloods data for the given client_idcode (from file or DB).
- Parameters:
client_idcode (str) – Unique identifier for the patient.
config_obj (Any) – Configuration object containing storage backend settings.
- Return type:
DataFrame
- Returns:
Bloods data for the given client_idcode, or an empty DataFrame if not found.
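Because a missing patient yields an empty DataFrame rather than an error, callers typically check .empty before using the result. The sketch below mimics that contract for the file case only; read_patient_csv and the <batch_dir>/<client_idcode>.csv layout are hypothetical and do not reflect how pat2vec locates its files.

```python
import os
import tempfile

import pandas as pd


def read_patient_csv(client_idcode: str, batch_dir: str) -> pd.DataFrame:
    """Load <batch_dir>/<client_idcode>.csv, or return an empty DataFrame."""
    path = os.path.join(batch_dir, f"{client_idcode}.csv")
    if not os.path.exists(path):
        return pd.DataFrame()
    return pd.read_csv(path)


# Hypothetical data for one known patient.
patient_dir = tempfile.mkdtemp()
pd.DataFrame({"client_idcode": ["P1"], "value": [4.2]}).to_csv(
    os.path.join(patient_dir, "P1.csv"), index=False
)

found = read_patient_csv("P1", patient_dir)
missing = read_patient_csv("UNKNOWN", patient_dir)

for name, frame in [("P1", found), ("UNKNOWN", missing)]:
    if frame.empty:
        print(f"{name}: no data")
    else:
        print(f"{name}: {len(frame)} row(s)")
```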
- pat2vec.util.post_processing_build_methods.retrieve_pat_epr_docs(client_idcode, config_obj)
Retrieve EPR documents data for the given client_idcode (from file or DB).
- Parameters:
client_idcode (str) – Unique identifier for the patient.
config_obj (Any) – Configuration object containing storage backend settings.
- Return type:
DataFrame
- Returns:
EPR documents data for the given client_idcode, or an empty DataFrame if not found.
- pat2vec.util.post_processing_build_methods.retrieve_pat_docs_mct_epr(client_idcode, config_obj, columns_epr=None, columns_mct=None, columns_to=None, columns_report=None, merge_columns=True)
Retrieves and merges document data for a patient from multiple sources (file or DB).
This function reads document data for a specified patient from four potential sources: EPR documents, MCT documents, textual observations, and reports. It loads the corresponding data, optionally selecting specific columns, and concatenates them into a single DataFrame. It can also merge related columns (like timestamps and content) to create a more unified dataset.
- Parameters:
client_idcode (str) – The unique identifier for the patient.
config_obj (Any) – A configuration object containing paths to document batches.
columns_epr (Optional[List[str]]) – A list of columns to load from the EPR documents CSV.
columns_mct (Optional[List[str]]) – A list of columns to load from the MCT documents CSV.
columns_to (Optional[List[str]]) – A list of columns to load from the textual observations CSV.
columns_report (Optional[List[str]]) – A list of columns to load from the reports CSV.
merge_columns (bool) – If True, attempts to merge corresponding columns (e.g., timestamps, content) from the different sources into a unified set of columns.
- Return type:
DataFrame
- Returns:
A DataFrame containing the concatenated and optionally merged document data for the patient. Returns an empty DataFrame if no data is found for the patient in any of the sources.
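The "concatenate, then merge corresponding columns" step can be pictured as a coalesce across source-specific columns. The sketch below is an assumption about the general technique, not the function's actual logic; the two sources shown and the column names epr_time, epr_text, mct_time, mct_text, timestamp and text are hypothetical.

```python
import pandas as pd

# Hypothetical per-source frames; each source names its timestamp and
# content columns differently.
epr = pd.DataFrame(
    {
        "document_guid": ["d1"],
        "source": ["epr"],
        "epr_time": ["2021-01-01"],
        "epr_text": ["epr note"],
    }
)
mct = pd.DataFrame(
    {
        "document_guid": ["d2"],
        "source": ["mct"],
        "mct_time": ["2021-02-01"],
        "mct_text": ["mct note"],
    }
)

# Concatenate: columns missing from a source are filled with NaN.
combined = pd.concat([epr, mct], ignore_index=True, sort=False)

# Coalesce: take the first non-null value across the source-specific columns.
combined["timestamp"] = combined["epr_time"].combine_first(combined["mct_time"])
combined["text"] = combined["epr_text"].combine_first(combined["mct_text"])

unified = combined[["document_guid", "source", "timestamp", "text"]]
print(unified)
```

With more than two sources, the same combine_first call can be chained once per additional column.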
- pat2vec.util.post_processing_build_methods.join_docs_to_annots(annots_df, docs_temp, drop_duplicates=True)
Merge two DataFrames based on the 'document_guid' column.
- Parameters:
annots_df (DataFrame) – The DataFrame containing annotations.
docs_temp (DataFrame) – The DataFrame containing documents.
drop_duplicates (bool) – If True, drops duplicated columns from docs_temp before merging.
- Return type:
DataFrame
- Returns:
A merged DataFrame.
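A join of this kind can be sketched with DataFrame.merge. The helper below is illustrative only: join_on_document_guid is a hypothetical name, the left join is an assumption (the join type is not stated above), and "duplicated columns" is interpreted here as columns of the documents frame that also exist in the annotations frame, other than the key.

```python
import pandas as pd


def join_on_document_guid(
    annots: pd.DataFrame, docs: pd.DataFrame, drop_duplicates: bool = True
) -> pd.DataFrame:
    """Left-join docs onto annots using the shared 'document_guid' key."""
    if drop_duplicates:
        # Drop docs columns that annots already has, keeping the join key,
        # so the result has no _x/_y suffixed duplicates.
        overlap = [
            c for c in docs.columns if c in annots.columns and c != "document_guid"
        ]
        docs = docs.drop(columns=overlap)
    return annots.merge(docs, on="document_guid", how="left")


# Hypothetical sample frames: two annotations on d1, one on d2.
annots = pd.DataFrame(
    {
        "document_guid": ["d1", "d1", "d2"],
        "client_idcode": ["P1", "P1", "P2"],
        "cui": ["C01", "C02", "C03"],
    }
)
docs = pd.DataFrame(
    {
        "document_guid": ["d1", "d2"],
        "client_idcode": ["P1", "P2"],
        "body": ["text one", "text two"],
    }
)

joined = join_on_document_guid(annots, docs)
print(joined)
```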
- pat2vec.util.post_processing_build_methods.get_annots_joined_to_docs(config_obj, pat2vec_obj, nrows=None)
Builds and merges document and annotation dataframes, then joins them.
Returns the path to the joined CSV file, or a sampled DataFrame if nrows is set. This function processes data in small patient-level batches to avoid RAM spikes.
- Parameters:
config_obj (Any)
pat2vec_obj (Any)
nrows (int | None)
- Return type:
Union[str, DataFrame]
- pat2vec.util.post_processing_build_methods.merge_demographics_csv(all_pat_list, config_obj, overwrite=False)
Merge all demographics data (files or DB) that match the patient list.
- Parameters:
all_pat_list (List[str])
config_obj (Any)
overwrite (bool)
- Return type:
str
- pat2vec.util.post_processing_build_methods.merge_bmi_csv(all_pat_list, config_obj, overwrite=False)
Merge all BMI data (files or DB) that match the patient list.
- Parameters:
all_pat_list (List[str])
config_obj (Any)
overwrite (bool)
- Return type:
str
- pat2vec.util.post_processing_build_methods.merge_news_csv(all_pat_list, config_obj, overwrite=False)
Merge all NEWS data (files or DB) that match the patient list.
- Parameters:
all_pat_list (List[str])
config_obj (Any)
overwrite (bool)
- Return type:
str
- pat2vec.util.post_processing_build_methods.merge_diagnostics_csv(all_pat_list, config_obj, overwrite=False)
Merge all diagnostics data (files or DB) that match the patient list.
- Parameters:
all_pat_list (List[str])
config_obj (Any)
overwrite (bool)
- Return type:
str