pat2vec.util.post_processing_build_methods
Functions
build_merged_bloods | Builds a merged CSV file of bloods data from patient batch files.
build_merged_epr_mct_annot_df | Builds a merged DataFrame of annotations from EPR and MCT sources.
build_merged_epr_mct_doc_df | Builds a merged CSV of documents from EPR and MCT sources.
filter_annot_dataframe | Filters a DataFrame based on the given inclusion criteria.
get_annots_joined_to_docs | Builds and merges document and annotation dataframes, then joins them.
join_docs_to_annots | Merge two DataFrames based on the 'document_guid' column.
merge_appointments_csv | Merge all appointments CSV files that match the patient list.
merge_bmi_csv | Merge all BMI CSV files that match the patient list.
merge_demographics_csv | Merge all demographics CSV files that match the patient list.
merge_diagnostics_csv | Merge all diagnostics CSV files that match the patient list.
merge_drugs_csv | Merge all drugs CSV files that match the patient list.
merge_news_csv | Merge all NEWS CSV files that match the patient list.
retrieve_pat_bloods | Retrieve bloods data for the given client_idcode.
retrieve_pat_docs_mct_epr | Retrieves and merges document data for a patient from multiple sources.
- pat2vec.util.post_processing_build_methods.filter_annot_dataframe(df, annot_filter_arguments)
Filters a DataFrame based on the given inclusion criteria.
- Parameters:
df (DataFrame) – DataFrame to be filtered.
annot_filter_arguments (Dict[str, Any]) – Dictionary containing inclusion criteria.
- Return type:
DataFrame
- Returns:
The filtered DataFrame.
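The keys accepted in annot_filter_arguments are not listed on this page. The sketch below only illustrates the general idea of dictionary-driven inclusion filtering with pandas. The helper name filter_by_criteria, the keys "min_acc" and "allowed_types", and the columns "acc" and "type" are all hypothetical and are not taken from pat2vec.

```python
from typing import Any, Dict

import pandas as pd


def filter_by_criteria(df: pd.DataFrame, criteria: Dict[str, Any]) -> pd.DataFrame:
    """Keep only the rows that satisfy every criterion present in `criteria`.

    Hypothetical keys (illustration only, not the pat2vec schema):
      - "min_acc": minimum value allowed in the "acc" column
      - "allowed_types": values allowed in the "type" column
    """
    # Start by including every row, then narrow the mask per criterion.
    mask = pd.Series(True, index=df.index)
    if "min_acc" in criteria:
        mask &= df["acc"] >= criteria["min_acc"]
    if "allowed_types" in criteria:
        mask &= df["type"].isin(criteria["allowed_types"])
    return df[mask]


annots = pd.DataFrame(
    {
        "type": ["disorder", "drug", "disorder"],
        "acc": [0.95, 0.99, 0.40],
    }
)
filtered = filter_by_criteria(annots, {"min_acc": 0.8, "allowed_types": ["disorder"]})
```

Criteria that are absent from the dictionary are simply not applied, so an empty dictionary returns the input unchanged.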
- pat2vec.util.post_processing_build_methods.build_merged_epr_mct_annot_df(all_pat_list, config_obj, overwrite=False)
Builds a merged DataFrame of annotations from EPR and MCT sources.
This function iterates through a list of patient IDs, retrieves their respective annotation data from both EPR and MCT sources using the retrieve_pat_annots_mct_epr function, and then concatenates it into a single large DataFrame. The final merged DataFrame is saved to a CSV file.
- Parameters:
all_pat_list (List[str]) – A list of patient client ID codes to process.
config_obj (Any) – A configuration object containing project settings.
overwrite (bool) – If True, any existing merged file will be overwritten. If False, the function will skip the process if the file already exists. Defaults to False.
- Returns:
The file path to the merged annotations CSV file, or None if no annotation data is found for any patient.
- Return type:
Optional[str]
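As a rough illustration of the skip-or-overwrite behaviour and the Optional[str] return described above, here is a minimal sketch. It is not the pat2vec implementation: build_merged_csv is a hypothetical helper, a plain dictionary of DataFrames stands in for the per-patient retrieval step, and returning the existing path when the build is skipped is an assumption (this page does not say what is returned in that case).

```python
import os
import tempfile
from typing import Dict, List, Optional

import pandas as pd


def build_merged_csv(
    frames_by_patient: Dict[str, pd.DataFrame],
    pat_ids: List[str],
    out_path: str,
    overwrite: bool = False,
) -> Optional[str]:
    """Concatenate per-patient frames and save them as one CSV."""
    # Skip the build when a merged file already exists and overwrite is off.
    # Assumption: the existing path is returned in that case.
    if os.path.exists(out_path) and not overwrite:
        return out_path
    frames = [
        frames_by_patient[p]
        for p in pat_ids
        if p in frames_by_patient and not frames_by_patient[p].empty
    ]
    if not frames:
        return None  # no data found for any patient
    pd.concat(frames, ignore_index=True).to_csv(out_path, index=False)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "merged_annots.csv")
    frames = {"P001": pd.DataFrame({"client_idcode": ["P001"], "cui": ["C1"]})}
    no_data = build_merged_csv({}, ["P001"], out)
    first = build_merged_csv(frames, ["P001", "P002"], out)
    rows_written = len(pd.read_csv(out))
```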
- pat2vec.util.post_processing_build_methods.build_merged_bloods(all_pat_list, config_obj, overwrite=False)
Builds a merged CSV file of bloods data from patient batch files.
This function iterates through a list of patient IDs, reads the corresponding bloods batch CSV for each patient, and appends the data to a single merged CSV file. It handles file existence and overwriting logic.
- Parameters:
all_pat_list (List[str]) – A list of patient client ID codes to process.
config_obj (Any) – A configuration object containing project settings.
overwrite (bool) – If True, any existing merged file will be overwritten. If False, data will be appended to the existing file. Defaults to False.
- Returns:
The file path to the merged bloods CSV file.
- Return type:
str
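The append-versus-overwrite logic described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the library's code: merge_patient_csvs is a hypothetical helper, the one-file-per-patient layout (`<batch_dir>/<patient_id>.csv`) is assumed, and the "result" column is made up for the demo.

```python
import os
import tempfile
from typing import List

import pandas as pd


def merge_patient_csvs(
    pat_ids: List[str], batch_dir: str, out_path: str, overwrite: bool = False
) -> str:
    """Append each patient's batch CSV to a single merged CSV."""
    # With overwrite=True, start from a clean file; otherwise keep appending.
    if overwrite and os.path.exists(out_path):
        os.remove(out_path)
    for pat_id in pat_ids:
        src = os.path.join(batch_dir, f"{pat_id}.csv")  # assumed file layout
        if not os.path.exists(src):
            continue  # no batch file for this patient
        df = pd.read_csv(src)
        # Write the header only when the merged file is being created.
        write_header = not os.path.exists(out_path)
        df.to_csv(out_path, mode="a", header=write_header, index=False)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    for pid, value in [("P001", 4.2), ("P002", 5.1)]:
        pd.DataFrame({"client_idcode": [pid], "result": [value]}).to_csv(
            os.path.join(tmp, f"{pid}.csv"), index=False
        )
    out = os.path.join(tmp, "merged.csv")
    merge_patient_csvs(["P001", "P002", "P999"], tmp, out, overwrite=True)
    merged = pd.read_csv(out)
    # A second run without overwrite appends the same rows again.
    merge_patient_csvs(["P001"], tmp, out, overwrite=False)
    appended = pd.read_csv(out)
```

Appending row by row keeps memory use low for large cohorts, but it also means that re-running with overwrite=False can duplicate rows, as the second call in the demo shows.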
- pat2vec.util.post_processing_build_methods.build_merged_epr_mct_doc_df(all_pat_list, config_obj, overwrite=False)
Builds a merged CSV of documents from EPR and MCT sources.
This function iterates through a list of patient IDs, retrieves their respective document data from both EPR and MCT sources using the retrieve_pat_docs_mct_epr function, and appends the data to a single merged CSV file.
- Parameters:
all_pat_list (List[str]) – A list of patient client ID codes to process.
config_obj (Any) – A configuration object containing project settings.
overwrite (bool) – If True, any existing merged file will be overwritten. If False, data will be appended to the existing file. Defaults to False.
- Returns:
The file path to the merged documents CSV file.
- Return type:
str
- pat2vec.util.post_processing_build_methods.retrieve_pat_bloods(client_idcode, config_obj)
Retrieve bloods data for the given client_idcode.
- Parameters:
client_idcode (str) – Unique identifier for the patient.
config_obj (Any) – Configuration object containing necessary paths and parameters.
- Return type:
DataFrame
- Returns:
Bloods data for the given client_idcode, or an empty DataFrame if not found.
- pat2vec.util.post_processing_build_methods.retrieve_pat_docs_mct_epr(client_idcode, config_obj, columns_epr=None, columns_mct=None, columns_to=None, columns_report=None, merge_columns=True)
Retrieves and merges document data for a patient from multiple sources.
This function reads document data for a specified patient from four potential sources: EPR documents, MCT documents, textual observations, and reports. It loads the corresponding CSV files, optionally selecting specific columns, and concatenates them into a single DataFrame. It can also merge related columns (like timestamps and content) to create a more unified dataset.
- Parameters:
client_idcode (str) – The unique identifier for the patient.
config_obj (Any) – A configuration object containing paths to document batches.
columns_epr (Optional[List[str]]) – A list of columns to load from the EPR documents CSV.
columns_mct (Optional[List[str]]) – A list of columns to load from the MCT documents CSV.
columns_to (Optional[List[str]]) – A list of columns to load from the textual observations CSV.
columns_report (Optional[List[str]]) – A list of columns to load from the reports CSV.
merge_columns (bool) – If True, attempts to merge corresponding columns (e.g., timestamps, content) from the different sources into a unified set of columns.
- Return type:
DataFrame
- Returns:
A DataFrame containing the concatenated and optionally merged document data for the patient. Returns an empty DataFrame if no data is found for the patient in any of the sources.
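To make the "concatenate, then merge related columns" step more concrete, here is a small pandas sketch using two sources. The column names (epr_time, mct_time, epr_text, mct_text) and the unified names (timestamp, text) are hypothetical placeholders; the real source columns and the exact merging rules used by pat2vec are not documented on this page.

```python
import pandas as pd

# Two sources that store the same kind of information under different names.
epr = pd.DataFrame(
    {"document_guid": ["d1"], "epr_time": ["2021-01-01"], "epr_text": ["clinic letter"]}
)
mct = pd.DataFrame(
    {"document_guid": ["d2"], "mct_time": ["2021-02-01"], "mct_text": ["ward note"]}
)

# Concatenation keeps every column; cells a source does not provide become NaN.
docs = pd.concat([epr, mct], ignore_index=True)

# Coalesce the source-specific columns into one unified column each:
# take the EPR value where present, otherwise fall back to the MCT value.
docs["timestamp"] = docs["epr_time"].combine_first(docs["mct_time"])
docs["text"] = docs["epr_text"].combine_first(docs["mct_text"])

docs = docs[["document_guid", "timestamp", "text"]]
```

The same coalescing pattern extends to further sources by chaining additional combine_first calls.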
- pat2vec.util.post_processing_build_methods.join_docs_to_annots(annots_df, docs_temp, drop_duplicates=True)
Merge two DataFrames based on the 'document_guid' column.
- Parameters:
annots_df (DataFrame) – The DataFrame containing annotations.
docs_temp (DataFrame) – The DataFrame containing documents.
drop_duplicates (bool) – If True, drops duplicated columns from docs_temp before merging.
- Return type:
DataFrame
- Returns:
A merged DataFrame.
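A join of this kind can be sketched with pandas as shown below. Two points are assumptions rather than documented behaviour: "duplicated columns" is read as columns of docs_temp that also exist in annots_df (other than the join key), and a left join is used so that every annotation row is kept. The helper name join_on_document_guid and the sample columns are illustrative only.

```python
import pandas as pd


def join_on_document_guid(
    annots: pd.DataFrame, docs: pd.DataFrame, drop_duplicates: bool = True
) -> pd.DataFrame:
    """Attach document columns to each annotation via 'document_guid'."""
    if drop_duplicates:
        # Drop document columns that the annotations already carry, so the
        # merge does not produce *_x / *_y suffixed copies.
        shared = [
            c for c in docs.columns if c in annots.columns and c != "document_guid"
        ]
        docs = docs.drop(columns=shared)
    # Assumption: left join, keeping one output row per annotation.
    return annots.merge(docs, on="document_guid", how="left")


annots = pd.DataFrame(
    {
        "document_guid": ["d1", "d1", "d2"],
        "client_idcode": ["P001", "P001", "P002"],
        "cui": ["C1", "C2", "C3"],
    }
)
docs = pd.DataFrame(
    {
        "document_guid": ["d1", "d2"],
        "client_idcode": ["P001", "P002"],
        "text": ["clinic letter", "ward note"],
    }
)
joined = join_on_document_guid(annots, docs)
```

With drop_duplicates=False, pandas keeps both copies of the shared column and renames them client_idcode_x and client_idcode_y.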
- pat2vec.util.post_processing_build_methods.get_annots_joined_to_docs(config_obj, pat2vec_obj)
Builds and merges document and annotation dataframes, then joins them.
This function orchestrates the process of creating comprehensive, patient-level data by first building merged dataframes for both documents (from EPR and MCT sources) and their corresponding annotations. It then joins these two dataframes based on a common document identifier.
- Parameters:
config_obj (Any) – A configuration object containing project settings, including proj_name and paths to data batches.
pat2vec_obj (Any) – The main pat2vec object, which contains the all_patient_list and other necessary components.
- Return type:
DataFrame
- Returns:
A DataFrame containing the annotations joined with their corresponding document information.
- pat2vec.util.post_processing_build_methods.merge_demographics_csv(all_pat_list, config_obj, overwrite=False)
Merge all demographics CSV files that match the patient list.
- Parameters:
all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file. Defaults to False.
- Returns:
File path to the merged output CSV.
- Return type:
str
- pat2vec.util.post_processing_build_methods.merge_bmi_csv(all_pat_list, config_obj, overwrite=False)
Merge all BMI CSV files that match the patient list.
- Parameters:
all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file. Defaults to False.
- Returns:
File path to the merged output CSV.
- Return type:
str
- pat2vec.util.post_processing_build_methods.merge_news_csv(all_pat_list, config_obj, overwrite=False)
Merge all NEWS CSV files that match the patient list.
- Parameters:
all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file. Defaults to False.
- Returns:
File path to the merged output CSV.
- Return type:
str
- pat2vec.util.post_processing_build_methods.merge_diagnostics_csv(all_pat_list, config_obj, overwrite=False)
Merge all diagnostics CSV files that match the patient list.
- Parameters:
all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file. Defaults to False.
- Returns:
File path to the merged output CSV.
- Return type:
str
- pat2vec.util.post_processing_build_methods.merge_drugs_csv(all_pat_list, config_obj, overwrite=False)
Merge all drugs CSV files that match the patient list.
- Parameters:
all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file. Defaults to False.
- Returns:
File path to the merged output CSV.
- Return type:
str
- pat2vec.util.post_processing_build_methods.merge_appointments_csv(all_pat_list, config_obj, overwrite=False)
Merge all appointments CSV files that match the patient list.
- Parameters:
all_pat_list (List[str]) – List of patient IDs to include.
config_obj (Any) – Configuration object containing project settings.
overwrite (bool) – If True, overwrite the existing output file. Defaults to False.
- Returns:
File path to the merged output CSV.
- Return type:
str