pat2vec.util.methods_annotation_multi_annots_to_df
Functions
multi_annots_to_df – Processes MedCAT annotations for a batch of documents, creating and saving a DataFrame.
temporary_file – Context manager for creating and cleaning up temporary files.
- pat2vec.util.methods_annotation_multi_annots_to_df.temporary_file(suffix='.csv', delete=True)
Context manager for creating and cleaning up temporary files.
- Parameters:
suffix (str) – The file suffix for the temporary file.
delete (bool) – If True, the file is deleted upon exiting the context.
- Yields:
The path to the temporary file.
- Return type:
Iterator[str]
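The behaviour described above can be illustrated with a minimal sketch. This is not the pat2vec source: `temporary_file_sketch` is a hypothetical stand-in written with the standard library only, assuming the documented contract (yield a path, delete the file on exit when `delete` is True).

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_file_sketch(suffix: str = ".csv", delete: bool = True) -> Iterator[str]:
    """Hypothetical stand-in for temporary_file; an assumption, not the library code."""
    # mkstemp returns an open OS-level descriptor and the file path; close the
    # descriptor so the caller can reopen the path however it likes.
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        yield path
    finally:
        # Remove the file only when requested and only if it still exists.
        if delete and os.path.exists(path):
            os.remove(path)


# Usage: write to the temporary path inside the context.
with temporary_file_sketch(suffix=".csv") as tmp_path:
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write("cui,pretty_name\n")
    existed_inside = os.path.exists(tmp_path)

# After the context exits, the file has been cleaned up.
removed_after = not os.path.exists(tmp_path)
```

Placing the cleanup in a `finally` clause means the file is removed even if the body of the `with` block raises.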
- pat2vec.util.methods_annotation_multi_annots_to_df.multi_annots_to_df(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='body_analysed', time_column='updatetime', guid_column='document_guid')
Processes MedCAT annotations for a batch of documents, creating and saving a DataFrame.
This function takes a list of MedCAT annotation results, corresponding to a batch of documents for a single patient. It iterates through each document’s annotations, converts them from JSON-like dictionary format into a structured pandas DataFrame using json_to_dataframe, and concatenates them into a single master DataFrame for the patient.
The function can optionally enrich the annotation data by joining it with ICD-10 and OPCS-4 codes based on settings in the configuration object.
Finally, the resulting DataFrame is saved as a CSV file in the patient’s designated annotation directory.
- Parameters:
current_pat_client_idcode (str) – The unique identifier for the patient.
pat_batch (DataFrame) – A DataFrame where each row represents a document in the patient’s batch.
multi_annots (List[Dict[str, Any]]) – A list of dictionaries, where each dictionary contains the MedCAT annotation entities for a corresponding document in pat_batch.
config_obj (Any) – A configuration object containing settings such as file paths (pre_document_annotation_batch_path), verbosity level, and flags for add_icd10 and add_opc4s. Defaults to None.
t (Any) – A tqdm progress bar object for providing real-time feedback.
text_column (str) – The name of the column in pat_batch that contains the document text to be annotated. Defaults to ‘body_analysed’.
time_column (str) – The name of the column in pat_batch that holds the timestamp for each document. Defaults to ‘updatetime’.
guid_column (str) – The name of the column in pat_batch that contains the unique identifier for each document. Defaults to ‘document_guid’.
- Return type:
DataFrame
- Returns:
A consolidated DataFrame containing all annotations for the patient’s document batch. An empty DataFrame is returned if no valid annotations are processed.
- Raises:
ValueError – If config_obj is not provided.
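The per-document flattening and CSV step described for multi_annots_to_df can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: it uses plain dictionaries and the standard `csv` module in place of pandas and json_to_dataframe, `flatten_annotations` is a hypothetical helper, the `{"entities": {id: {...}}}` shape is assumed for each MedCAT result, and the sample documents and concept codes are made up.

```python
import csv
import io
from typing import Any, Dict, List


def flatten_annotations(
    documents: List[Dict[str, Any]],
    multi_annots: List[Dict[str, Any]],
    time_column: str = "updatetime",
    guid_column: str = "document_guid",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: one output row per annotated entity per document."""
    rows: List[Dict[str, Any]] = []
    # Documents and annotation results are paired by position.
    for doc, annots in zip(documents, multi_annots):
        # Assumed result shape: {"entities": {entity_id: {...}}}.
        entities = (annots or {}).get("entities") or {}
        for entity in entities.values():
            rows.append({
                guid_column: doc[guid_column],
                time_column: doc[time_column],
                "cui": entity.get("cui"),
                "pretty_name": entity.get("pretty_name"),
                "acc": entity.get("acc"),
            })
    return rows


# Made-up batch for one patient: two documents, the second has no entities.
docs = [
    {"document_guid": "doc-1", "updatetime": "2023-01-05", "body_analysed": "..."},
    {"document_guid": "doc-2", "updatetime": "2023-02-11", "body_analysed": "..."},
]
annots = [
    {"entities": {
        0: {"cui": "CUI_A", "pretty_name": "Example concept A", "acc": 0.98},
        1: {"cui": "CUI_B", "pretty_name": "Example concept B", "acc": 0.75},
    }},
    {"entities": {}},
]

rows = flatten_annotations(docs, annots)

# Write the consolidated rows as CSV (to memory here, to a file in practice).
fieldnames = ["document_guid", "updatetime", "cui", "pretty_name", "acc"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

A document without entities contributes no rows, which mirrors the documented case where an empty result is returned if no valid annotations are processed.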