pat2vec.util.methods_annotation_multi_annots_to_df

Functions

multi_annots_to_df(...[, text_column, ...])

Processes MedCAT annotations for a batch of documents, creating and saving a DataFrame.

temporary_file([suffix, delete])

Context manager for creating and cleaning up temporary files.

pat2vec.util.methods_annotation_multi_annots_to_df.temporary_file(suffix='.csv', delete=True)[source]

Context manager for creating and cleaning up temporary files.

Parameters:
  • suffix (str) – The file suffix for the temporary file.

  • delete (bool) – If True, the file is deleted upon exiting the context.

Yields:

The path to the temporary file.

Return type:

Iterator[str]
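
As an illustration of the behaviour described above, the following is a minimal sketch of how a context manager with this signature could be written. It is not the pat2vec source: the name temporary_file_sketch is hypothetical, and the use of tempfile.mkstemp is an assumption about one reasonable way to implement it.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_file_sketch(suffix: str = ".csv", delete: bool = True) -> Iterator[str]:
    """Hypothetical stand-in for temporary_file; not the pat2vec source."""
    # mkstemp creates the file and returns an open OS-level handle plus its path.
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # close the handle so callers can reopen the path by name
    try:
        yield path
    finally:
        # Remove the file on exit only when requested and still present.
        if delete and os.path.exists(path):
            os.remove(path)


# Usage: the path is valid inside the block and cleaned up afterwards.
with temporary_file_sketch(suffix=".csv") as tmp_path:
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write("cui,pretty_name\n")
    existed_inside = os.path.exists(tmp_path)

exists_after = os.path.exists(tmp_path)
```

Yielding a path rather than an open file object lets the caller hand the location to any writer that expects a filename. With delete=False the file is left on disk and the caller is responsible for removing it.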

pat2vec.util.methods_annotation_multi_annots_to_df.multi_annots_to_df(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='body_analysed', time_column='updatetime', guid_column='document_guid')[source]

Processes MedCAT annotations for a batch of documents, creating and saving a DataFrame.

This function takes a list of MedCAT annotation results for a batch of documents belonging to a single patient. It iterates through each document’s annotations, converts them from a JSON-like dictionary format into a structured pandas DataFrame using json_to_dataframe, and concatenates the results into a single master DataFrame for the patient.

The function can optionally enrich the annotation data by joining it with ICD-10 and OPCS-4 codes based on settings in the configuration object.

Finally, the resulting DataFrame is saved as a CSV file in the patient’s designated annotation directory.

Parameters:
  • current_pat_client_idcode (str) – The unique identifier for the patient.

  • pat_batch (DataFrame) – A DataFrame where each row represents a document in the patient’s batch.

  • multi_annots (List[Dict[str, Any]]) – A list of dictionaries, where each dictionary contains the MedCAT annotation entities for a corresponding document in pat_batch.

  • config_obj (Any) – A configuration object containing settings such as file paths (pre_document_annotation_batch_path), verbosity level, and flags for add_icd10 and add_opc4s. This argument is required; a ValueError is raised if it is not provided.

  • t (Any) – A tqdm progress bar object for providing real-time feedback.

  • text_column (str) – The name of the column in pat_batch that contains the document text to be annotated. Defaults to ‘body_analysed’.

  • time_column (str) – The name of the column in pat_batch that holds the timestamp for each document. Defaults to ‘updatetime’.

  • guid_column (str) – The name of the column in pat_batch that contains the unique identifier for each document. Defaults to ‘document_guid’.

Return type:

DataFrame

Returns:

A consolidated DataFrame containing all annotations for the patient’s document batch. An empty DataFrame is returned if no valid annotations are processed.

Raises:

ValueError – If config_obj is not provided.
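
The flatten, concatenate, and save steps described for multi_annots_to_df can be sketched as follows. This is a simplified illustration, not the pat2vec implementation: annots_to_df_sketch is a hypothetical helper that stands in for the json_to_dataframe conversion, the annotation dictionaries assume a MedCAT-style layout with an "entities" mapping, and the CUIs, document contents, and output filename are made-up placeholders. The ICD-10 and OPCS-4 enrichment and the configuration handling are omitted.

```python
import os
import tempfile
from typing import Any, Dict, List

import pandas as pd


def annots_to_df_sketch(
    pat_batch: pd.DataFrame,
    multi_annots: List[Dict[str, Any]],
    time_column: str = "updatetime",
    guid_column: str = "document_guid",
) -> pd.DataFrame:
    """Hypothetical flatten-and-concatenate step; not the pat2vec source."""
    frames = []
    # Pair each document row with the annotation dictionary at the same position.
    for (_, doc), annots in zip(pat_batch.iterrows(), multi_annots):
        entities = list(annots.get("entities", {}).values())
        if not entities:
            continue  # skip documents with no detected entities
        doc_df = pd.DataFrame(entities)
        doc_df[time_column] = doc[time_column]
        doc_df[guid_column] = doc[guid_column]
        frames.append(doc_df)
    # Return an empty DataFrame when nothing was annotated.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Placeholder batch: two documents, only the first has entities.
pat_batch = pd.DataFrame({
    "document_guid": ["doc-1", "doc-2"],
    "updatetime": ["2020-01-01", "2020-02-01"],
    "body_analysed": ["Patient has asthma and eczema.", "No findings."],
})
multi_annots = [
    {"entities": {
        0: {"cui": "C001", "pretty_name": "Asthma", "start": 12, "end": 18},
        1: {"cui": "C002", "pretty_name": "Eczema", "start": 23, "end": 29},
    }},
    {"entities": {}},
]
df = annots_to_df_sketch(pat_batch, multi_annots)

# Save to a throwaway directory standing in for the patient's annotation directory.
with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, "patient_001.csv")
    df.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)
```

Building one small DataFrame per document and concatenating once at the end avoids repeatedly growing a single DataFrame inside the loop. Carrying the document GUID and timestamp onto every annotation row keeps each entity traceable to its source document after concatenation.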