pat2vec.util.methods_annotation_json_to_dataframe

Functions

json_to_dataframe(json_data, doc, ...[, ...])

Converts a MedCAT JSON entity dictionary to a pandas DataFrame.

parse_meta_anns(meta_anns)

Parses meta-annotations from a MedCAT entity dictionary.

pat2vec.util.methods_annotation_json_to_dataframe.json_to_dataframe(json_data, doc, current_pat_client_id_code, full_doc=False, window=300, text_column='body_analysed', time_column='updatetime', guid_column='document_guid')

Converts a MedCAT JSON entity dictionary to a pandas DataFrame.

This function takes the ‘entities’ dictionary from MedCAT's output for a single document and transforms it into a structured DataFrame. Each row in the resulting DataFrame represents a single annotation (entity). It also extracts a text sample around each annotation and adds document-level metadata.

Parameters:
  • json_data (Dict[str, Any]) – The ‘entities’ dictionary from MedCAT’s output.

  • doc (Series) – The pandas Series representing the original document, containing its text and metadata such as the timestamp and GUID.

  • current_pat_client_id_code (str) – The patient’s unique identifier.

  • full_doc (bool) – If True, includes the full document text in the first annotation row. Defaults to False.

  • window (int) – The number of characters to include on either side of the annotation for the ‘text_sample’. Defaults to 300.

  • text_column (str) – The name of the column in doc containing the document text. Defaults to 'body_analysed'.

  • time_column (str) – The name of the column in doc containing the timestamp. Defaults to 'updatetime'.

  • guid_column (str) – The name of the column in doc containing the document GUID. Defaults to 'document_guid'.
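The window parameter works on character offsets. The sketch below illustrates the arithmetic, assuming the sample is a plain slice of the document text that is clipped at the start and end of the document; extract_text_sample is a hypothetical helper, not part of pat2vec.

```python
def extract_text_sample(text: str, start: int, end: int, window: int = 300) -> str:
    """Hypothetical helper: return the annotated span plus up to `window`
    characters on each side, clipped to the document boundaries."""
    left = max(0, start - window)        # never index before the start of the text
    right = min(len(text), end + window)  # never run past the end of the text
    return text[left:right]


text = "Patient reports chest pain. No fever."
# "chest pain" occupies characters 16 to 26 (end-exclusive)
print(extract_text_sample(text, 16, 26, window=5))
```

With the default window of 300, an annotation in a short note will usually yield the whole note as its sample.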

Return type:

DataFrame

Returns:

A pandas DataFrame where each row is a single annotation, or an empty DataFrame if no entities are present in the input.
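A minimal sketch of the row-building logic described above, assuming each MedCAT entity carries 'start', 'end', 'cui' and 'pretty_name' keys. The function name json_to_dataframe_sketch and every output column name are hypothetical and will not match pat2vec's actual output exactly; meta-annotation parsing is omitted.

```python
from typing import Any, Dict

import pandas as pd


def json_to_dataframe_sketch(
    json_data: Dict[Any, Dict[str, Any]],
    doc: pd.Series,
    current_pat_client_id_code: str,
    full_doc: bool = False,
    window: int = 300,
    text_column: str = "body_analysed",
    time_column: str = "updatetime",
    guid_column: str = "document_guid",
) -> pd.DataFrame:
    """Hypothetical re-implementation: one row per MedCAT entity."""
    if not json_data:
        return pd.DataFrame()  # no entities -> empty DataFrame

    text = doc[text_column]
    rows = []
    for i, (ent_id, ent) in enumerate(json_data.items()):
        start, end = ent["start"], ent["end"]
        rows.append(
            {
                # All column names below are illustrative assumptions.
                "client_idcode": current_pat_client_id_code,
                "document_guid": doc[guid_column],
                "updatetime": doc[time_column],
                "entity_id": ent_id,
                "cui": ent.get("cui"),
                "pretty_name": ent.get("pretty_name"),
                "start": start,
                "end": end,
                "text_sample": text[max(0, start - window): end + window],
                # Full text only on the first row, as full_doc describes.
                "full_doc": text if (full_doc and i == 0) else None,
            }
        )
    return pd.DataFrame(rows)


# Illustrative input: a one-entity 'entities' dict and a document Series.
doc = pd.Series({
    "body_analysed": "Patient reports chest pain. No fever.",
    "updatetime": "2023-01-01 09:00:00",
    "document_guid": "doc-001",
})
entities = {0: {"cui": "CUI-0001", "pretty_name": "Chest pain", "start": 16, "end": 26}}
df = json_to_dataframe_sketch(entities, doc, "P0001", full_doc=True, window=5)
print(df[["pretty_name", "text_sample"]])
```

Building a list of dicts and calling the DataFrame constructor once is generally cheaper than appending rows one at a time.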

pat2vec.util.methods_annotation_json_to_dataframe.parse_meta_anns(meta_anns)

Parses meta-annotations from a MedCAT entity dictionary.

This function extracts the value and confidence for the ‘Time’, ‘Presence’, and ‘Subject/Experiencer’ meta-annotations. If ‘Subject/Experiencer’ is not found, it falls back to ‘Subject’.

Parameters:

meta_anns (Dict[str, Any]) – The meta_anns dictionary from a MedCAT entity.

Return type:

Dict[str, Any]

Returns:

A dictionary containing the parsed meta-annotation values and confidences.
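A minimal sketch of the lookup-with-fallback logic, assuming each meta-annotation task maps to a dict with 'value' and 'confidence' keys, as in MedCAT's meta_anns output. The function name parse_meta_anns_sketch and the output keys (time_value, presence_confidence, and so on) are hypothetical.

```python
from typing import Any, Dict


def parse_meta_anns_sketch(meta_anns: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: extract value/confidence for three meta-annotation tasks."""
    parsed: Dict[str, Any] = {}

    # Missing tasks yield None for both value and confidence.
    for task, prefix in (("Time", "time"), ("Presence", "presence")):
        ann = meta_anns.get(task, {})
        parsed[f"{prefix}_value"] = ann.get("value")
        parsed[f"{prefix}_confidence"] = ann.get("confidence")

    # Prefer 'Subject/Experiencer'; fall back to 'Subject' if it is absent.
    subject = meta_anns.get("Subject/Experiencer") or meta_anns.get("Subject", {})
    parsed["subject_value"] = subject.get("value")
    parsed["subject_confidence"] = subject.get("confidence")
    return parsed


# Illustrative input that only has the fallback 'Subject' key.
meta = {
    "Presence": {"value": "True", "confidence": 0.97, "name": "Presence"},
    "Subject": {"value": "Patient", "confidence": 0.91, "name": "Subject"},
}
result = parse_meta_anns_sketch(meta)
print(result)
```

Returning None for absent tasks keeps the output keys stable, so rows built from different entities line up in one DataFrame.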