pat2vec.util.methods_annotation

Functions

annot_pat_batch_docs(...[, text_column])

Annotates a batch of patient documents using a MedCAT model.

calculate_pretty_name_count_features(df_copy[, suffix])

Calculates count-based features from the 'pretty_name' column.

check_pat_document_annotation_complete(...)

Checks if a patient's document annotation file already exists.

multi_annots_to_df_mct(...[, text_column, ...])

Converts MedCAT annotations for MCT documents to a DataFrame and saves it.

multi_annots_to_df_reports(...[, ...])

Converts MedCAT annotations for reports to a DataFrame and saves it.

multi_annots_to_df_textual_obs(...[, ...])

Converts MedCAT annotations for textual observations to a DataFrame and saves it.

pat2vec.util.methods_annotation.check_pat_document_annotation_complete(current_pat_client_id_code, config_obj=None)

Checks if a patient’s document annotation file already exists.

Parameters:
  • current_pat_client_id_code (str) – The patient’s ID code.

  • config_obj (Optional[Any]) – The configuration object containing file paths.

Return type:

bool

Returns:

True if the annotation file exists, False otherwise.
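The existence check described above can be illustrated with a minimal sketch. The function name annotation_file_exists, the annotation_dir attribute, and the <id>.csv naming are all assumptions made for illustration; the real layout is determined by the pat2vec configuration object and may differ.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def annotation_file_exists(pat_id: str, config_obj) -> bool:
    """Return True if <annotation_dir>/<pat_id>.csv exists (assumed layout)."""
    # `annotation_dir` is a hypothetical attribute, not a documented pat2vec field.
    return (Path(config_obj.annotation_dir) / f"{pat_id}.csv").is_file()


with tempfile.TemporaryDirectory() as tmp:
    config = SimpleNamespace(annotation_dir=tmp)

    # No file has been written yet, so the check fails.
    before = annotation_file_exists("P001", config)

    # Simulate a completed annotation run for this patient.
    (Path(tmp) / "P001.csv").write_text("cui,pretty_name\n")
    after = annotation_file_exists("P001", config)
```

A check of this kind lets a pipeline skip patients whose annotations were already written by an earlier run.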

pat2vec.util.methods_annotation.annot_pat_batch_docs(current_pat_client_idcode, pat_batch, cat, config_obj, t, text_column='body_analysed')

Annotates a batch of patient documents using a MedCAT model.

Parameters:
  • current_pat_client_idcode (str) – The patient’s ID code.

  • pat_batch (DataFrame) – DataFrame containing the documents to be annotated.

  • cat (Any) – The loaded MedCAT CAT object.

  • config_obj (Any) – The configuration object.

  • t (Any) – The tqdm progress bar instance to update.

  • text_column (str) – The name of the column in pat_batch containing the text to annotate.

Return type:

List[Dict[str, Any]]

Returns:

A list of dictionaries, where each dictionary contains the MedCAT annotation entities for a document.
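The shape of the return value can be sketched without a real model. In the example below, StubCat is a hypothetical stand-in for the loaded MedCAT CAT object that mimics only the {"entities": {...}} shape of a get_entities result, annotate_batch is an illustrative helper rather than the library function, and the batch is a plain list of dicts instead of a DataFrame so the example needs no third-party packages.

```python
from typing import Any, Dict, List


class StubCat:
    """Hypothetical stand-in for a MedCAT CAT object; no model is loaded."""

    def get_entities(self, text: str) -> Dict[str, Any]:
        # Toy rule: tag the first occurrence of the word "fever".
        entities: Dict[int, Dict[str, Any]] = {}
        start = text.lower().find("fever")
        if start != -1:
            entities[0] = {"pretty_name": "Fever", "start": start, "end": start + 5}
        return {"entities": entities}


def annotate_batch(
    rows: List[Dict[str, Any]], cat: Any, text_column: str = "body_analysed"
) -> List[Dict[str, Any]]:
    """Return one annotation dictionary per document, in input order."""
    return [cat.get_entities(row[text_column]) for row in rows]


batch = [
    {"body_analysed": "Patient reports fever."},
    {"body_analysed": "No complaints."},
]
annots = annotate_batch(batch, StubCat())
```

The result has one entry per input document, and a document with no matches still yields a dictionary with an empty "entities" mapping, so the list stays aligned with the batch.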

pat2vec.util.methods_annotation.multi_annots_to_df_textual_obs(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='textualObs', time_column='basicobs_entered', guid_column='basicobs_guid')

Converts MedCAT annotations for textual observations to a DataFrame and saves it.

This function processes a list of annotations, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.

Parameters:
  • current_pat_client_idcode (str) – The patient’s ID code.

  • pat_batch (DataFrame) – DataFrame of the original documents that were annotated.

  • multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.

  • config_obj (Any) – The configuration object.

  • t (Any) – The tqdm progress bar instance to update.

  • text_column (str) – The name of the text column in pat_batch.

  • time_column (str) – The name of the timestamp column in pat_batch.

  • guid_column (str) – The name of the document identifier column in pat_batch.

Return type:

None
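The conversion step can be sketched as flattening one row per entity, carrying the source document's timestamp and identifier, and writing the rows to a per-patient CSV. This is a rough illustration under stated assumptions, not the library's implementation: annotations_to_rows, the output file name, and the sample CUI values are made up, the optional ICD-10/OPCS-4 join is omitted, and plain dicts with the csv module stand in for the DataFrame.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def annotations_to_rows(
    docs: List[Dict[str, Any]],
    multi_annots: List[Dict[str, Any]],
    time_column: str = "basicobs_entered",
    guid_column: str = "basicobs_guid",
) -> List[Dict[str, Any]]:
    """Emit one flat row per entity, tagged with its document's time and id."""
    rows = []
    for doc, annot in zip(docs, multi_annots):
        for ent in annot["entities"].values():
            rows.append(
                {
                    "pretty_name": ent["pretty_name"],
                    "cui": ent["cui"],
                    time_column: doc[time_column],
                    guid_column: doc[guid_column],
                }
            )
    return rows


docs = [
    {"basicobs_guid": "doc-1", "basicobs_entered": "2023-01-05", "textualObs": "Fever and cough."},
    {"basicobs_guid": "doc-2", "basicobs_entered": "2023-02-10", "textualObs": "Fever resolved."},
]
# Illustrative annotations; the CUI values are placeholders, not real codes.
multi_annots = [
    {"entities": {0: {"pretty_name": "Fever", "cui": "X1"}, 1: {"pretty_name": "Cough", "cui": "X2"}}},
    {"entities": {0: {"pretty_name": "Fever", "cui": "X1"}}},
]
rows = annotations_to_rows(docs, multi_annots)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "P001.csv"  # hypothetical per-patient file name
    with out_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    # Read the file back to confirm what was saved.
    with out_path.open(newline="") as fh:
        saved = list(csv.DictReader(fh))
```

The same pattern applies to the report and MCT variants, which differ in their default column names.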

pat2vec.util.methods_annotation.multi_annots_to_df_reports(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='body_analysed', time_column='updatetime', guid_column='basicobs_guid')

Converts MedCAT annotations for reports to a DataFrame and saves it.

This function processes a list of annotations from reports, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.

Parameters:
  • current_pat_client_idcode (str) – The patient’s ID code.

  • pat_batch (DataFrame) – DataFrame of the original report documents that were annotated.

  • multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.

  • config_obj (Any) – The configuration object.

  • t (Any) – The tqdm progress bar instance to update.

  • text_column (str) – The name of the text column in pat_batch.

  • time_column (str) – The name of the timestamp column in pat_batch.

  • guid_column (str) – The name of the document identifier column in pat_batch.

Return type:

None

pat2vec.util.methods_annotation.multi_annots_to_df_mct(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='observation_valuetext_analysed', time_column='observationdocument_recordeddtm', guid_column='observation_guid')

Converts MedCAT annotations for MCT documents to a DataFrame and saves it.

This function processes a list of annotations from MCT documents, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.

Parameters:
  • current_pat_client_idcode (str) – The patient’s ID code.

  • pat_batch (DataFrame) – DataFrame of the original MCT documents that were annotated.

  • multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.

  • config_obj (Any) – The configuration object.

  • t (Any) – The tqdm progress bar instance to update.

  • text_column (str) – The name of the text column in pat_batch.

  • time_column (str) – The name of the timestamp column in pat_batch.

  • guid_column (str) – The name of the document identifier column in pat_batch.

Return type:

None

pat2vec.util.methods_annotation.calculate_pretty_name_count_features(df_copy, suffix='epr')

Calculates count-based features from the ‘pretty_name’ column.

This function groups a DataFrame by ‘pretty_name’, counts the rows for each name, and returns the counts as a single-row DataFrame (a feature vector).

Parameters:
  • df_copy (DataFrame) – The input DataFrame, expected to have a ‘pretty_name’ column.

  • suffix (str) – A suffix to append to each feature name.

Return type:

Optional[DataFrame]

Returns:

A single-row DataFrame with counts for each pretty_name, or None if the input DataFrame is empty.
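The counting logic can be sketched in plain Python. The helper pretty_name_counts and the "<name>_count_<suffix>" feature naming are assumptions for illustration; the actual column names produced by pat2vec may differ, and a dict stands in for the single-row DataFrame.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def pretty_name_counts(
    rows: List[Dict[str, Any]], suffix: str = "epr"
) -> Optional[Dict[str, int]]:
    """Count rows per pretty_name; return None for empty input."""
    if not rows:
        return None
    counts = Counter(row["pretty_name"] for row in rows)
    # Assumed naming scheme: "<pretty_name>_count_<suffix>".
    return {f"{name}_count_{suffix}": n for name, n in counts.items()}


annotations = [
    {"pretty_name": "Fever"},
    {"pretty_name": "Cough"},
    {"pretty_name": "Fever"},
]
features = pretty_name_counts(annotations)
empty_features = pretty_name_counts([])
```

Returning None for empty input mirrors the documented behaviour, so callers need to handle the missing-vector case explicitly.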