pat2vec.util.methods_annotation

Functions

`annot_pat_batch_docs`(...[, text_column])	Annotates a batch of patient documents using a MedCAT model.
`calculate_pretty_name_count_features`(df_copy[, suffix])	Calculates count-based features from the 'pretty_name' column.
`check_pat_document_annotation_complete`(...)	Checks if a patient's document annotation file already exists.
`multi_annots_to_df_mct`(...[, text_column, ...])	Converts MedCAT annotations for MCT documents to a DataFrame and saves it.
`multi_annots_to_df_reports`(...[, ...])	Converts MedCAT annotations for reports to a DataFrame and saves it.
`multi_annots_to_df_textual_obs`(...[, ...])	Converts MedCAT annotations for textual observations to a DataFrame and saves it.

pat2vec.util.methods_annotation.check_pat_document_annotation_complete(current_pat_client_id_code, config_obj=None)

Checks if a patient’s document annotation file already exists.

Parameters:

current_pat_client_id_code (str) – The patient’s ID code.
config_obj (Optional[Any]) – The configuration object containing file paths.

Return type:

bool

Returns:

True if the annotation file exists, False otherwise.
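
The check described above amounts to testing whether a per-patient file is present on disk. The following is a minimal sketch of that idea, not the library's implementation: the function name `annotation_file_exists`, the config attribute `annotation_dir`, and the `<id>.csv` file naming are all hypothetical stand-ins for whatever paths the real configuration object holds.

```python
import os
import tempfile
from types import SimpleNamespace


def annotation_file_exists(client_idcode, config_obj):
    """Return True if a per-patient annotation file is already on disk.

    Assumption: `config_obj.annotation_dir` and the "<id>.csv" naming are
    hypothetical; the real config object may store its paths differently.
    """
    path = os.path.join(config_obj.annotation_dir, f"{client_idcode}.csv")
    return os.path.exists(path)


# Demo against a temporary directory standing in for the output folder.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_cfg = SimpleNamespace(annotation_dir=demo_dir)
    before = annotation_file_exists("P001", demo_cfg)
    # Create an empty file to simulate a completed annotation run.
    open(os.path.join(demo_dir, "P001.csv"), "w").close()
    after = annotation_file_exists("P001", demo_cfg)
```

A check of this kind lets a pipeline skip patients whose annotations were written in an earlier run.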

pat2vec.util.methods_annotation.annot_pat_batch_docs(current_pat_client_idcode, pat_batch, cat, config_obj, t, text_column='body_analysed')

Annotates a batch of patient documents using a MedCAT model.

Parameters:

current_pat_client_idcode (str) – The patient’s ID code.
pat_batch (DataFrame) – DataFrame containing the documents to be annotated.
cat (Any) – The loaded MedCAT CAT object.
config_obj (Any) – The configuration object.
t (Any) – The tqdm progress bar instance to update.
text_column (str) – The name of the column in pat_batch containing the text to annotate.

Return type:

List[Dict[str, Any]]

Returns:

A list of dictionaries, where each dictionary contains the MedCAT annotation entities for a document.
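
To illustrate the shape of the return value, here is a rough sketch of an annotation loop. It is an assumption-laden stand-in rather than the actual function: `annotate_batch`, `FakeCAT`, and `FakeBar` are hypothetical names, the stub model only mimics the `get_entities` call of a MedCAT `CAT` object, and the real function may batch documents or update the progress bar differently. The demo passes a plain dict of lists in place of a DataFrame, since both support `pat_batch[text_column]` iteration.

```python
class FakeCAT:
    """Hypothetical stub mimicking the get_entities call of a MedCAT CAT object."""

    def get_entities(self, text):
        entities = {}
        if "asthma" in text.lower():
            start = text.lower().index("asthma")
            entities[0] = {
                "pretty_name": "Asthma",
                "start": start,
                "end": start + len("asthma"),
            }
        return {"entities": entities, "tokens": []}


class FakeBar:
    """Hypothetical stand-in for a tqdm progress bar (only update() is used)."""

    def __init__(self):
        self.n = 0

    def update(self, n=1):
        self.n += n


def annotate_batch(pat_batch, cat, t, text_column="body_analysed"):
    """Collect one annotation dictionary per document, in document order."""
    multi_annots = []
    for text in pat_batch[text_column]:
        multi_annots.append(cat.get_entities(text))
        t.update(1)  # assumption: one progress tick per document
    return multi_annots


batch = {"body_analysed": ["History of asthma.", "No findings."]}
bar = FakeBar()
annots = annotate_batch(batch, FakeCAT(), bar)
```

Each element of `annots` lines up with one row of the input batch, which is what allows the `multi_annots_to_df_*` functions to pair annotations with their source documents.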

pat2vec.util.methods_annotation.multi_annots_to_df_textual_obs(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='textualObs', time_column='basicobs_entered', guid_column='basicobs_guid')

Converts MedCAT annotations for textual observations to a DataFrame and saves it.

This function processes a list of annotations, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.

Parameters:

current_pat_client_idcode (str) – The patient’s ID code.
pat_batch (DataFrame) – DataFrame of the original documents that were annotated.
multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.
config_obj (Any) – The configuration object.
t (Any) – The tqdm progress bar instance to update.
text_column (str) – The name of the text column in pat_batch.
time_column (str) – The name of the timestamp column in pat_batch.
guid_column (str) – The name of the document identifier column in pat_batch.

Return type:

None
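
The conversion described above can be pictured as flattening each document's entity dictionary into rows that carry the document's timestamp and identifier, then writing the rows to CSV. The sketch below shows that general pattern only, under stated assumptions: `annotations_to_frame`, the output column names, the placeholder CUIs, and the `P001.csv` path are hypothetical, and the optional ICD-10/OPCS-4 join is omitted. The same pattern would apply to the reports and MCT variants with their own column defaults.

```python
import os
import tempfile

import pandas as pd


def annotations_to_frame(pat_batch, multi_annots, client_idcode,
                         time_column, guid_column):
    """Flatten per-document entity dicts into one row per detected entity.

    Assumption: the output column names here are illustrative, not the
    schema the library actually writes.
    """
    rows = []
    # zip pairs each source document with its annotation dictionary.
    for (_, doc), annots in zip(pat_batch.iterrows(), multi_annots):
        for ent in annots.get("entities", {}).values():
            rows.append({
                "client_idcode": client_idcode,
                "updatetime": doc[time_column],
                "document_guid": doc[guid_column],
                "pretty_name": ent.get("pretty_name"),
                "cui": ent.get("cui"),
                "acc": ent.get("acc"),
            })
    return pd.DataFrame(rows)


pat_batch = pd.DataFrame({
    "basicobs_guid": ["g1", "g2"],
    "basicobs_entered": ["2023-01-01", "2023-01-02"],
    "textualObs": ["History of asthma and diabetes.", "No findings."],
})
# Placeholder CUIs and accuracies, not real concept identifiers.
multi_annots = [
    {"entities": {0: {"pretty_name": "Asthma", "cui": "C1", "acc": 0.99},
                  1: {"pretty_name": "Diabetes", "cui": "C2", "acc": 0.95}}},
    {"entities": {}},
]
annot_df = annotations_to_frame(
    pat_batch, multi_annots, "P001", "basicobs_entered", "basicobs_guid"
)

# Save to a patient-specific CSV (hypothetical path) and read it back.
with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, "P001.csv")
    annot_df.to_csv(out_path, index=False)
    round_trip = pd.read_csv(out_path)
```

Documents with no detected entities contribute no rows, so the second document above does not appear in `annot_df`.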

pat2vec.util.methods_annotation.multi_annots_to_df_reports(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='body_analysed', time_column='updatetime', guid_column='basicobs_guid')

Converts MedCAT annotations for reports to a DataFrame and saves it.

This function processes a list of annotations from reports, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.

Parameters:

current_pat_client_idcode (str) – The patient’s ID code.
pat_batch (DataFrame) – DataFrame of the original report documents that were annotated.
multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.
config_obj (Any) – The configuration object.
t (Any) – The tqdm progress bar instance to update.
text_column (str) – The name of the text column in pat_batch.
time_column (str) – The name of the timestamp column in pat_batch.
guid_column (str) – The name of the document identifier column in pat_batch.

Return type:

None

pat2vec.util.methods_annotation.multi_annots_to_df_mct(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='observation_valuetext_analysed', time_column='observationdocument_recordeddtm', guid_column='observation_guid')

Converts MedCAT annotations for MCT documents to a DataFrame and saves it.

This function processes a list of annotations from MCT documents, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.

Parameters:

current_pat_client_idcode (str) – The patient’s ID code.
pat_batch (DataFrame) – DataFrame of the original MCT documents that were annotated.
multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.
config_obj (Any) – The configuration object.
t (Any) – The tqdm progress bar instance to update.
text_column (str) – The name of the text column in pat_batch.
time_column (str) – The name of the timestamp column in pat_batch.
guid_column (str) – The name of the document identifier column in pat_batch.

Return type:

None

pat2vec.util.methods_annotation.calculate_pretty_name_count_features(df_copy, suffix='epr')

Calculates count-based features from the ‘pretty_name’ column.

This function groups a DataFrame by ‘pretty_name’ and calculates the count for each name, returning the result as a single-row DataFrame (vector).

Parameters:

df_copy (DataFrame) – The input DataFrame, expected to have a ‘pretty_name’ column.
suffix (str) – A suffix appended to each feature name. Defaults to ‘epr’.

Return type:

Optional[DataFrame]

Returns:

A single-row DataFrame with counts for each pretty_name, or None if the input DataFrame is empty.
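
As a rough illustration of the group-and-count step, the sketch below builds a single-row DataFrame of per-name counts with pandas. It is an approximation under assumptions, not the library's code: `pretty_name_counts` is a hypothetical name, and the `<name>_count_<suffix>` column naming is invented for the example, since the actual feature naming scheme is not stated here.

```python
import pandas as pd


def pretty_name_counts(df, suffix="epr"):
    """Count rows per 'pretty_name' and return them as a one-row DataFrame.

    Assumption: the "<name>_count_<suffix>" column format is illustrative.
    """
    if df.empty:
        return None
    # One count per distinct pretty_name.
    counts = df.groupby("pretty_name").size()
    counts.index = [f"{name}_count_{suffix}" for name in counts.index]
    # Transpose so each name becomes a column and the counts form one row.
    return counts.to_frame().T.reset_index(drop=True)


annotations = pd.DataFrame({"pretty_name": ["Asthma", "Asthma", "Diabetes"]})
features = pretty_name_counts(annotations)
empty_result = pretty_name_counts(pd.DataFrame({"pretty_name": []}))
```

Returning a single row makes it straightforward to concatenate the counts with other per-patient feature vectors.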