pat2vec.util.methods_annotation
Functions
annot_pat_batch_docs | Annotates a batch of patient documents using a MedCAT model.
calculate_pretty_name_count_features | Calculates count-based features from the 'pretty_name' column.
check_pat_document_annotation_complete | Checks if a patient's document annotation file already exists.
multi_annots_to_df_mct | Converts MedCAT annotations for MCT documents to a DataFrame and saves it.
multi_annots_to_df_reports | Converts MedCAT annotations for reports to a DataFrame and saves it.
multi_annots_to_df_textual_obs | Converts MedCAT annotations for textual observations to a DataFrame and saves it.
- pat2vec.util.methods_annotation.check_pat_document_annotation_complete(current_pat_client_id_code, config_obj=None)
Checks if a patient’s document annotation file already exists.
- Parameters:
current_pat_client_id_code (str) – The patient’s ID code.
config_obj (Optional[Any]) – The configuration object containing file paths.
- Return type:
bool
- Returns:
True if the annotation file exists, False otherwise.
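A minimal sketch of this kind of existence check, assuming one CSV file per patient inside a single annotations directory. The helper name `annotation_file_exists`, the `annot_dir` argument, and the `<id>.csv` naming are illustrative assumptions, not the layout that `config_obj` actually defines.

```python
from pathlib import Path


def annotation_file_exists(annot_dir: str, pat_id: str) -> bool:
    """Return True if a per-patient annotation CSV is already on disk.

    Hypothetical stand-in: the documented function resolves its paths
    from config_obj rather than from an explicit directory argument.
    """
    return (Path(annot_dir) / f"{pat_id}.csv").is_file()
```

A check like this lets a pipeline skip patients whose annotations were written on an earlier run.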
- pat2vec.util.methods_annotation.annot_pat_batch_docs(current_pat_client_idcode, pat_batch, cat, config_obj, t, text_column='body_analysed')
Annotates a batch of patient documents using a MedCAT model.
- Parameters:
current_pat_client_idcode (str) – The patient’s ID code.
pat_batch (DataFrame) – DataFrame containing the documents to be annotated.
cat (Any) – The loaded MedCAT CAT object.
config_obj (Any) – The configuration object.
t (Any) – The tqdm progress bar instance to update.
text_column (str) – The name of the column in pat_batch containing the text to annotate.
- Return type:
List[Dict[str, Any]]
- Returns:
A list of dictionaries, where each dictionary contains the MedCAT annotation entities for a document.
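The loop below sketches the general shape of batch annotation: call the model once per document and collect the results in order. `StubCat` and `annotate_batch` are hypothetical stand-ins so the example runs without a model pack; a real MedCAT `CAT` object exposes `get_entities(text)`, which returns a dictionary with an `entities` key. Progress-bar updates and configuration handling are omitted, and the entity fields shown are placeholders.

```python
import pandas as pd


class StubCat:
    """Hypothetical stand-in for a loaded MedCAT CAT object."""

    def get_entities(self, text):
        # Toy rule: report one entity when the text mentions asthma.
        entities = {}
        if "asthma" in text.lower():
            entities[0] = {"pretty_name": "Asthma", "cui": "DEMO-1"}
        return {"entities": entities}


def annotate_batch(pat_batch, cat, text_column="body_analysed"):
    """Annotate each document in pat_batch; returns one dict per row."""
    return [cat.get_entities(str(text)) for text in pat_batch[text_column]]


docs = pd.DataFrame({"body_analysed": ["History of asthma.", "No complaints."]})
multi_annots = annotate_batch(docs, StubCat())
```

The resulting list stays aligned with the rows of `pat_batch`, which is what allows later steps to attach document timestamps and identifiers to each annotation.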
- pat2vec.util.methods_annotation.multi_annots_to_df_textual_obs(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='textualObs', time_column='basicobs_entered', guid_column='basicobs_guid')
Converts MedCAT annotations for textual observations to a DataFrame and saves it.
This function processes a list of annotations, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.
- Parameters:
current_pat_client_idcode (str) – The patient’s ID code.
pat_batch (DataFrame) – DataFrame of the original documents that were annotated.
multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.
config_obj (Any) – The configuration object.
t (Any) – The tqdm progress bar instance to update.
text_column (str) – The name of the text column in pat_batch.
time_column (str) – The name of the timestamp column in pat_batch.
guid_column (str) – The name of the document identifier column in pat_batch.
- Return type:
None
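To illustrate the conversion step that the `multi_annots_to_df_*` functions share, the sketch below flattens a list of annotation dictionaries into one row per entity and attaches each source document's timestamp and identifier. It assumes the `{"entities": {index: {...}}}` shape and that `multi_annots` is aligned row for row with `pat_batch`. `annots_to_rows` is a hypothetical helper, the entity fields are placeholders, and the code-joining and CSV-saving steps are left out.

```python
import pandas as pd


def annots_to_rows(pat_batch, multi_annots,
                   time_column="basicobs_entered",
                   guid_column="basicobs_guid"):
    """Flatten per-document annotation dicts into one row per entity."""
    rows = []
    # zip pairs each document (as a dict) with its annotation result.
    for doc, annot in zip(pat_batch.to_dict("records"), multi_annots):
        for entity in annot.get("entities", {}).values():
            row = dict(entity)
            row[time_column] = doc[time_column]
            row[guid_column] = doc[guid_column]
            rows.append(row)
    return pd.DataFrame(rows)


batch = pd.DataFrame({
    "basicobs_entered": ["2021-01-05", "2021-02-10"],
    "basicobs_guid": ["g1", "g2"],
})
annots = [
    {"entities": {0: {"pretty_name": "Asthma", "acc": 0.9}}},
    {"entities": {}},
]
annot_df = annots_to_rows(batch, annots)
```

Documents with no entities contribute no rows, so the output can be shorter than the input batch.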
- pat2vec.util.methods_annotation.multi_annots_to_df_reports(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='body_analysed', time_column='updatetime', guid_column='basicobs_guid')
Converts MedCAT annotations for reports to a DataFrame and saves it.
This function processes a list of annotations from reports, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.
- Parameters:
current_pat_client_idcode (str) – The patient’s ID code.
pat_batch (DataFrame) – DataFrame of the original report documents that were annotated.
multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.
config_obj (Any) – The configuration object.
t (Any) – The tqdm progress bar instance to update.
text_column (str) – The name of the text column in pat_batch.
time_column (str) – The name of the timestamp column in pat_batch.
guid_column (str) – The name of the document identifier column in pat_batch.
- Return type:
None
- pat2vec.util.methods_annotation.multi_annots_to_df_mct(current_pat_client_idcode, pat_batch, multi_annots, config_obj, t, text_column='observation_valuetext_analysed', time_column='observationdocument_recordeddtm', guid_column='observation_guid')
Converts MedCAT annotations for MCT documents to a DataFrame and saves it.
This function processes a list of annotations from MCT documents, converts them to a structured DataFrame, optionally joins ICD-10/OPCS-4 codes, and saves the result to a patient-specific CSV file.
- Parameters:
current_pat_client_idcode (str) – The patient’s ID code.
pat_batch (DataFrame) – DataFrame of the original MCT documents that were annotated.
multi_annots (List[Dict[str, Any]]) – The list of annotation dictionaries from MedCAT.
config_obj (Any) – The configuration object.
t (Any) – The tqdm progress bar instance to update.
text_column (str) – The name of the text column in pat_batch.
time_column (str) – The name of the timestamp column in pat_batch.
guid_column (str) – The name of the document identifier column in pat_batch.
- Return type:
None
- pat2vec.util.methods_annotation.calculate_pretty_name_count_features(df_copy, suffix='epr')
Calculates count-based features from the ‘pretty_name’ column.
This function groups a DataFrame by ‘pretty_name’ and calculates the count for each name, returning the result as a single-row DataFrame (vector).
- Parameters:
df_copy (DataFrame) – The input DataFrame, expected to have a ‘pretty_name’ column.
suffix (str) – A suffix to append to the feature name.
- Return type:
Optional[DataFrame]
- Returns:
A single-row DataFrame with counts for each pretty_name, or None if the input DataFrame is empty.
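A rough sketch of the idea in plain pandas: count rows per `pretty_name`, then turn the counts into a single row. The helper name `pretty_name_counts` and the `<name>_count_<suffix>` column pattern are assumptions for illustration; the column names produced by the documented function may differ.

```python
from typing import Optional

import pandas as pd


def pretty_name_counts(df: pd.DataFrame, suffix: str = "epr") -> Optional[pd.DataFrame]:
    """Return a one-row DataFrame of per-name counts, or None if df is empty."""
    if df.empty:
        return None
    counts = df.groupby("pretty_name").size()
    # Hypothetical naming scheme for the feature columns.
    counts.index = [f"{name}_count_{suffix}" for name in counts.index]
    # Series -> single-column frame -> transpose into a single row.
    return counts.to_frame().T.reset_index(drop=True)


annotations = pd.DataFrame({"pretty_name": ["Asthma", "Asthma", "Diabetes"]})
features = pretty_name_counts(annotations)
```

Returning one row per call makes it straightforward to concatenate the vectors for many patients or time windows into a feature matrix.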