pat2vec.util.post_processing_build_ipw_dataframe

Functions

build_ipw_dataframe([...])

Builds a DataFrame of Individual Patient Window (IPW) records.

pat2vec.util.post_processing_build_ipw_dataframe.build_ipw_dataframe(annot_filter_arguments=None, filter_codes=None, config_obj=None, mode='earliest', include_mct=True, include_textual_obs=True, custom_pat_list=None)

Builds a DataFrame of Individual Patient Window (IPW) records.

This function iterates through a list of patients, finds the relevant “index” record for each one based on the specified filters, and compiles these records into a single DataFrame. The index record is the earliest or latest occurrence (as selected by mode) of a specific clinical event, such as a diagnosis CUI.

Parameters:
  • annot_filter_arguments (Optional[Dict[str, Any]]) – A dictionary of filters to apply to the annotations before selecting the IPW record. Defaults to None.

  • filter_codes (Optional[List[int]]) – A list of CUI codes to identify the relevant clinical events. Defaults to None.

  • config_obj (Optional[Any]) – The configuration object containing paths and settings. Defaults to None.

  • mode (str) – Whether to select the 'earliest' or 'latest' record for each patient. Defaults to 'earliest'.

  • include_mct (bool) – If True, includes annotations from MCT (MRC clinical notes) in the search. Defaults to True.

  • include_textual_obs (bool) – If True, includes annotations from textual observations. Defaults to True.

  • custom_pat_list (Optional[List[str]]) – A specific list of patient IDs to process. If None or empty, the patient list is derived from the files in pre_document_batch_path. Defaults to None.

Returns:

A DataFrame where each row represents the IPW record for a patient, containing details of the index event.

Return type:

pd.DataFrame
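
Example:

The selection described above (one index record per patient, chosen as the earliest or latest matching event) can be illustrated with a small pandas sketch. This is not the pat2vec implementation and does not call the library. The helper name pick_index_records and the column names client_idcode, cui, and updatetime are assumptions made purely for illustration.

```python
import pandas as pd


def pick_index_records(annotations, filter_codes=None, mode="earliest",
                       id_col="client_idcode", code_col="cui",
                       time_col="updatetime"):
    """Hypothetical helper: keep one row per patient, the earliest or
    latest annotation whose code is in filter_codes."""
    if mode not in ("earliest", "latest"):
        raise ValueError("mode must be 'earliest' or 'latest'")

    df = annotations
    if filter_codes is not None:
        # Keep only annotations for the clinical events of interest.
        df = df[df[code_col].isin(filter_codes)]

    # Sort so the wanted record comes first within each patient,
    # then keep that first row per patient.
    df = df.sort_values(time_col, ascending=(mode == "earliest"), kind="stable")
    return df.drop_duplicates(subset=id_col, keep="first").reset_index(drop=True)


# Toy annotation table (column names are assumed, not taken from pat2vec).
annotations = pd.DataFrame({
    "client_idcode": ["P1", "P1", "P2", "P2", "P3"],
    "cui": [111, 111, 111, 222, 222],
    "updatetime": pd.to_datetime(
        ["2020-03-01", "2019-01-15", "2021-06-30", "2018-02-02", "2022-01-01"]
    ),
})

earliest = pick_index_records(annotations, filter_codes=[111], mode="earliest")
latest = pick_index_records(annotations, filter_codes=[111], mode="latest")
```

In this toy data, patient P3 has no annotation with code 111 and so does not appear in either result. How build_ipw_dataframe itself treats patients without a matching event is not stated in this reference.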