pat2vec.pat2vec_main_methods.main_batch
Functions
main_batch(current_pat_client_id_code, ...) | Orchestrates the feature extraction process for a single patient within a specific time window.
- pat2vec.pat2vec_main_methods.main_batch.main_batch(current_pat_client_id_code, target_date_range, batches=None, config_obj=None, stripped_list_start=None, t=None, cohort_searcher_with_terms_and_search=None, cat=None)
Orchestrates the feature extraction process for a single patient within a specific time window.
This function serves as the main entry point for processing a patient’s data in batch mode. It iterates through a list of predefined feature configurations. For each feature enabled in config_obj.main_options, it calls the corresponding get_* function, passing the pre-fetched data from the batches dictionary. The resulting feature DataFrames are concatenated into a single feature vector for the given patient and time slice.
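The dispatch loop described above can be sketched as follows. This is a minimal, self-contained illustration, not the pat2vec implementation: the extractor functions (get_demo_features, get_bloods_features), the FEATURE_REGISTRY mapping, the option names, the ‘batch_bloods’ key, and the use of plain dicts in place of pandas DataFrames are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Extractor = Callable[[List[Record]], Record]


# Hypothetical stand-ins for the real get_* functions. Each takes the records
# of one pre-fetched batch and returns a one-row feature vector as a dict.
def get_demo_features(batch: List[Record]) -> Record:
    # Use the first demographics record, if any (illustrative only).
    record = batch[0] if batch else {}
    return {"age": record.get("age"), "sex": record.get("sex")}


def get_bloods_features(batch: List[Record]) -> Record:
    # Summarise numeric results as a count and a mean (illustrative only).
    values = [float(row["value"]) for row in batch]
    mean = sum(values) / len(values) if values else None
    return {"bloods_count": len(values), "bloods_mean": mean}


# Hypothetical registry: option name -> (batch key, extractor function).
FEATURE_REGISTRY: Dict[str, Tuple[str, Extractor]] = {
    "demo": ("batch_demo", get_demo_features),
    "bloods": ("batch_bloods", get_bloods_features),
}


def build_feature_vector(patient_id: str,
                         main_options: Dict[str, bool],
                         batches: Dict[str, List[Record]]) -> Record:
    """Call each enabled extractor on its pre-fetched batch and merge the results."""
    vector: Record = {"patient_id": patient_id}
    for option, (batch_key, extractor) in FEATURE_REGISTRY.items():
        if not main_options.get(option, False):
            continue  # feature switched off in the configuration
        # Merging one-row dicts plays the role of column-wise concatenation.
        vector.update(extractor(batches.get(batch_key, [])))
    return vector


batches = {
    "batch_demo": [{"age": 54, "sex": "F"}],
    "batch_bloods": [{"value": 4.0}, {"value": 6.0}],
}
vec = build_feature_vector("P001", {"demo": True, "bloods": True}, batches)
demo_only = build_feature_vector("P001", {"demo": True}, batches)
```

Merging one-row dicts mirrors a column-wise concatenation of single-row DataFrames; the real function may differ in how it aligns, orders, or names columns.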
The final feature vector is saved as a CSV file in the output directory specified by the configuration, so each call produces one time-slice representation of the patient’s state.
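A hedged sketch of the CSV-writing step, using only the standard library. The write_time_slice helper, the temporary output directory, and the <patient_id>_<YYYY>_<MM>_<DD>.csv naming scheme are hypothetical; the actual file name and location used by pat2vec are not specified here.

```python
import csv
import os
import tempfile


def write_time_slice(vector, output_dir, patient_id, target_date):
    """Write one feature vector as a single-row CSV file and return its path."""
    # Hypothetical naming scheme: <patient_id>_<YYYY>_<MM>_<DD>.csv
    year, month, day = target_date
    filename = f"{patient_id}_{year:04d}_{month:02d}_{day:02d}.csv"
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w", newline="") as handle:
        # The dict keys become the header row; the values form the single data row.
        writer = csv.DictWriter(handle, fieldnames=list(vector))
        writer.writeheader()
        writer.writerow(vector)
    return path


out_dir = tempfile.mkdtemp()
saved = write_time_slice({"patient_id": "P001", "age": 54}, out_dir, "P001", (2020, 1, 1))
```

Writing one file per patient and time slice keeps each call independent, so reruns can overwrite or skip individual slices without touching the others.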
- Parameters:
current_pat_client_id_code (str) – The unique identifier for the patient being processed.
target_date_range (tuple) – A tuple identifying the time window for which to generate the feature vector, e.g. a date of the form (YYYY, MM, DD).
batches (dict[str, pd.DataFrame], optional) – A dictionary containing all pre-fetched data batches for the patient, keyed by batch name (e.g., ‘batch_demo’).
config_obj (object, optional) – A configuration object containing settings like main_options, paths, and verbosity.
stripped_list_start (list, optional) – A list of patient IDs that have already been processed, used to avoid redundant computation.
t (object, optional) – A tqdm progress bar instance for updating progress.
cohort_searcher_with_terms_and_search (callable, optional) – A function to query the data source, used by some feature extraction methods in non-batch mode.
cat (object, optional) – A MedCAT instance for clinical text annotation. Required if any annotation options are enabled.
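The following sketch shows plausible shapes for some of these arguments. The documentation names only main_options as a config_obj setting and ‘batch_demo’ as a batch key; the other attributes and keys, the already_processed helper, and the use of lists of dicts in place of pandas DataFrames are illustrative assumptions.

```python
from types import SimpleNamespace
from typing import Dict, List, Optional

# Hypothetical configuration object. Only `main_options` is named in the
# documentation; the `output_dir` and `verbosity` attributes are illustrative.
config_obj = SimpleNamespace(
    main_options={"demo": True, "bloods": False},
    output_dir="output/feature_vectors",
    verbosity=0,
)

# Pre-fetched batches keyed by batch name. Lists of dicts stand in for the
# pandas DataFrames the real function receives; "batch_bloods" is a
# hypothetical key, and an empty list represents a batch with no records.
batches: Dict[str, List[dict]] = {
    "batch_demo": [{"age": 54, "sex": "F"}],
    "batch_bloods": [],
}

# A date tuple in the (YYYY, MM, DD) form given for target_date_range.
target_date_range = (2020, 1, 1)


def already_processed(patient_id: str, stripped_list_start: Optional[List[str]]) -> bool:
    """Return True if the patient ID appears in the list of processed IDs (hypothetical helper)."""
    return bool(stripped_list_start) and patient_id in stripped_list_start
```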
- Raises:
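ValueError – Although these parameters default to None in the signature, the function raises ValueError if any of config_obj, cohort_searcher_with_terms_and_search, or t is not provided, or if cat is not provided when annotation options are enabled.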
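The checks implied by the Raises entry might look like the sketch below. The validate_inputs helper, the order of the checks, and the annotation option names in ANNOTATION_OPTIONS are assumptions for illustration, not pat2vec’s actual validation code.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical option names that would require a MedCAT instance.
ANNOTATION_OPTIONS = ("annotations", "annotations_mrc")


def validate_inputs(config_obj: Any, cohort_searcher: Any, t: Any, cat: Any) -> None:
    """Raise ValueError when a required argument is missing (illustrative check)."""
    if config_obj is None:
        raise ValueError("config_obj must be provided")
    if cohort_searcher is None:
        raise ValueError("cohort_searcher_with_terms_and_search must be provided")
    if t is None:
        raise ValueError("t (a tqdm progress bar) must be provided")
    # cat is only required when at least one annotation option is switched on.
    main_options = getattr(config_obj, "main_options", {}) or {}
    needs_cat = any(main_options.get(opt, False) for opt in ANNOTATION_OPTIONS)
    if needs_cat and cat is None:
        raise ValueError("cat (a MedCAT instance) is required when annotation options are enabled")


# Example: annotations enabled but no MedCAT instance supplied.
cfg = SimpleNamespace(main_options={"annotations": True})
try:
    validate_inputs(cfg, cohort_searcher=print, t=object(), cat=None)
except ValueError as err:
    print(f"ValueError: {err}")
```

Validating up front, before any feature extraction runs, means a misconfigured call fails immediately rather than after part of the feature vector has been computed.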
- Side Effects:
Writes a CSV file containing the patient’s feature vector for the specified time slice.
Updates the tqdm progress bar t.