pat2vec.util.helper_functions
Functions
clear_patient_features | Deletes all features for a specific patient from the database.
ensure_index | Ensures a database index exists for a given column.
extract_nhs_numbers | Extracts all occurrences of "NHS" followed by a 10-digit number.
get_all_features | Retrieves all patient features from the configured backend.
get_df_from_db | Generic helper to retrieve a DataFrame from the database backend.
get_search_client_idcode_list_from_nhs_number_list | Retrieves a unique list of hospital IDs from a list of NHS numbers.
sanitize_for_path | Sanitizes a string to be safe for use in a file/directory path.
save_annotations_to_db | Saves an annotation DataFrame to the database.
save_patient_features | Saves the feature vector(s) for a single patient to the configured backend.
save_raw_patient_batch | Saves a raw data batch for a patient to the database.
- pat2vec.util.helper_functions.sanitize_for_path(text)
Sanitizes a string to be safe for use in a file/directory path.
- Return type:
str
- Parameters:
text (str)
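The exact sanitization rules are not documented here. As a rough illustration only, one common approach is to replace every character outside a conservative whitelist. The function name `sanitize_for_path_sketch`, the whitelist, and the `"unnamed"` fallback are all assumptions for this sketch, not the library's behavior.

```python
import re


def sanitize_for_path_sketch(text: str) -> str:
    """Hypothetical stand-in; the real rules in pat2vec may differ."""
    # Replace each run of characters outside a conservative whitelist with "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", str(text))
    # Trim leading/trailing dots and underscores so "." or ".." cannot survive.
    cleaned = cleaned.strip("._")
    # Assumed fallback so the result is never an empty path component.
    return cleaned or "unnamed"
```

Whitelisting is usually safer than blacklisting here, because it does not depend on knowing every character a given filesystem rejects.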
- pat2vec.util.helper_functions.extract_nhs_numbers(input_string)
Extracts all occurrences of “NHS” followed by a 10-digit number.
The function searches for the pattern “NHS” followed by a 10-digit number, which may contain spaces. It then cleans the extracted numbers by removing any spaces.
- Parameters:
input_string (str) – The string to search for NHS numbers.
- Return type:
List[str]
- Returns:
A list of all extracted 10-digit NHS numbers as strings.
Examples
>>> extract_nhs_numbers("NHS 123 456 7890")
['1234567890']
>>> extract_nhs_numbers("NHS 123 456 7890 and NHS 098 765 4321")
['1234567890', '0987654321']
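The matching behavior described above can be approximated with a single regular expression. This is a minimal sketch and not the library's source: the name `extract_nhs_numbers_sketch` and the specific pattern are assumptions chosen to reproduce the documented examples.

```python
import re
from typing import List

# "NHS", optional whitespace, then ten digits that may each be followed by spaces.
_NHS_PATTERN = re.compile(r"NHS\s*((?:\d\s*){10})")


def extract_nhs_numbers_sketch(input_string: str) -> List[str]:
    """Hypothetical re-implementation of the documented behavior."""
    # findall returns the captured digit group; strip whitespace from each match.
    return [re.sub(r"\s+", "", match) for match in _NHS_PATTERN.findall(input_string)]
```

Note that this sketch checks only the shape of the number; it does not validate the NHS check digit.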
- pat2vec.util.helper_functions.get_search_client_idcode_list_from_nhs_number_list(nhs_numbers, pat2vec_obj)
Retrieves a unique list of hospital IDs from a list of NHS numbers.
This function uses a pat2vec_obj to perform a cohort search against an index (e.g., ‘pims_apps*’) to find the corresponding ‘HospitalID’ for each ‘PatNHSNo’ in the provided list.
- Parameters:
nhs_numbers (List[str]) – A list of NHS numbers to search for.
pat2vec_obj (Any) – An object with a cohort_searcher_with_terms_and_search method for querying the data source.
- Return type:
List[str]
- Returns:
A unique list of hospital IDs found for the given NHS numbers.
- pat2vec.util.helper_functions.ensure_index(connection, table_name, schema_name, column_name, engine_name)
Ensures a database index exists for a given column.
- Return type:
None
- Parameters:
connection (Any)
table_name (str)
schema_name (str | None)
column_name (str)
engine_name (str)
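For readers unfamiliar with idempotent index creation, the idea can be sketched with the standard-library `sqlite3` module. This is an illustration under assumptions: the real function takes a `connection` and `engine_name` whose types are not documented here, and the helper name `ensure_index_sketch` and the `idx_<table>_<column>` naming scheme are hypothetical.

```python
import sqlite3


def ensure_index_sketch(conn: sqlite3.Connection, table_name: str, column_name: str) -> str:
    """Hypothetical SQLite-only illustration; returns the index name used."""
    # Assumed naming scheme; the library may name its indexes differently.
    index_name = f"idx_{table_name}_{column_name}"
    # IF NOT EXISTS makes repeated calls a no-op instead of an error.
    # Identifiers cannot be bound as parameters, so callers must pass trusted names.
    conn.execute(
        f'CREATE INDEX IF NOT EXISTS "{index_name}" ON "{table_name}" ("{column_name}")'
    )
    conn.commit()
    return index_name
```

Other database engines differ in whether they support `IF NOT EXISTS` for indexes, which is presumably why the real function accepts an `engine_name`.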
- pat2vec.util.helper_functions.clear_patient_features(patient_id, config_obj)
Deletes all features for a specific patient from the database.
- Return type:
None
- Parameters:
patient_id (str)
config_obj (Any)
- pat2vec.util.helper_functions.save_patient_features(features_df, patient_id, config_obj, overwrite=True)
Saves the feature vector(s) for a single patient to the configured backend.
If storage_backend is ‘database’, it appends/overwrites the features in a ‘features’ table within a ‘features’ schema.
If storage_backend is ‘file’, it saves the features to a CSV file in the current_pat_lines_path directory, preserving the original behavior.
- Parameters:
features_df (DataFrame) – The DataFrame containing one or more feature vectors for the patient.
patient_id (str) – The unique identifier for the patient.
config_obj (Any) – The configuration object containing backend settings and paths.
overwrite (bool) – If True, delete existing features for the patient before saving. Defaults to True.
- Raises:
ValueError – If an unknown storage_backend is specified.
Exception – Propagates exceptions from database operations.
- Return type:
None
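The `overwrite` semantics for the database backend (delete the patient's existing rows, then append) can be sketched with `sqlite3`. This is not the library's code: the helper name `save_features_sketch`, the long-format `(client_idcode, feature, value)` table layout, and the use of a plain dict instead of a DataFrame are assumptions made to keep the example dependency-free.

```python
import sqlite3
from typing import Dict


def save_features_sketch(
    conn: sqlite3.Connection,
    patient_id: str,
    features: Dict[str, float],
    overwrite: bool = True,
) -> None:
    """Hypothetical delete-then-append illustration for a single patient."""
    # "with conn" commits on success and rolls back if any statement raises,
    # so a failed insert does not leave the patient with no features at all.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS features "
            "(client_idcode TEXT, feature TEXT, value REAL)"
        )
        if overwrite:
            conn.execute("DELETE FROM features WHERE client_idcode = ?", (patient_id,))
        conn.executemany(
            "INSERT INTO features (client_idcode, feature, value) VALUES (?, ?, ?)",
            [(patient_id, name, value) for name, value in features.items()],
        )
```

With `overwrite=False` the new rows are simply appended, so repeated saves for the same patient accumulate.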
- pat2vec.util.helper_functions.save_raw_patient_batch(df, patient_id, table_name, config_obj, id_column='client_idcode')
Saves a raw data batch for a patient to the database.
- Parameters:
df (DataFrame) – The DataFrame containing the raw data.
patient_id (str) – The patient identifier.
table_name (str) – The target table name (without schema prefix).
config_obj (Any) – The configuration object.
id_column (str) – The column name for the patient ID in this table.
- Return type:
None
- pat2vec.util.helper_functions.get_all_features(config_obj)
Retrieves all patient features from the configured backend.
If storage_backend is ‘database’, it reads the entire ‘features’ table.
If storage_backend is ‘file’, it reads and concatenates all individual patient CSV files from the current_pat_lines_path directory.
- Return type:
DataFrame
- Parameters:
config_obj (Any)
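The 'file' branch amounts to reading every per-patient CSV in a directory and concatenating the rows. The sketch below shows that pattern with the standard-library `csv` module instead of pandas, so it returns a list of dicts rather than a DataFrame. The helper name `read_all_feature_csvs_sketch` and the `*.csv` glob are assumptions.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_all_feature_csvs_sketch(directory: str) -> List[Dict[str, str]]:
    """Hypothetical illustration of concatenating per-patient CSV files."""
    rows: List[Dict[str, str]] = []
    # Sort for a deterministic order; glob order is otherwise filesystem-dependent.
    for csv_path in sorted(Path(directory).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # Each row becomes a dict keyed by that file's header line.
            rows.extend(csv.DictReader(handle))
    return rows
```

Unlike a DataFrame concatenation, this sketch does not align columns across files; rows from files with different headers simply carry different keys.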
- pat2vec.util.helper_functions.get_df_from_db(config_obj, schema, table, patient_ids=None, patient_id_column='client_idcode', columns=None)
Generic helper to retrieve a DataFrame from the database backend.
This function handles database connections, dialect-specific table naming (e.g., for SQLite), and filtering by a list of patient IDs.
- Parameters:
config_obj (Any) – The configuration object.
schema (str) – The database schema name (e.g., ‘raw_data’).
table (str) – The database table name (e.g., ‘raw_drugs’).
patient_ids (Optional[List[str]]) – An optional list of patient IDs to filter the DataFrame.
patient_id_column (str) – The name of the patient ID column.
columns (Optional[List[str]]) – An optional list of columns to select.
- Return type:
DataFrame
- Returns:
A pandas DataFrame with the requested data, or an empty DataFrame on error.
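The filtering and error-handling behavior can be sketched against SQLite with the standard library. Several things here are assumptions rather than documented facts: the helper name `fetch_rows_sketch`, the flattening of schema and table into a single `schema_table` name for SQLite, and returning a list of tuples instead of a DataFrame.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


def fetch_rows_sketch(
    conn: sqlite3.Connection,
    schema: str,
    table: str,
    patient_ids: Optional[Sequence[str]] = None,
    patient_id_column: str = "client_idcode",
    columns: Optional[Sequence[str]] = None,
) -> List[Tuple]:
    """Hypothetical SQLite-only illustration of the documented query pattern."""
    # Assumption: the schema is folded into the table name for SQLite.
    table_name = f"{schema}_{table}"
    select_list = ", ".join(f'"{c}"' for c in columns) if columns else "*"
    query = f'SELECT {select_list} FROM "{table_name}"'
    params: List[str] = []
    if patient_ids is not None:
        if not patient_ids:
            return []  # An empty filter list matches nothing.
        # One bound placeholder per ID keeps the values out of the SQL text.
        placeholders = ", ".join("?" for _ in patient_ids)
        query += f' WHERE "{patient_id_column}" IN ({placeholders})'
        params = list(patient_ids)
    try:
        return conn.execute(query, params).fetchall()
    except sqlite3.Error:
        # Mirrors the documented "empty result on error" behavior.
        return []
```

Returning an empty result on error keeps callers simple, at the cost of making a missing table indistinguishable from a table with no matching rows.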
- pat2vec.util.helper_functions.save_annotations_to_db(df, patient_id, table_name, config_obj, id_column='client_idcode', schema_name='annotations')
Saves an annotation DataFrame to the database.
- Return type:
None
- Parameters:
df (DataFrame)
patient_id (str)
table_name (str)
config_obj (Any)
id_column (str)
schema_name (str)