pat2vec.util.pre_processing
Functions
calculate_age_append | Calculates the age of clients and appends it as a new column.
demo_to_latest | Filters a demographics DataFrame to keep only the latest record per patient.
draw_document_samples | Draws n random samples for each unique 'search_term' in a DataFrame.
get_treatment_docs_by_iterative_multi_term_cohort_searcher_no_terms_fuzzy | Searches for documents using a list of terms and returns the results.
search_cohort | Searches for a cohort of patients' demographic data within a date range.
- pat2vec.util.pre_processing.get_treatment_docs_by_iterative_multi_term_cohort_searcher_no_terms_fuzzy(pat2vec_obj, term_list, overwrite=False, overwrite_search_term=None, append=False, verbose=0, mct=True, textual_obs=True, additional_filters=None, all_fields=False, method='fuzzy', fuzzy=2, slop=1)
Searches for documents using a list of terms and returns the results.
This function takes a list of terms, runs an iterative fuzzy search across multiple data sources (EPR, MCT, Textual Observations), and returns the combined search results as a pandas DataFrame. It also handles saving the results to a CSV file.
- Parameters:
pat2vec_obj (Any) – A pat2vec object with necessary attributes set.
term_list (List[str]) – A list of terms to search for.
overwrite (bool) – Whether to overwrite the output file if it already exists.
overwrite_search_term (Optional[str]) – A term to override the search terms in term_list. Used for testing.
append (bool) – Whether to append to the output file if it exists.
verbose (int) – Verbosity level.
mct (bool) – If True, includes results from the MCT source.
textual_obs (bool) – If True, includes results from the textual observations source.
additional_filters (Optional[List[str]]) – A list of additional filters to apply to the search.
all_fields (bool) – Whether to include and return all fields in the search.
method (str) – The search method to use (‘fuzzy’, ‘phrase’, ‘exact’). Defaults to ‘fuzzy’.
fuzzy (int) – The fuzzy matching tolerance. Defaults to 2.
slop (int) – The slop for phrase matching. Defaults to 1.
- Return type:
DataFrame
- Returns:
A DataFrame containing the search results.
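The reference above does not define how the fuzzy tolerance is measured. As a rough illustration only, the sketch below assumes it behaves like a maximum edit distance between the search term and a candidate token. The helpers `edit_distance` and `within_tolerance` are hypothetical names introduced here and are not part of pat2vec; the library or its search backend may count edits differently (for example, treating a transposition as a single edit).

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def within_tolerance(term: str, candidate: str, fuzzy: int = 2) -> bool:
    """Hypothetical check: does candidate match term within `fuzzy` edits?"""
    return edit_distance(term.lower(), candidate.lower()) <= fuzzy


# A swapped letter pair costs two substitutions under plain Levenshtein.
misspelling_matches = within_tolerance("metformin", "metfromin", fuzzy=2)
unrelated_matches = within_tolerance("metformin", "insulin", fuzzy=2)
```

Under this assumption, a tolerance of 2 would accept common misspellings of a drug name while still rejecting unrelated terms.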
- pat2vec.util.pre_processing.draw_document_samples(df, n)
Draws n random samples for each unique ‘search_term’ in a DataFrame.
- Parameters:
df (DataFrame) – DataFrame containing a ‘search_term’ column.
n (int) – The number of samples to draw for each unique search term. If a term has fewer than n rows, all its rows are returned.
- Return type:
DataFrame
- Returns:
A new DataFrame containing the sampled entries.
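The sampling behaviour described above can be approximated in plain pandas. This is a minimal sketch of the idea, not the library's implementation: the function name `draw_samples_per_term`, the `seed` argument, and the example data are all assumptions made for illustration.

```python
import pandas as pd


def draw_samples_per_term(df: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Sample up to n rows for each unique 'search_term' (illustrative sketch)."""
    parts = []
    for _, group in df.groupby("search_term", sort=False):
        # Cap the sample size so groups smaller than n are returned whole.
        parts.append(group.sample(n=min(n, len(group)), random_state=seed))
    return pd.concat(parts).reset_index(drop=True)


# Hypothetical input: five documents for one term, two for another.
docs = pd.DataFrame({
    "search_term": ["aspirin"] * 5 + ["statin"] * 2,
    "document_id": list(range(7)),
})
sampled = draw_samples_per_term(docs, n=3)
```

Here the term with five rows contributes three sampled rows, and the term with only two rows contributes both of them.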
- pat2vec.util.pre_processing.demo_to_latest(demo_df)
Filters a demographics DataFrame to keep only the latest record per patient.
Based on the ‘updatetime’ column, this function finds and returns the most recent entry for each unique ‘client_idcode’.
- Parameters:
demo_df (DataFrame) – A DataFrame with patient demographic data, including ‘client_idcode’ and ‘updatetime’ columns.
- Return type:
DataFrame
- Returns:
A DataFrame containing only the latest record for each patient.
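One common way to keep the most recent row per patient in pandas is to sort by the timestamp and drop earlier duplicates. The sketch below shows that approach under the column names given above; `latest_record_per_patient` and the sample rows are hypothetical, and the library's own function may handle ties or unparseable timestamps differently.

```python
import pandas as pd


def latest_record_per_patient(demo_df: pd.DataFrame) -> pd.DataFrame:
    """Keep the row with the newest 'updatetime' per 'client_idcode' (sketch)."""
    df = demo_df.copy()
    # Parse timestamps so the sort is chronological rather than lexical.
    df["updatetime"] = pd.to_datetime(df["updatetime"])
    return (
        df.sort_values("updatetime")
        .drop_duplicates(subset="client_idcode", keep="last")
        .reset_index(drop=True)
    )


# Hypothetical demographics: patient P1 has two records, P2 has one.
demo = pd.DataFrame({
    "client_idcode": ["P1", "P1", "P2"],
    "updatetime": ["2020-01-01", "2021-06-01", "2019-03-15"],
    "client_gendercode": ["F", "F", "M"],
})
latest = latest_record_per_patient(demo)
```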
- pat2vec.util.pre_processing.calculate_age_append(df)
Calculates the age of clients and appends it as a new column.
This function takes a DataFrame with a ‘client_dob’ (date of birth) column, calculates the current age for each client, and adds it as a new ‘age’ column. Rows with invalid or missing ‘client_dob’ are dropped.
- Parameters:
df (DataFrame) – DataFrame containing client data with a ‘client_dob’ column.
- Return type:
DataFrame
- Returns:
The input DataFrame with an additional ‘age’ column.
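The description above leaves open how age is rounded and which reference date is used. The following sketch assumes age in completed years, and takes an optional `today` argument so the result is reproducible. The name `append_age`, the `today` parameter, and the sample data are assumptions for illustration, not the library's API.

```python
import pandas as pd


def append_age(df: pd.DataFrame, today=None) -> pd.DataFrame:
    """Add an 'age' column in completed years; drop rows with bad DOBs (sketch)."""
    ref = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    out = df.copy()
    # Invalid or missing dates become NaT and are then dropped.
    out["client_dob"] = pd.to_datetime(out["client_dob"], errors="coerce")
    out = out.dropna(subset=["client_dob"])
    dob = out["client_dob"]
    # True where this year's birthday has already occurred by the reference date.
    had_birthday = (dob.dt.month < ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
    )
    out["age"] = ref.year - dob.dt.year - (~had_birthday).astype(int)
    return out


# Hypothetical clients, two of them with unusable dates of birth.
clients = pd.DataFrame({
    "client_idcode": ["P1", "P2", "P3", "P4"],
    "client_dob": ["1980-05-20", "not a date", None, "2000-12-31"],
})
with_age = append_age(clients, today="2024-06-01")
```

With the fixed reference date, the two rows with unusable dates are dropped and the remaining clients receive ages of 44 and 23.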
- pat2vec.util.pre_processing.search_cohort(patlist, pat2vec_obj, start_year, start_month, start_day, end_year, end_month, end_day, additional_filters=None)
Searches for a cohort of patients’ demographic data within a date range.
- Parameters:
patlist (List[str]) – List of patient IDs to search for.
pat2vec_obj (Any) – The main pat2vec object with a configured cohort searcher.
start_year (str) – Start year for the search.
start_month (str) – Start month for the search.
start_day (str) – Start day for the search.
end_year (str) – End year for the search.
end_month (str) – End month for the search.
end_day (str) – End day for the search.
additional_filters (Optional[List[str]]) – List of additional filter strings to append to the search query.
- Return type:
DataFrame
- Returns:
A DataFrame containing the demographic data for the specified cohort.
- Raises:
ValueError – If pat2vec_obj is not provided.
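Because the six date parameters are typed as strings, it can be convenient to derive them from `datetime.date` objects. The helper below is a hypothetical convenience, not part of pat2vec. Whether the library expects zero-padded months and days is not stated above, so this sketch passes unpadded values; adjust if the search backend requires otherwise.

```python
from datetime import date


def date_range_kwargs(start: date, end: date) -> dict:
    """Split two dates into the string components search_cohort takes (sketch)."""
    return {
        "start_year": str(start.year),
        "start_month": str(start.month),
        "start_day": str(start.day),
        "end_year": str(end.year),
        "end_month": str(end.month),
        "end_day": str(end.day),
    }


kwargs = date_range_kwargs(date(2020, 1, 1), date(2021, 12, 31))

# Assumed usage, given a configured pat2vec object and hypothetical patient IDs:
# cohort_df = search_cohort(patlist=["P001", "P002"], pat2vec_obj=pat2vec_obj, **kwargs)
```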