pat2vec.util.filter_methods
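Functions
apply_bloods_data_type_filter(config_obj, batch_target) | Applies data type filters to a DataFrame of bloods data.
apply_data_type_epr_docs_filters(config_obj, batch_target) | Applies data type filters to a DataFrame of EPR documents.
apply_data_type_mct_docs_filters(config_obj, batch_target) | Applies data type filters to a DataFrame of MCT documents.
filter_dataframe_by_fuzzy_terms(df, filter_term_list[, ...]) | Filters a DataFrame by fuzzy matching terms in a specified column.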
- pat2vec.util.filter_methods.filter_dataframe_by_fuzzy_terms(df, filter_term_list, column_name='document_description', verbose=0)
Filters a DataFrame by fuzzy matching terms in a specified column.
This function iterates through the list of terms and, for each term, finds the best fuzzy matches among the values of the given DataFrame column. It returns a new DataFrame containing only the rows whose match score is above a threshold of 80.
- Parameters:
df (DataFrame) – The DataFrame to filter.
filter_term_list (List[str]) – A list of terms to search for.
column_name (str) – The name of the column to perform the fuzzy match on.
verbose (int) – Verbosity level for logging.
- Return type:
DataFrame
- Returns:
A new DataFrame containing only the rows with fuzzy-matched terms.
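The filtering described above can be illustrated with a minimal sketch. This is not the pat2vec implementation: it uses the standard library's difflib as a stand-in scorer, so scores will differ from whatever fuzzy matcher the library uses. The names fuzzy_score and filter_by_fuzzy_terms_sketch, and the threshold parameter, are hypothetical and exist only for this example.

```python
from difflib import SequenceMatcher
from typing import List

import pandas as pd


def fuzzy_score(term: str, text: str) -> int:
    """Return a 0-100 similarity score (difflib stand-in for a fuzzy matcher)."""
    ratio = SequenceMatcher(None, term.lower(), text.lower()).ratio()
    return round(ratio * 100)


def filter_by_fuzzy_terms_sketch(
    df: pd.DataFrame,
    filter_term_list: List[str],
    column_name: str = "document_description",
    threshold: int = 80,  # hypothetical parameter; the documented function has no such argument
) -> pd.DataFrame:
    """Keep rows whose column value scores above `threshold` for at least one term."""
    # Treat missing values as empty strings so they never match.
    values = df[column_name].fillna("").astype(str)
    keep = values.apply(
        lambda text: any(fuzzy_score(term, text) > threshold for term in filter_term_list)
    )
    # Return a copy so the caller's DataFrame is left untouched.
    return df[keep].copy()


docs = pd.DataFrame(
    {
        "document_description": [
            "Discharge Summary",
            "Discharge Sumary",  # misspelling that should still match
            "Clinic Letter",
            None,
        ]
    }
)
filtered = filter_by_fuzzy_terms_sketch(docs, ["discharge summary"])
print(filtered["document_description"].tolist())
```

In this sketch the misspelled "Discharge Sumary" row survives the filter, while the unrelated and missing descriptions are dropped.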
- pat2vec.util.filter_methods.apply_data_type_epr_docs_filters(config_obj, batch_target)
Applies data type filters to a DataFrame of EPR documents.
This function filters a DataFrame based on rules defined in the config_obj. It can apply fuzzy term matching on the ‘document_description’ column and also count occurrences of regex patterns in the ‘body_analysed’ column, adding the counts as new columns.
- Parameters:
config_obj (Any) – A configuration object containing filter settings.
batch_target (DataFrame) – The DataFrame of EPR documents to be filtered.
- Return type:
DataFrame
- Returns:
The filtered DataFrame.
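The regex-counting step can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper add_regex_count_columns, the mapping from new column name to pattern, and the example patterns are all hypothetical, since how config_obj stores its regex rules is not documented here.

```python
import pandas as pd


def add_regex_count_columns(
    df: pd.DataFrame,
    patterns: dict,  # hypothetical shape: {new_column_name: regex_pattern}
    text_column: str = "body_analysed",
) -> pd.DataFrame:
    """Add one integer column per pattern holding its match count in `text_column`."""
    out = df.copy()
    # Missing document bodies count as empty text, giving zero matches.
    text = out[text_column].fillna("").astype(str)
    for new_column, pattern in patterns.items():
        out[new_column] = text.str.count(pattern)
    return out


batch = pd.DataFrame(
    {
        "body_analysed": [
            "BP 120/80 today, BP 118/76 last week",
            "No observations recorded",
            None,
        ]
    }
)
counted = add_regex_count_columns(
    batch,
    {
        "bp_reading_count": r"\b\d{2,3}/\d{2,3}\b",
        "bp_mention_count": r"\bBP\b",
    },
)
print(counted[["bp_reading_count", "bp_mention_count"]])
```

Counting leaves the row set unchanged and only adds columns, so it composes with the fuzzy term filter in either order.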
- pat2vec.util.filter_methods.apply_bloods_data_type_filter(config_obj, batch_target)
Applies data type filters to a DataFrame of bloods data.
This function filters a DataFrame based on fuzzy term matching against the ‘basicobs_itemname_analysed’ column, using filter terms defined in the config_obj.
- Parameters:
config_obj (Any) – A configuration object containing filter settings.
batch_target (DataFrame) – The DataFrame of bloods data to be filtered.
- Return type:
DataFrame
- Returns:
The filtered DataFrame.
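A config-driven version of this filter might look like the sketch below. Several details are assumptions made for illustration: the attribute name bloods_filter_terms, the pass-through behaviour when no terms are configured, the 0.8 cutoff, and the use of difflib.get_close_matches against the column's unique values in place of the library's own fuzzy matcher. The value column in the sample data is also invented.

```python
from difflib import get_close_matches
from types import SimpleNamespace

import pandas as pd


def filter_bloods_sketch(config_obj, batch_target: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose item name closely matches any configured term."""
    column = "basicobs_itemname_analysed"
    # Hypothetical attribute name; the real config field is not documented here.
    terms = getattr(config_obj, "bloods_filter_terms", None)
    if not terms:
        return batch_target  # assumption: nothing configured means no filtering

    # Map lowercase item names back to their original spellings.
    lookup = {}
    for name in batch_target[column].dropna().astype(str).unique():
        lookup.setdefault(name.lower(), set()).add(name)
    if not lookup:
        return batch_target.iloc[0:0].copy()

    matched = set()
    for term in terms:
        hits = get_close_matches(term.lower(), list(lookup), n=len(lookup), cutoff=0.8)
        for hit in hits:
            matched |= lookup[hit]
    return batch_target[batch_target[column].isin(matched)].copy()


config_obj = SimpleNamespace(bloods_filter_terms=["haemoglobin", "creatinine"])
batch_target = pd.DataFrame(
    {
        "basicobs_itemname_analysed": [
            "Haemoglobin", "Hemoglobin", "Creatinine", "Sodium", "Haemoglobin",
        ],
        "value": [132.0, 128.0, 88.0, 140.0, 129.0],
    }
)
bloods = filter_bloods_sketch(config_obj, batch_target)
print(bloods["basicobs_itemname_analysed"].tolist())
```

Matching against the unique item names first, then selecting rows with isin, avoids rescoring the same test name for every row in a large bloods batch.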
- pat2vec.util.filter_methods.apply_data_type_mct_docs_filters(config_obj, batch_target)
Applies data type filters to a DataFrame of MCT documents.
This function filters a DataFrame based on rules defined in the config_obj. It can apply fuzzy term matching on the ‘document_description’ column and also count occurrences of regex patterns in the ‘body_analysed’ column, adding the counts as new columns.
- Parameters:
config_obj (Any) – A configuration object containing filter settings.
batch_target (DataFrame) – The DataFrame of MCT documents to be filtered.
- Return type:
DataFrame
- Returns:
The filtered DataFrame.