pat2vec.util.filter_methods

Functions

apply_bloods_data_type_filter(config_obj, ...)

Applies data type filters to a DataFrame of bloods data.

apply_data_type_epr_docs_filters(config_obj, ...)

Applies data type filters to a DataFrame of EPR documents.

apply_data_type_mct_docs_filters(config_obj, ...)

Applies data type filters to a DataFrame of MCT documents.

filter_dataframe_by_fuzzy_terms(df, ...[, ...])

Filters a DataFrame by fuzzy matching terms in a specified column.

pat2vec.util.filter_methods.filter_dataframe_by_fuzzy_terms(df, filter_term_list, column_name='document_description', verbose=0)

Filters a DataFrame by fuzzy matching terms in a specified column.

This function iterates through a list of terms and, for each term, finds the best fuzzy matches in the specified DataFrame column. It returns a new DataFrame containing only the rows whose match score is above a threshold of 80.

Parameters:
  • df (DataFrame) – The DataFrame to filter.

  • filter_term_list (List[str]) – A list of terms to search for.

  • column_name (str) – The name of the column to perform the fuzzy match on.

  • verbose (int) – Verbosity level for logging.

Return type:

DataFrame

Returns:

A new DataFrame containing only the rows with fuzzy-matched terms.
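The thresholding behaviour described above can be illustrated with a minimal sketch. This is not pat2vec's implementation: it assumes a 0 to 100 similarity score built on the standard library's difflib as a stand-in for whichever fuzzy matcher the library uses, so real scores may differ. The names fuzzy_score and sketch_fuzzy_filter are hypothetical.

```python
from difflib import SequenceMatcher

import pandas as pd

THRESHOLD = 80  # cutoff stated in the function description


def fuzzy_score(term: str, text: str) -> int:
    """Similarity on a 0-100 scale (difflib stand-in, not pat2vec's scorer)."""
    ratio = SequenceMatcher(None, term.lower(), text.lower()).ratio()
    return round(100 * ratio)


def sketch_fuzzy_filter(df, filter_term_list, column_name="document_description"):
    """Keep rows whose column value scores above THRESHOLD for any term."""
    # Treat missing values as empty strings so they never match.
    values = df[column_name].fillna("").astype(str)
    mask = values.apply(
        lambda v: any(fuzzy_score(t, v) > THRESHOLD for t in filter_term_list)
    )
    return df[mask].copy()


docs = pd.DataFrame(
    {
        "document_description": [
            "Discharge Summary",
            "Discharge sumary",  # misspelling, still a close match
            "Clinic Letter",
            None,
        ]
    }
)
filtered = sketch_fuzzy_filter(docs, ["discharge summary"])
print(filtered)
```

In this sketch the misspelled row is retained because its similarity to the search term stays above the cutoff, while the unrelated and missing descriptions are dropped.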

pat2vec.util.filter_methods.apply_data_type_epr_docs_filters(config_obj, batch_target)

Applies data type filters to a DataFrame of EPR documents.

This function filters a DataFrame based on rules defined in the config_obj. It can apply fuzzy term matching on the ‘document_description’ column and also count occurrences of regex patterns in the ‘body_analysed’ column, adding the counts as new columns.

Parameters:
  • config_obj (Any) – A configuration object containing filter settings.

  • batch_target (DataFrame) – The DataFrame of EPR documents to be filtered.

Return type:

DataFrame

Returns:

The filtered DataFrame.
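The regex-counting step can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the mapping from new column name to pattern, the case-insensitive matching, and the helper name sketch_add_regex_counts are all hypothetical, since the layout of the settings inside config_obj is not documented here.

```python
import re

import pandas as pd


def sketch_add_regex_counts(df, patterns, column_name="body_analysed"):
    """Add one count column per pattern (hypothetical helper, not pat2vec API)."""
    out = df.copy()
    # Treat missing document bodies as empty text so they count as zero.
    text = out[column_name].fillna("").astype(str)
    for new_col, pattern in patterns.items():
        compiled = re.compile(pattern, flags=re.IGNORECASE)  # assumed flag
        out[new_col] = text.apply(lambda s, rx=compiled: len(rx.findall(s)))
    return out


notes = pd.DataFrame(
    {
        "body_analysed": [
            "Chest pain on exertion. No chest pain at rest.",
            "Reports headache.",
            None,
        ]
    }
)
# Assumed shape: {new column name: regex pattern}.
patterns = {"chest_pain_count": r"chest\s+pain", "headache_count": r"headaches?"}
counted = sketch_add_regex_counts(notes, patterns)
print(counted)
```

Each pattern yields one integer column, so the counts can later be used as features alongside the original text.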

pat2vec.util.filter_methods.apply_bloods_data_type_filter(config_obj, batch_target)

Applies data type filters to a DataFrame of bloods data.

This function filters a DataFrame based on fuzzy term matching against the ‘basicobs_itemname_analysed’ column, using filter terms defined in the config_obj.

Parameters:
  • config_obj (Any) – A configuration object containing filter settings.

  • batch_target (DataFrame) – The DataFrame of bloods data to be filtered.

Return type:

DataFrame

Returns:

The filtered DataFrame.

pat2vec.util.filter_methods.apply_data_type_mct_docs_filters(config_obj, batch_target)

Applies data type filters to a DataFrame of MCT documents.

This function filters a DataFrame based on rules defined in the config_obj. It can apply fuzzy term matching on the ‘document_description’ column and also count occurrences of regex patterns in the ‘body_analysed’ column, adding the counts as new columns.

Parameters:
  • config_obj (Any) – A configuration object containing filter settings.

  • batch_target (DataFrame) – The DataFrame of MCT documents to be filtered.

Return type:

DataFrame

Returns:

The filtered DataFrame.