pat2vec.util.methods_annotation_regex

Functions

append_regex_term_counts(df, terms[, ...])

Counts occurrences of regex patterns in a DataFrame's text column.

pat2vec.util.methods_annotation_regex.append_regex_term_counts(df, terms, text_column='body_analysed', debug=False)

Counts occurrences of regex patterns in a DataFrame’s text column.

For each term (regex pattern) in terms, this function counts the case-insensitive matches in each row of text_column. One new column is added to the DataFrame per term, holding that term's count for each row.

Parameters:
  • df (DataFrame) – The DataFrame to process.

  • terms (List[str]) – A list of regex patterns to search for.

  • text_column (str) – The name of the column containing the text to search.

  • debug (bool) – If True, prints debugging information about the DataFrame.

Return type:

DataFrame

Returns:

The original DataFrame with new columns for the counts of each term.
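
Example:

The counting behaviour described above can be approximated with plain pandas. The sketch below is illustrative only and is not the pat2vec implementation: count_regex_terms is a hypothetical stand-in, naming each new column after its pattern is an assumption (this page does not state the column names), and working on a copy and treating missing text as empty are choices made for this example.

```python
import re
from typing import List

import pandas as pd


def count_regex_terms(df: pd.DataFrame, terms: List[str],
                      text_column: str = "body_analysed") -> pd.DataFrame:
    """Hypothetical stand-in for illustration; not the pat2vec code."""
    out = df.copy()  # assumption: leave the caller's DataFrame untouched
    # Assumption: treat missing text as an empty string so counts are 0.
    text = out[text_column].fillna("").astype(str)
    for term in terms:
        # Assumption: each new column is named after its regex pattern.
        # Series.str.count counts non-overlapping matches per row.
        out[term] = text.str.count(term, flags=re.IGNORECASE)
    return out


notes = pd.DataFrame({
    "body_analysed": [
        "Chest pain on exertion. No chest pain at rest.",
        "Denies PAIN; reports cough.",
        None,
    ]
})

result = count_regex_terms(notes, [r"chest pain", r"pain", r"cough(ing)?"])
print(result[["chest pain", "pain", "cough(ing)?"]])
```

Because matching is by regex, overlapping patterns such as "chest pain" and "pain" are counted independently, so the same span of text can contribute to more than one column.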