pat2vec.util.anonymisation_data_methods

Functions

anonymize_feature_names(df)

Anonymizes DataFrame column names, preserving prefixes and suffixes.

deanonymize_feature_names(...)

De-anonymizes a list of feature names using a provided key.

pat2vec.util.anonymisation_data_methods.anonymize_feature_names(df)

Anonymizes DataFrame column names, preserving prefixes and suffixes.

The ‘core’ part of each feature name is replaced with a unique, generic identifier (e.g., ‘concept_0’). This is useful for sharing data structures without revealing sensitive or proprietary feature names. The function returns both the anonymized DataFrame and a key to reverse the process.

The function identifies prefixes and suffixes from predefined lists. Each list is sorted by length, longest first, so that the longest matching affix is always chosen and a shorter affix is never matched in its place (e.g., matching ‘_count’ when ‘_count_present’ was intended).
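A minimal sketch of longest-first affix matching follows. It assumes prefixes are matched with str.startswith and suffixes with str.endswith; the affix lists and the split_feature_name helper are hypothetical and stand in for the module's own lists, which are not shown here. Under endswith matching, the ordering matters whenever one suffix is the tail of another, such as ‘_present’ and ‘_count_present’.

```python
from typing import List, Tuple

# Hypothetical affix lists for illustration only; the real module defines its own.
PREFIXES: List[str] = ["bloods_", "obs_"]
SUFFIXES: List[str] = ["_count", "_present", "_count_present", "_mean"]


def split_feature_name(
    name: str, prefixes: List[str], suffixes: List[str]
) -> Tuple[str, str, str]:
    """Split a column name into (prefix, core, suffix), trying longest affixes first."""
    prefix = ""
    # Longest prefix first, so a longer prefix wins over a shorter one it starts with.
    for p in sorted(prefixes, key=len, reverse=True):
        if name.startswith(p):
            prefix = p
            break
    rest = name[len(prefix):]
    suffix = ""
    # Longest suffix first; require a non-empty core to remain.
    for s in sorted(suffixes, key=len, reverse=True):
        if rest.endswith(s) and len(rest) > len(s):
            suffix = s
            break
    core = rest[: len(rest) - len(suffix)]
    return prefix, core, suffix


print(split_feature_name("bloods_haemoglobin_count_present", PREFIXES, SUFFIXES))
```

Without the sort, iterating SUFFIXES in the order written above would match ‘_present’ first and leave ‘haemoglobin_count’ as the core.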

Parameters:

df (DataFrame) – The input pandas DataFrame whose columns need to be anonymized.

Returns:

A tuple containing

  • pd.DataFrame: A new DataFrame with anonymized column names.

  • dict: A dictionary mapping anonymized names to their original names, for de-anonymization. Format: {anonymized_name: original_name}.

Return type:

Tuple[pd.DataFrame, Dict[str, str]]
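The return contract can be illustrated with a self-contained sketch. It is not the library's implementation: anonymize_columns_sketch, _split, and the affix lists are hypothetical, and it assumes that columns sharing the same core also share one generic identifier, numbered in order of first appearance.

```python
from typing import Dict, List, Tuple

import pandas as pd

# Hypothetical affix lists for illustration only.
PREFIXES: List[str] = ["bloods_", "obs_"]
SUFFIXES: List[str] = ["_count", "_present", "_count_present", "_mean"]


def _split(name: str) -> Tuple[str, str, str]:
    """Return (prefix, core, suffix), trying the longest affixes first."""
    prefix = next((p for p in sorted(PREFIXES, key=len, reverse=True)
                   if name.startswith(p)), "")
    rest = name[len(prefix):]
    suffix = next((s for s in sorted(SUFFIXES, key=len, reverse=True)
                   if rest.endswith(s) and len(rest) > len(s)), "")
    return prefix, rest[: len(rest) - len(suffix)], suffix


def anonymize_columns_sketch(df: pd.DataFrame) -> Tuple[pd.DataFrame, Dict[str, str]]:
    core_ids: Dict[str, str] = {}  # original core -> generic identifier
    key: Dict[str, str] = {}       # anonymized name -> original name
    new_columns: List[str] = []
    for col in df.columns:
        prefix, core, suffix = _split(str(col))
        # Assumption: one identifier per distinct core, e.g. 'concept_0'.
        ident = core_ids.setdefault(core, f"concept_{len(core_ids)}")
        anon = f"{prefix}{ident}{suffix}"
        key[anon] = col
        new_columns.append(anon)
    out = df.copy()  # leave the caller's DataFrame untouched
    out.columns = new_columns
    return out, key


df = pd.DataFrame({"bloods_haemoglobin_mean": [1.0],
                   "bloods_haemoglobin_count": [3],
                   "obs_bmi_mean": [22.5]})
anon_df, key = anonymize_columns_sketch(df)
print(list(anon_df.columns))
```

Only the column labels change; the cell values are copied as they are, and the key is what a de-anonymization step would consume.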

pat2vec.util.anonymisation_data_methods.deanonymize_feature_names(anonymized_feature_names, anonymization_key)

De-anonymizes a list of feature names using a provided key.

Parameters:
  • anonymized_feature_names (List[str]) – A list of anonymized feature names.

  • anonymization_key (Dict[str, str]) – The dictionary mapping anonymized names back to original names. Format: {anonymized_name: original_name}.

Returns:

A list of the original feature names. If an anonymized name is not found in the key, the corresponding item in the list will be None.

Return type:

List[Optional[str]]
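The documented lookup behaviour can be sketched in a few lines. deanonymize_sketch is a hypothetical stand-in, not the library function; it assumes a plain dictionary lookup, where dict.get yields None for any name missing from the key.

```python
from typing import Dict, List, Optional


def deanonymize_sketch(
    anonymized_feature_names: List[str], anonymization_key: Dict[str, str]
) -> List[Optional[str]]:
    # dict.get returns None for names absent from the key, preserving list order.
    return [anonymization_key.get(name) for name in anonymized_feature_names]


key = {"bloods_concept_0_mean": "bloods_haemoglobin_mean",
       "obs_concept_1_mean": "obs_bmi_mean"}
print(deanonymize_sketch(["obs_concept_1_mean", "concept_99"], key))
# prints ['obs_bmi_mean', None]
```

Because unknown names map to None rather than raising KeyError, callers may want to check for None before using the result.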