pat2vec.util.methods_post_get

Functions

check_csv_files_in_directory(directory[, ...])

Recursively checks the integrity of all CSV files in a directory.

check_csv_integrity(file_path[, verbosity, ...])

Checks the integrity of a single CSV file.

copy_project_folders_with_substring_match(...)

Copies project subfolders that match given substrings to a new versioned directory.

retrieve_pat_annotations(...[, config_obj])

Concatenates EPR and MCT annotation data for a single patient.

pat2vec.util.methods_post_get.retrieve_pat_annotations(current_pat_client_idcode, config_obj=None)

Concatenates EPR and MCT annotation data for a single patient.

This function reads a patient’s annotation data from two separate CSV files, one for EPR documents and one for MCT documents. It then standardizes the timestamp column in each and concatenates the two into a single DataFrame.

Parameters:
  • current_pat_client_idcode (str) – The client ID code for the patient.

  • config_obj (Optional[Any]) – The configuration object containing file paths for EPR and MCT annotation batches.

Return type:

DataFrame

Returns:

A concatenated DataFrame containing annotations from both EPR and MCT sources, with a unified ‘updatetime’ column.
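
The standardize-then-concatenate step described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper read_rows, the sample CSV text, and the source column name recorded_time are all hypothetical, and the real function returns a pandas DataFrame rather than a list of dictionaries.

```python
import csv
import io


def read_rows(csv_text, timestamp_col):
    """Parse CSV text and expose `timestamp_col` under the name 'updatetime'."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # pop-then-assign also works when the column is already 'updatetime'
        row["updatetime"] = row.pop(timestamp_col)
        rows.append(row)
    return rows


# Hypothetical inputs: the real file layouts are not given in this reference.
epr_csv = "client_idcode,pretty_name,updatetime\nP001,Asthma,2021-03-01\n"
mct_csv = "client_idcode,pretty_name,recorded_time\nP001,Eczema,2021-04-15\n"

# Concatenate the two sources once both share the 'updatetime' column.
combined = read_rows(epr_csv, "updatetime") + read_rows(mct_csv, "recorded_time")
```

After this runs, every row in combined carries an 'updatetime' key regardless of which source it came from, which is the property the return value above describes.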

pat2vec.util.methods_post_get.copy_project_folders_with_substring_match(pat2vec_obj, substrings_to_match=None)

Copies project subfolders that match given substrings to a new versioned directory.

This is useful for creating a snapshot or a new version of a project’s outputs before running a new experiment. It finds the next available version number (e.g., my_project_1, my_project_2) and copies only the subfolders whose names contain one of the specified substrings.

Parameters:
  • pat2vec_obj (Any) – The main pat2vec object, containing the config_obj.

  • substrings_to_match (Optional[List[str]]) – A list of substrings to identify which folders to copy (e.g., ['batches', 'annots']).

Return type:

None
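
The versioning and filtering behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: next_versioned_path and copy_matching_subfolders are hypothetical helpers that take a plain directory path instead of a pat2vec object, and the folder names in the demo are invented.

```python
import os
import shutil
import tempfile


def next_versioned_path(base_path):
    """Return the first of base_path_1, base_path_2, ... that does not exist yet."""
    version = 1
    while os.path.exists(f"{base_path}_{version}"):
        version += 1
    return f"{base_path}_{version}"


def copy_matching_subfolders(project_dir, substrings):
    """Copy subfolders whose names contain any substring into a new versioned directory."""
    target = next_versioned_path(project_dir)
    os.makedirs(target)
    for name in sorted(os.listdir(project_dir)):
        source = os.path.join(project_dir, name)
        if os.path.isdir(source) and any(s in name for s in substrings):
            shutil.copytree(source, os.path.join(target, name))
    return target


# Demo on a throwaway project tree (folder names are hypothetical).
root = tempfile.mkdtemp()
project = os.path.join(root, "my_project")
for sub in ("annots_epr", "batches_epr", "logs"):
    os.makedirs(os.path.join(project, sub))

snapshot = copy_matching_subfolders(project, ["batches", "annots"])
copied = sorted(os.listdir(snapshot))
```

Here snapshot ends in my_project_1 and contains only the two matching subfolders. A second call would create my_project_2, because the version counter skips names that already exist.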

pat2vec.util.methods_post_get.check_csv_integrity(file_path, verbosity=0, delete_broken=False, config_obj=None)

Checks the integrity of a single CSV file.

This function attempts to read a CSV file and performs basic integrity checks, such as ensuring it’s not empty and that key columns do not contain null values. If delete_broken is True, it will remove files that fail these checks.

Parameters:
  • file_path (str) – The path to the CSV file to check.

  • verbosity (int) – The level of detail for logging warnings.

  • delete_broken (bool) – If True, deletes files that fail integrity checks.

  • config_obj (Optional[Any]) – The configuration object. Required when delete_broken is True, because it is passed on to remove_file_from_paths.

Raises:

UserWarning – Issued if the file is not found, is empty, cannot be parsed, or has null values in a key column. The warnings alert the user to potential data integrity issues.

Return type:

None
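
The checks listed above can be sketched with the csv and warnings modules. This is an illustration under stated assumptions, not the library's implementation: check_csv is a hypothetical helper, the caller passes the key columns explicitly, an empty string is treated as a null value, and the helper returns a bool for easier demonstration whereas the documented function returns None. Deletion here uses os.remove directly instead of remove_file_from_paths.

```python
import csv
import os
import tempfile
import warnings


def check_csv(file_path, key_columns, delete_broken=False):
    """Return True if the file passes; otherwise issue a UserWarning and return False."""
    problem = None
    try:
        with open(file_path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        if not rows:
            problem = "file is empty"
        elif any(not row.get(col) for row in rows for col in key_columns):
            problem = "a key column contains missing values"
    except FileNotFoundError:
        problem = "file not found"
    except csv.Error as exc:
        problem = f"cannot be parsed: {exc}"

    if problem is None:
        return True
    warnings.warn(f"{file_path}: {problem}", UserWarning)
    if delete_broken and os.path.exists(file_path):
        os.remove(file_path)
    return False


# Demo: one valid file and one with a blank key column (contents are hypothetical).
folder = tempfile.mkdtemp()
good = os.path.join(folder, "good.csv")
bad = os.path.join(folder, "bad.csv")
with open(good, "w", newline="") as f:
    f.write("client_idcode,updatetime\nP001,2021-03-01\n")
with open(bad, "w", newline="") as f:
    f.write("client_idcode,updatetime\n,2021-03-01\n")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    good_ok = check_csv(good, ["client_idcode"])
    bad_ok = check_csv(bad, ["client_idcode"], delete_broken=True)
```

Because the problems are reported through warnings.warn rather than raised as exceptions, a batch run continues past a broken file; callers who want a hard failure can escalate with warnings.simplefilter("error").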

pat2vec.util.methods_post_get.check_csv_files_in_directory(directory, verbosity=0, ignore_outputs=True, ignore_output_vectors=True, delete_broken=False)

Recursively checks the integrity of all CSV files in a directory.

This function walks through a directory and its subdirectories, applying check_csv_integrity to every CSV file found. It provides options to ignore certain common output directories.

Parameters:
  • directory (str) – The root directory to start the search from.

  • verbosity (int) – The verbosity level passed to check_csv_integrity.

  • ignore_outputs (bool) – If True, skips any path containing ‘output’.

  • ignore_output_vectors (bool) – If True, skips paths for ‘current_pat_lines_parts’.

  • delete_broken (bool) – If True, deletes files that fail integrity checks.

Return type:

None
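
The recursive walk with ignore rules can be sketched with os.walk. This is a minimal illustration, not the library's code: find_csv_files is a hypothetical helper that only collects the files a checker would visit, and matching the ignore substrings against the path relative to the root directory is an assumption made here so that the location of the root itself cannot trigger a skip.

```python
import os
import tempfile


def find_csv_files(directory, ignore_substrings=()):
    """Return sorted relative paths of CSV files, skipping paths that contain an ignored substring."""
    found = []
    for current_dir, _subdirs, file_names in os.walk(directory):
        for file_name in file_names:
            if not file_name.lower().endswith(".csv"):
                continue
            relative = os.path.relpath(os.path.join(current_dir, file_name), directory)
            if any(s in relative for s in ignore_substrings):
                continue
            found.append(relative)
    return sorted(found)


# Demo tree (folder and file names are hypothetical).
root = tempfile.mkdtemp()
for folder, file_name in [
    ("batches", "a.csv"),
    ("batches", "notes.txt"),
    ("outputs", "b.csv"),
    ("current_pat_lines_parts", "c.csv"),
]:
    os.makedirs(os.path.join(root, folder), exist_ok=True)
    with open(os.path.join(root, folder, file_name), "w") as f:
        f.write("client_idcode\nP001\n")

all_csv = find_csv_files(root)
kept = find_csv_files(root, ignore_substrings=("output", "current_pat_lines_parts"))
```

With both ignore rules active, only the file under batches remains; each surviving path would then be handed to a per-file check such as check_csv_integrity.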