pat2vec.util.methods_post_get
Functions
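check_csv_files_in_directory(directory, ...) | Recursively checks the integrity of all CSV files in a directory.
check_csv_integrity(file_path, ...) | Checks the integrity of a single CSV file.
copy_project_folders_with_substring_match(pat2vec_obj, ...) | Copies project subfolders that match given substrings to a new versioned directory.
retrieve_pat_annotations(current_pat_client_idcode, ...) | Concatenates EPR and MCT annotation data for a single patient.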
- pat2vec.util.methods_post_get.retrieve_pat_annotations(current_pat_client_idcode, config_obj=None)
Concatenates EPR and MCT annotation data for a single patient.
This function reads a patient’s annotation data from two separate CSV files, one for EPR documents and one for MCT documents. It then standardizes their timestamp columns into a single ‘updatetime’ column and concatenates the two into one DataFrame.
- Parameters:
  - current_pat_client_idcode (str) – The client ID code for the patient.
  - config_obj (Optional[Any]) – The configuration object containing file paths for EPR and MCT annotation batches.
- Return type:
DataFrame
- Returns:
A concatenated DataFrame containing annotations from both EPR and MCT sources, with a unified ‘updatetime’ column.
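The concatenation step can be illustrated with a minimal, stdlib-only sketch. It assumes each source stores its timestamp as an ISO-8601 string in its own source-specific column; those column names, and the helper name `concat_annotation_rows`, are hypothetical. For self-containment it takes CSV text instead of the file paths the real function reads from `config_obj`, and it returns a list of row dictionaries instead of a pandas DataFrame.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List


def concat_annotation_rows(
    epr_csv: str,
    mct_csv: str,
    epr_time_col: str,
    mct_time_col: str,
) -> List[Dict[str, object]]:
    """Hypothetical sketch: combine EPR and MCT annotation rows under one
    shared 'updatetime' column (EPR rows first, then MCT rows)."""
    rows: List[Dict[str, object]] = []
    for text, time_col in ((epr_csv, epr_time_col), (mct_csv, mct_time_col)):
        for row in csv.DictReader(io.StringIO(text)):
            record: Dict[str, object] = dict(row)
            # Assumption: the timestamp is an ISO-8601 string in a
            # source-specific column; move it into the unified column.
            raw = record.pop(time_col)
            record["updatetime"] = datetime.fromisoformat(str(raw))
            rows.append(record)
    return rows
```

Renaming each source's timestamp column to one shared name before concatenating is what lets downstream code sort or filter both sources by a single field.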
- pat2vec.util.methods_post_get.copy_project_folders_with_substring_match(pat2vec_obj, substrings_to_match=None)
Copies project subfolders that match given substrings to a new versioned directory.
This is useful for creating a snapshot or a new version of a project’s outputs before running a new experiment. It finds the next available version number (e.g., my_project_1, my_project_2) and copies only the subfolders whose names contain one of the specified substrings.
- Parameters:
  - pat2vec_obj (Any) – The main pat2vec object, containing the config_obj.
  - substrings_to_match (Optional[List[str]]) – A list of substrings to identify which folders to copy (e.g., [‘batches’, ‘annots’]).
- Return type:
None
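The versioning and copy behaviour described above can be sketched with `os` and `shutil`. This is an illustration under assumptions, not the library's code: the helpers `next_versioned_dir` and `copy_matching_subfolders` are hypothetical, they take a project directory path directly instead of a `pat2vec_obj`, they inspect only immediate subfolders, and they treat a missing substring list as "copy nothing".

```python
import os
import shutil
from typing import List, Optional


def next_versioned_dir(base_dir: str) -> str:
    """Return the first '<base_dir>_<n>' path (n = 1, 2, ...) that does not exist."""
    base_dir = os.path.normpath(base_dir)  # drop any trailing separator
    n = 1
    while os.path.exists(f"{base_dir}_{n}"):
        n += 1
    return f"{base_dir}_{n}"


def copy_matching_subfolders(
    base_dir: str, substrings_to_match: Optional[List[str]] = None
) -> str:
    """Hypothetical sketch: copy the immediate subfolders of base_dir whose names
    contain any of the substrings into a new versioned sibling directory."""
    substrings = substrings_to_match or []  # assumption: None means "copy nothing"
    target = next_versioned_dir(base_dir)
    os.makedirs(target)
    for name in sorted(os.listdir(base_dir)):
        src = os.path.join(base_dir, name)
        if os.path.isdir(src) and any(s in name for s in substrings):
            # copytree recreates the whole subfolder, including nested files.
            shutil.copytree(src, os.path.join(target, name))
    return target
```

Probing for the first unused suffix means earlier snapshots are never overwritten; each call produces a fresh `_1`, `_2`, ... directory.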
- pat2vec.util.methods_post_get.check_csv_integrity(file_path, verbosity=0, delete_broken=False, config_obj=None)
Checks the integrity of a single CSV file.
This function attempts to read a CSV file and performs basic integrity checks, such as ensuring it’s not empty and that key columns do not contain null values. If delete_broken is True, it will remove files that fail these checks.
- Parameters:
  - file_path (str) – The path to the CSV file to check.
  - verbosity (int) – The level of detail for logging warnings.
  - delete_broken (bool) – If True, deletes files that fail integrity checks.
  - config_obj (Optional[Any]) – The configuration object; required when delete_broken is True, because it is passed on to remove_file_from_paths.
- Raises:
UserWarning – Issued when the CSV file is empty, cannot be parsed, contains null values in a key column, or does not exist, to alert the user to potential data integrity issues.
- Return type:
None
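The checks described above can be approximated with the `csv` and `warnings` modules. In this minimal sketch the helper `check_csv_sketch` and its `key_columns` argument are hypothetical (the documentation does not say which columns count as key columns). It deletes with a plain `os.remove` rather than going through `remove_file_from_paths`, and it returns a boolean so the outcome is easy to test, whereas the documented function returns None.

```python
import csv
import os
import warnings
from typing import Sequence


def check_csv_sketch(
    file_path: str, key_columns: Sequence[str] = (), delete_broken: bool = False
) -> bool:
    """Hypothetical sketch: return True if the CSV passes basic checks; otherwise
    issue a UserWarning, optionally delete the file, and return False."""
    problem = None
    try:
        with open(file_path, newline="") as fh:
            rows = list(csv.DictReader(fh))
    except FileNotFoundError:
        warnings.warn(f"{file_path}: file not found", UserWarning)
        return False
    except csv.Error as exc:
        problem = f"could not be parsed ({exc})"
    else:
        if not rows:
            problem = "file is empty"
        else:
            for col in key_columns:
                # A missing column or a blank cell both count as null here.
                if any(not (row.get(col) or "").strip() for row in rows):
                    problem = f"null values in key column {col!r}"
                    break
    if problem is None:
        return True
    warnings.warn(f"{file_path}: {problem}", UserWarning)
    if delete_broken:
        # Assumption: a plain delete stands in for remove_file_from_paths.
        os.remove(file_path)
    return False
```

Issuing warnings rather than raising lets a batch check continue past one bad file while still surfacing every problem.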
- pat2vec.util.methods_post_get.check_csv_files_in_directory(directory, verbosity=0, ignore_outputs=True, ignore_output_vectors=True, delete_broken=False)
Recursively checks the integrity of all CSV files in a directory.
This function walks through a directory and its subdirectories, applying check_csv_integrity to every CSV file found. It provides options to ignore certain common output directories.
- Parameters:
  - directory (str) – The root directory to start the search from.
  - verbosity (int) – The verbosity level passed to check_csv_integrity.
  - ignore_outputs (bool) – If True, skips any path containing ‘output’.
  - ignore_output_vectors (bool) – If True, skips paths for ‘current_pat_lines_parts’.
  - delete_broken (bool) – If True, deletes files that fail integrity checks.
- Return type:
None
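The directory walk can be sketched with `os.walk`. The helper `find_csvs_to_check` is hypothetical: it only collects the CSV paths that survive the two skip rules, and it models both rules as plain substring tests on the full path, which is an assumption for `ignore_output_vectors`. Each collected path would then be handed to a per-file check such as `check_csv_integrity`.

```python
import os
from typing import List


def find_csvs_to_check(
    directory: str, ignore_outputs: bool = True, ignore_output_vectors: bool = True
) -> List[str]:
    """Hypothetical sketch: return the sorted CSV paths under directory that
    are not excluded by the skip rules."""
    found: List[str] = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith(".csv"):  # assumption: case-sensitive suffix
                continue
            path = os.path.join(root, name)
            # Substring tests on the full path, mirroring the documented options.
            if ignore_outputs and "output" in path:
                continue
            if ignore_output_vectors and "current_pat_lines_parts" in path:
                continue
            found.append(path)
    return sorted(found)
```

Because the `output` test matches anywhere in the full path, a root directory whose own path contains "output" would have every file skipped while that option is enabled.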