pat2vec.util.methods_get
Functions
add_offset_column | Adds a new column with a time offset from a starting datetime column.
build_patient_dict | Builds a dictionary mapping patient IDs to (start, end) datetime tuples.
convert_date | Converts a date string in 'YYYY-MM-DD' format to a datetime object.
convert_timestamp_to_tuple | Converts a timestamp string to a (year, month) tuple.
create_folders | Creates folders for each patient in the specified paths.
create_folders_annot_csv_wrapper | Creates folders locally or remotely based on the configuration.
create_folders_for_pat | Creates folders for a single patient in the specified paths.
create_local_folders | Creates local project directories for storing intermediate files.
create_remote_folders | Creates remote project directories for storing intermediate files via SFTP.
dump_results | Saves data to a file using pickle, either locally or remotely via SFTP.
enum_exact_target_date_vector | Creates a one-hot encoded date vector for a specific target date.
enum_target_date_vector | Creates a one-hot encoded date vector for a target date.
exist_check | Checks if a file or directory exists, either locally or remotely.
filter_stripped_list | Filters a list of patients to exclude those already processed.
get_empty_date_vector | Creates an empty DataFrame with one-hot encoded date columns.
get_free_gpu | Identifies and returns the GPU with the most available free memory.
list_dir_wrapper | Lists the contents of a directory, either locally or remotely via SFTP.
read_csv_wrapper | Reads CSV data from a file, handling both local and remote paths.
read_remote | Reads a remote CSV file via SFTP and returns a pandas DataFrame.
sftp_exists | Checks if a file or directory exists on a remote SFTP server.
test_datetime_formats | Tests the function with various datetime formats.
update_pbar | Updates a tqdm progress bar with formatted information about the current processing state.
write_csv_wrapper | Writes CSV data to a file either locally or remotely.
write_remote | Writes a pandas DataFrame to a remote file via SFTP.
- pat2vec.util.methods_get.list_dir_wrapper(path, config_obj=None)
Lists the contents of a directory, either locally or remotely via SFTP.
This function acts as a wrapper around os.listdir and sftp.listdir to provide a consistent interface for listing directory contents based on the remote_dump setting in the configuration object.
- Parameters:
path (str) – The path to the directory to list.
config_obj (Optional[Any]) – The configuration object containing SFTP credentials and settings if remote_dump is True.
- Return type:
List[str]
- Returns:
A list of filenames in the specified directory.
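The local/remote dispatch described above can be sketched as follows. This is a minimal illustration, not the pat2vec implementation: `list_dir_sketch`, the `SimpleNamespace` configuration and the injected `sftp_listdir` callable are hypothetical stand-ins, and only the `remote_dump` flag is taken from the documentation.

```python
import os
import tempfile
from types import SimpleNamespace
from typing import Callable, List, Optional


def list_dir_sketch(
    path: str,
    config_obj: Optional[SimpleNamespace] = None,
    sftp_listdir: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """List a directory locally, or through a supplied lister when remote_dump is set."""
    # Treat a missing config (or a missing flag) as "local".
    remote = bool(getattr(config_obj, "remote_dump", False))
    if remote:
        if sftp_listdir is None:
            raise ValueError("remote_dump is True but no SFTP lister was supplied")
        return list(sftp_listdir(path))
    return os.listdir(path)


# Local branch: list a temporary directory containing one file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.csv"), "w").close()
    local_listing = list_dir_sketch(tmp, SimpleNamespace(remote_dump=False))

# Remote branch: a fake lister stands in for a real SFTP client (assumption).
remote_listing = list_dir_sketch(
    "/remote/dir",
    SimpleNamespace(remote_dump=True),
    sftp_listdir=lambda p: ["b.csv"],
)
```

Injecting the remote lister keeps the sketch runnable without a server; the real function presumably builds its SFTP client from the credentials held in the configuration object.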
- pat2vec.util.methods_get.convert_timestamp_to_tuple(timestamp)
Converts a timestamp string to a (year, month) tuple.
- Parameters:
timestamp (str) – The timestamp string to convert, expected in the format %Y-%m-%dT%H:%M:%S.%f%z.
- Return type:
Tuple[int, int]
- Returns:
A tuple containing the year and month as integers.
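As an illustration of the documented format string, the conversion could look like the sketch below. The helper name `timestamp_to_year_month` is hypothetical, and the body is an assumption about the approach rather than the library's code.

```python
from datetime import datetime
from typing import Tuple


def timestamp_to_year_month(timestamp: str) -> Tuple[int, int]:
    """Parse a timestamp in the documented format and return (year, month)."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.year, parsed.month


result = timestamp_to_year_month("2023-05-17T10:30:00.000000+0000")
print(result)  # (2023, 5)
```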
- pat2vec.util.methods_get.enum_target_date_vector(target_date_range, current_pat_client_id_code, config_obj)
Creates a one-hot encoded date vector for a target date.
- Parameters:
target_date_range (Tuple[int, int, int]) – A tuple of (year, month, day) for the target date.
current_pat_client_id_code (str) – The patient’s ID.
config_obj (Any) – The configuration object.
- Return type:
DataFrame
- Returns:
A single-row DataFrame with a one-hot encoded column for the target date.
- pat2vec.util.methods_get.enum_exact_target_date_vector(target_date_range, current_pat_client_id_code, config_obj)
Creates a one-hot encoded date vector for a specific target date.
- Parameters:
target_date_range (Tuple[int, int, int]) – A tuple of (year, month, day) for the target date.
current_pat_client_id_code (str) – The patient’s ID.
config_obj (Any) – The configuration object (currently unused).
- Return type:
DataFrame
- Returns:
A single-row DataFrame with a one-hot encoded column for the target date.
- pat2vec.util.methods_get.dump_results(file_data, path, config_obj=None)
Saves data to a file using pickle, either locally or remotely via SFTP.
- Parameters:
file_data (Any) – The Python object to be pickled.
path (str) – The destination file path.
config_obj (Optional[Any]) – The configuration object containing SFTP credentials and settings if remote_dump is True.
- Return type:
None
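The local half of this behaviour can be sketched with the standard library alone. The helper `dump_local` below is hypothetical and covers only the non-SFTP case; how the remote branch opens its file handle is not shown in this documentation.

```python
import os
import pickle
import tempfile
from typing import Any


def dump_local(file_data: Any, path: str) -> None:
    """Pickle an object to a local path (the remote SFTP branch is omitted)."""
    with open(path, "wb") as handle:
        pickle.dump(file_data, handle)


# Round-trip a small object through a temporary file to confirm it survives.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "result.pkl")
    dump_local({"patient": "P001", "n_rows": 3}, target)
    with open(target, "rb") as handle:
        restored = pickle.load(handle)
```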
- pat2vec.util.methods_get.update_pbar(current_pat_client_id_code, start_time, stage_int, stage_str, t, config_obj, skipped_counter=None, **n_docs_to_annotate)
Updates a tqdm progress bar with formatted information about the current processing state.
This function dynamically sets the description and color of a tqdm progress bar to reflect the current patient, processing stage, and execution time. The color changes to indicate slow performance if the elapsed time exceeds predefined thresholds.
- Parameters:
current_pat_client_id_code (str) – The identifier of the patient currently being processed.
start_time (datetime) – The start time of the current operation. Note: This parameter is currently overwritten by config_obj.start_time.
stage_int (int) – An integer representing the processing stage. Note: This parameter is currently unused.
stage_str (str) – A string describing the current processing stage (e.g., “demo”, “annotating”).
t (tqdm) – The tqdm progress bar instance to update.
config_obj (Any) – A configuration object containing settings like start_time, multi_process, and various slow_execution_threshold values.
skipped_counter (Union[int, Any, None]) – A counter for the number of skipped items. Can be a standard integer or a multiprocessing-safe value. Defaults to None.
**n_docs_to_annotate (Any) – Arbitrary keyword arguments that are displayed at the end of the progress bar description. Useful for showing counts like the number of documents to annotate.
- Return type:
None
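The threshold-based colouring and description formatting can be illustrated without tqdm. Everything in this sketch is an assumption made for illustration: the helper names, the two threshold values, the colour names and the description layout are hypothetical and are not taken from pat2vec.

```python
from datetime import datetime, timedelta


def pick_bar_colour(
    start_time: datetime,
    now: datetime,
    warn_after: timedelta = timedelta(seconds=10),  # hypothetical threshold
    slow_after: timedelta = timedelta(seconds=60),  # hypothetical threshold
) -> str:
    """Map elapsed time onto a colour name that a progress bar could use."""
    elapsed = now - start_time
    if elapsed >= slow_after:
        return "red"
    if elapsed >= warn_after:
        return "yellow"
    return "green"


def describe(patient_id: str, stage_str: str, elapsed: timedelta, **extra) -> str:
    """Build a description string; extra keyword arguments are appended at the end."""
    parts = [patient_id, stage_str, f"{elapsed.total_seconds():.0f}s"]
    parts.extend(f"{key}={value}" for key, value in extra.items())
    return " | ".join(parts)


start = datetime(2024, 1, 1, 12, 0, 0)
now = start + timedelta(seconds=75)
colour = pick_bar_colour(start, now)
description = describe("P001", "annotating", now - start, n_docs=12)
```

With a real bar, the two results would typically be applied through the bar's description and colour settings.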
- pat2vec.util.methods_get.get_free_gpu()
Identifies and returns the GPU with the most available free memory.
This function executes the nvidia-smi command-line utility to query the GPU memory usage.
- Return type:
Tuple[int,str]
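Selecting the GPU with the most free memory amounts to parsing per-GPU memory figures and taking the maximum. The sketch below works on sample text shaped like the output of `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits`; whether pat2vec uses that exact query is an assumption, and `pick_freest_gpu` and its `(index, free MiB)` return shape are hypothetical.

```python
from typing import Tuple


def pick_freest_gpu(smi_output: str) -> Tuple[int, int]:
    """Return (gpu_index, free_mib) for the GPU with the most free memory."""
    best_index, best_free = -1, -1
    for line in smi_output.strip().splitlines():
        index_text, free_text = (field.strip() for field in line.split(","))
        free_mib = int(free_text)
        if free_mib > best_free:
            best_index, best_free = int(index_text), free_mib
    if best_index < 0:
        raise ValueError("no GPU rows found in the supplied output")
    return best_index, best_free


# Sample text standing in for real command output (no GPU needed to run this).
sample_output = "0, 1024\n1, 8192\n2, 4096\n"
best = pick_freest_gpu(sample_output)
print(best)  # (1, 8192)
```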
- pat2vec.util.methods_get.convert_date(date_string)
Converts a date string in ‘YYYY-MM-DD’ format to a datetime object.
- Parameters:
date_string (str) – The string to convert, which may include a time part (e.g., ‘YYYY-MM-DDTHH:MM:SS’).
- Return type:
datetime
- Returns:
A datetime object representing the date part of the string.
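One way to obtain the date part while ignoring an optional time part is to split on the `T` separator before parsing. The helper name `date_part_to_datetime` is hypothetical and the approach is an assumption about how such a conversion might be written.

```python
from datetime import datetime


def date_part_to_datetime(date_string: str) -> datetime:
    """Parse only the 'YYYY-MM-DD' prefix, dropping any 'T...' time suffix."""
    date_part = date_string.split("T")[0]
    return datetime.strptime(date_part, "%Y-%m-%d")


plain = date_part_to_datetime("2021-03-04")
with_time = date_part_to_datetime("2021-03-04T15:30:00")
```

Both calls yield midnight on 4 March 2021, since the time part is discarded rather than parsed.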
- pat2vec.util.methods_get.write_csv_wrapper(path, csv_file_data=None, config_obj=None)
Writes CSV data to a file either locally or remotely.
- Parameters:
path (str) – The path to the destination CSV file.
csv_file_data (Optional[DataFrame]) – The DataFrame to write.
config_obj (Optional[Any]) – An object containing configuration settings, including ‘remote_dump’.
- Return type:
None
- pat2vec.util.methods_get.read_remote(path, config_obj=None)
Reads a remote CSV file via SFTP and returns a pandas DataFrame.
- Parameters:
path (str) – The remote path of the CSV file to read.
config_obj (Optional[Any]) – An object containing configuration details.
- Return type:
DataFrame
- Returns:
The DataFrame containing the data read from the remote CSV file.
- pat2vec.util.methods_get.read_csv_wrapper(path, config_obj=None)
Reads CSV data from a file, handling both local and remote paths.
This function is a wrapper that calls either pd.read_csv for local files or read_remote for SFTP paths, based on the remote_dump flag in the configuration.
- Parameters:
path (str) – The path to the CSV file (local or remote).
config_obj (Optional[Any]) – An object containing configuration settings, including ‘remote_dump’.
- Return type:
DataFrame
- Returns:
The DataFrame containing the data read from the CSV file.
- pat2vec.util.methods_get.create_local_folders(config_obj=None)
Creates local project directories for storing intermediate files.
- Parameters:
config_obj (Optional[Any]) – The configuration object containing root_path and proj_name.
- Return type:
None
- pat2vec.util.methods_get.create_remote_folders(config_obj=None)
Creates remote project directories for storing intermediate files via SFTP.
- Parameters:
config_obj (Optional[Any]) – An object containing configuration details like root_path, proj_name, and SFTP credentials.
- Raises:
ValueError – If config_obj is not provided.
- Return type:
None
- pat2vec.util.methods_get.create_folders_annot_csv_wrapper(config_obj=None)
Creates folders locally or remotely based on the configuration.
This function is a wrapper that calls either create_local_folders or create_remote_folders based on the remote_dump flag in the config.
- Parameters:
config_obj (Optional[Any]) – The configuration object.
- Return type:
None
- pat2vec.util.methods_get.get_empty_date_vector(config_obj)
Creates an empty DataFrame with one-hot encoded date columns.
The columns are generated based on the time window settings in the configuration object.
- Parameters:
config_obj (Any) – The configuration object with time window settings.
- Return type:
DataFrame
- Returns:
A single-row DataFrame with columns for each date in the time window, initialized to 0.0.
- pat2vec.util.methods_get.sftp_exists(path, config_obj)
Checks if a file or directory exists on a remote SFTP server.
- Parameters:
path (str) – The remote path to check.
config_obj (Any) – The configuration object containing SFTP credentials and settings.
- Return type:
bool
- Returns:
True if the path exists, False otherwise.
- pat2vec.util.methods_get.exist_check(path, config_obj=None)
Checks if a file or directory exists, either locally or remotely.
This is a wrapper around os.path.exists and sftp_exists that checks the remote_dump flag in the configuration object.
- Parameters:
path (str) – The path to check.
config_obj (Optional[Any]) – The configuration object.
- Return type:
bool
- Returns:
True if the path exists, False otherwise.
- pat2vec.util.methods_get.filter_stripped_list(stripped_list, config_obj=None)
Filters a list of patients to exclude those already processed.
Checks if a patient’s output directory contains at least n_pat_lines files, indicating that processing for that patient is complete.
- Parameters:
stripped_list (List[str]) – The initial list of patient IDs to process.
config_obj (Optional[Any]) – The configuration object containing paths and settings.
- Returns:
A tuple containing two lists: the filtered list of patients to be processed, and the original filtered list (for reference).
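The completeness check (an output directory holding at least n_pat_lines files) can be sketched with the standard library. The helper `filter_unfinished`, the flat `output_root/<patient_id>` layout and the single-list return value are assumptions made for illustration; the real function reads its paths from the configuration object and returns two lists.

```python
import os
import tempfile
from typing import List


def filter_unfinished(patient_ids: List[str], output_root: str, n_pat_lines: int) -> List[str]:
    """Keep only patients whose output directory holds fewer than n_pat_lines files."""
    remaining = []
    for patient_id in patient_ids:
        patient_dir = os.path.join(output_root, patient_id)
        # A missing directory counts as zero files, so the patient is kept.
        n_files = len(os.listdir(patient_dir)) if os.path.isdir(patient_dir) else 0
        if n_files < n_pat_lines:
            remaining.append(patient_id)
    return remaining


with tempfile.TemporaryDirectory() as root:
    # P1 is complete (2 files), P2 has an empty directory, P3 has no directory.
    os.makedirs(os.path.join(root, "P1"))
    os.makedirs(os.path.join(root, "P2"))
    for name in ("0.csv", "1.csv"):
        open(os.path.join(root, "P1", name), "w").close()
    remaining = filter_unfinished(["P1", "P2", "P3"], root, n_pat_lines=2)
```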
- pat2vec.util.methods_get.create_folders(all_patient_list, config_obj=None)
Creates folders for each patient in the specified paths.
- Parameters:
all_patient_list (List[str]) – List of patient IDs.
config_obj (Optional[Any]) – Configuration object containing paths and verbosity level.
- Return type:
None
- pat2vec.util.methods_get.create_folders_for_pat(patient_id, config_obj=None)
Creates folders for a single patient in the specified paths.
- Parameters:
patient_id (str) – The patient’s ID.
config_obj (Optional[Any]) – Configuration object containing paths and verbosity level.
- Return type:
None
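Creating one sub-folder per patient under several base paths is a small loop around `os.makedirs`. In this sketch the helper `make_patient_dirs` and the explicit `base_paths` list are hypothetical; the documented function takes its paths from the configuration object instead.

```python
import os
import tempfile
from typing import List


def make_patient_dirs(patient_id: str, base_paths: List[str]) -> List[str]:
    """Create <base>/<patient_id> under every base path and return the created paths."""
    created = []
    for base in base_paths:
        patient_dir = os.path.join(base, patient_id)
        # exist_ok=True makes repeated calls harmless.
        os.makedirs(patient_dir, exist_ok=True)
        created.append(patient_dir)
    return created


with tempfile.TemporaryDirectory() as root:
    bases = [os.path.join(root, "annots"), os.path.join(root, "vectors")]
    created = make_patient_dirs("P001", bases)
    created_again = make_patient_dirs("P001", bases)  # second call must not fail
    all_exist = all(os.path.isdir(path) for path in created)
```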
- pat2vec.util.methods_get.add_offset_column(dataframe, start_column_name, offset_column_name, time_offset, verbose=1)
Adds a new column with a time offset from a starting datetime column.
Handles multiple datetime formats flexibly.
- Parameters:
dataframe (DataFrame) – The input DataFrame.
start_column_name (str) – The name of the column with the starting datetime.
offset_column_name (str) – The name for the new column to be created.
time_offset (Union[timedelta, Any]) – The time period offset to add to the start time.
verbose (int) – Verbosity level (0=silent, 1=basic, 2=detailed).
- Return type:
DataFrame
- Returns:
The modified DataFrame with the new offset column.
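The core operation, parsing the start column and adding an offset, can be sketched with pandas as below. The helper `with_offset` is hypothetical and deliberately simple: it parses a single consistent date format and does not attempt the flexible multi-format handling that the documented function advertises.

```python
from datetime import timedelta

import pandas as pd


def with_offset(df: pd.DataFrame, start_col: str, offset_col: str, time_offset: timedelta) -> pd.DataFrame:
    """Return a copy of df with offset_col = parsed start_col + time_offset."""
    out = df.copy()
    out[offset_col] = pd.to_datetime(out[start_col]) + time_offset
    return out


frame = pd.DataFrame({"start": ["2020-01-01", "2020-06-15"]})
shifted = with_offset(frame, "start", "end", timedelta(days=30))
```

Working on a copy leaves the caller's DataFrame untouched; whether the library modifies its input in place is not stated here.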
- pat2vec.util.methods_get.test_datetime_formats()
Tests the function with various datetime formats.
- pat2vec.util.methods_get.build_patient_dict(dataframe, patient_id_column, start_column, end_column)
Builds a dictionary mapping patient IDs to (start, end) datetime tuples.
- Parameters:
dataframe (DataFrame) – The input DataFrame.
patient_id_column (str) – The name of the column containing patient IDs.
start_column (str) – The name of the column containing start datetimes.
end_column (str) – The name of the column containing end datetimes.
- Return type:
Dict[str, Tuple[datetime, datetime]]
- Returns:
A dictionary where keys are patient IDs and values are (start, end) tuples.
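A mapping of this shape can be built by zipping the three columns together. The helper `patient_windows` below is a hypothetical sketch, not the library's code; in particular, how the real function treats duplicate patient IDs is not documented here (in this sketch, later rows overwrite earlier ones).

```python
from datetime import datetime
from typing import Dict, Tuple

import pandas as pd


def patient_windows(
    df: pd.DataFrame, id_col: str, start_col: str, end_col: str
) -> Dict[str, Tuple[datetime, datetime]]:
    """Map each patient ID to a (start, end) tuple of plain datetime objects."""
    starts = pd.to_datetime(df[start_col])
    ends = pd.to_datetime(df[end_col])
    return {
        str(patient_id): (start.to_pydatetime(), end.to_pydatetime())
        for patient_id, start, end in zip(df[id_col], starts, ends)
    }


frame = pd.DataFrame(
    {
        "id": ["P1", "P2"],
        "start": ["2020-01-01", "2021-03-10"],
        "end": ["2020-02-01", "2021-04-10"],
    }
)
windows = patient_windows(frame, "id", "start", "end")
```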
- pat2vec.util.methods_get.write_remote(path, csv_file, config_obj=None)
Writes a pandas DataFrame to a remote file via SFTP.
- Parameters:
path – The remote path where the file should be written.
csv_file – The DataFrame to be written.
config_obj – An object containing SFTP configuration details.
- Raises:
ValueError – If config_obj is not provided.