ml_grid.util.time_series_helper
Functions
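- add_date_order_sequence_column(df): Add a sequence column based on the timestamp within each client_idcode group.
- max_client_idcode_sequence_length(df): Calculate the maximum sequence length for client_idcode.
- convert_Xy_to_time_series(X, y, max_seq_length): Convert DataFrame into time series format suitable for training.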
Module Contents
- ml_grid.util.time_series_helper.add_date_order_sequence_column(df)
Add a sequence column based on the timestamp within each client_idcode group.
Args: df (DataFrame): DataFrame with ‘timestamp’ and ‘client_idcode’ columns.
Returns: DataFrame: DataFrame with added ‘date_order_sequence’ column.
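The behaviour described above can be sketched with plain pandas. This is a minimal illustration, not the module's actual implementation: the helper name add_date_order_sequence_sketch is hypothetical, and the 1-based numbering and the tie-breaking by original row order are assumptions.

```python
import pandas as pd


def add_date_order_sequence_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: number each client's rows in timestamp order."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # A stable sort keeps the original row order for identical timestamps.
    out = out.sort_values(["client_idcode", "timestamp"], kind="stable")
    # cumcount() is 0-based; starting at 1 is an assumption of this sketch.
    out["date_order_sequence"] = out.groupby("client_idcode").cumcount() + 1
    # Restore the caller's original row order.
    return out.sort_index()


example = pd.DataFrame(
    {
        "client_idcode": ["A", "A", "B", "A"],
        "timestamp": ["2021-03-01", "2021-01-01", "2021-02-01", "2021-02-01"],
    }
)
result = add_date_order_sequence_sketch(example)
print(result)
```

In this example, client A's rows are numbered by date regardless of their position in the frame, and client B's single row gets its own sequence starting at 1.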
- ml_grid.util.time_series_helper.max_client_idcode_sequence_length(df)
Calculate the maximum sequence length for client_idcode.
Args: df (DataFrame): DataFrame with ‘client_idcode’ column.
Returns: int: Maximum sequence length.
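One plausible reading of "maximum sequence length" is the largest number of rows that any single client_idcode has. The sketch below follows that reading; it is an assumption, and the helper name max_sequence_length_sketch is hypothetical rather than part of ml_grid.

```python
import pandas as pd


def max_sequence_length_sketch(df: pd.DataFrame) -> int:
    """Hypothetical sketch: largest row count among all client_idcode groups."""
    # size() counts the rows in each group; max() picks the longest sequence.
    return int(df.groupby("client_idcode").size().max())


visits = pd.DataFrame({"client_idcode": ["A", "A", "B", "A", "C", "C"]})
longest = max_sequence_length_sketch(visits)
print(longest)
```

A value computed this way could then be passed as max_seq_length to convert_Xy_to_time_series.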
- ml_grid.util.time_series_helper.convert_Xy_to_time_series(X, y, max_seq_length)
Convert DataFrame into time series format suitable for training.
This function takes a features DataFrame (X) and a target Series (y) and converts them into arrays suitable for training with Keras’s Sequential API.
The function assumes that X has a column named ‘client_idcode’ that identifies each patient; all rows sharing a client_idcode form that patient's sequence.
The function also assumes that every patient sequence has the same maximum length, given by the parameter max_seq_length.
For each patient, the function builds an input pattern (collected in X_list) and a target value (collected in y_list). Each input pattern is a NumPy array, and each target is a scalar (i.e. a single number).
The function returns a tuple (X_array, y_array), where X_array is a NumPy array of input patterns and y_array is a NumPy array of target values.
Args:
X (DataFrame): Features DataFrame.
y (Series): Target variable.
max_seq_length (int): Maximum sequence length for each patient.
Returns: tuple: Tuple containing X and y in the format suitable for time series training.
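The conversion described above can be sketched as follows. This is a hedged illustration rather than the module's code: the helper name to_time_series_sketch is hypothetical, and several details are assumptions not stated in the documentation, namely that shorter sequences are zero-padded, that longer ones are truncated to max_seq_length, that the target is taken from each patient's first row, that rows are already in chronological order, and that the output has shape (n_patients, max_seq_length, n_features).

```python
import numpy as np
import pandas as pd


def to_time_series_sketch(X: pd.DataFrame, y: pd.Series, max_seq_length: int):
    """Hypothetical sketch: stack per-patient rows into a 3D array."""
    feature_cols = [c for c in X.columns if c != "client_idcode"]
    X_list, y_list = [], []
    # sort=False keeps patients in order of first appearance.
    for _, rows in X.groupby("client_idcode", sort=False):
        # Assumption: sequences longer than max_seq_length are truncated.
        values = rows[feature_cols].to_numpy(dtype=float)[:max_seq_length]
        # Assumption: shorter sequences are padded with zeros at the end.
        padded = np.zeros((max_seq_length, len(feature_cols)))
        padded[: len(values)] = values
        X_list.append(padded)
        # Assumption: one scalar target per patient, taken from the first row.
        y_list.append(y.loc[rows.index[0]])
    return np.stack(X_list), np.asarray(y_list)


X = pd.DataFrame(
    {
        "client_idcode": ["A", "A", "B"],
        "heart_rate": [70.0, 72.0, 65.0],
        "age": [50.0, 50.0, 61.0],
    }
)
y = pd.Series([1, 1, 0])
X_array, y_array = to_time_series_sketch(X, y, max_seq_length=2)
print(X_array.shape, y_array)
```

An array with this layout matches the (samples, timesteps, features) input that Keras recurrent layers expect, which is why the sketch stacks one fixed-length block per patient.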