ml_grid.util.time_series_helper

Functions

add_date_order_sequence_column(df)

Add a sequence column based on the timestamp within each client_idcode group.

max_client_idcode_sequence_length(df)

Calculate the maximum sequence length across all client_idcode groups.

convert_Xy_to_time_series(X, y, max_seq_length)

Convert a DataFrame into a time-series format suitable for training.

Module Contents

ml_grid.util.time_series_helper.add_date_order_sequence_column(df)

Add a sequence column based on the timestamp within each client_idcode group.

Args:
    df (DataFrame): DataFrame with ‘timestamp’ and ‘client_idcode’ columns.

Returns: DataFrame: DataFrame with added ‘date_order_sequence’ column.
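For intuition, the behaviour described above can be approximated in pandas by ordering rows on ‘timestamp’ and numbering them within each ‘client_idcode’ group. This is a minimal sketch, not the module's implementation: the helper name add_order_sketch is hypothetical, and it assumes the sequence is 1-based, that rows with equal timestamps keep their original relative order, and that the DataFrame index is unique.

```python
import pandas as pd


def add_order_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-implementation for illustration only (assumes 1-based numbering)."""
    out = df.copy()
    # Stable single-column sort so ties on 'timestamp' keep their original order.
    ordered = out.sort_values("timestamp", kind="mergesort")
    # cumcount() numbers rows 0, 1, 2, ... within each group in the current row order;
    # +1 makes the sequence 1-based. Assignment aligns on the index, so `out`
    # keeps its original row order.
    out["date_order_sequence"] = ordered.groupby("client_idcode").cumcount() + 1
    return out


df = pd.DataFrame(
    {
        "client_idcode": ["a", "a", "b", "a", "b"],
        "timestamp": pd.to_datetime(
            ["2021-03-01", "2021-01-01", "2021-02-01", "2021-02-01", "2021-01-15"]
        ),
    }
)
print(add_order_sketch(df)["date_order_sequence"].tolist())  # [3, 1, 2, 2, 1]
```

Keeping the original row order (rather than returning the sorted frame) is a design choice of this sketch; the real function may return rows in a different order.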

ml_grid.util.time_series_helper.max_client_idcode_sequence_length(df)

Calculate the maximum sequence length across all client_idcode groups.

Args:
    df (DataFrame): DataFrame with a ‘client_idcode’ column.

Returns: int: Maximum sequence length.
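Assuming that "sequence length" means the number of rows belonging to one ‘client_idcode’, the same quantity can be sketched as the size of the largest group. The function name below is hypothetical; if the real helper derives the length differently (for example from ‘date_order_sequence’), the two agree only when that column counts every row.

```python
import pandas as pd


def max_sequence_length_sketch(df: pd.DataFrame) -> int:
    """Hypothetical equivalent: the largest number of rows sharing one client_idcode."""
    # size() counts rows per group; max() picks the longest group.
    return int(df.groupby("client_idcode").size().max())


df = pd.DataFrame({"client_idcode": ["a", "a", "b", "a", "b", "c"]})
print(max_sequence_length_sketch(df))  # 3
```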

ml_grid.util.time_series_helper.convert_Xy_to_time_series(X, y, max_seq_length)

Convert a DataFrame into a time-series format suitable for training.

This function takes a features DataFrame (X) and a target Series (y) and converts them into a format suitable for training a model built with Keras’s Sequential API.

The function assumes that X has a column named ‘client_idcode’ identifying the patient each row belongs to; the rows of one patient form that patient's sequence.

It also assumes that all patient sequences share a common maximum length, given by the parameter max_seq_length.
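A common maximum length matters because Keras recurrent layers consume one rectangular 3-D array of shape (samples, timesteps, features), so shorter patient sequences have to be brought up to max_seq_length. The sketch below shows one way to do that with NumPy. It is an assumption, not documented behaviour of this module, that padding uses zeros appended at the end of the sequence; the helper pad_sequence is hypothetical.

```python
import numpy as np


def pad_sequence(seq: np.ndarray, max_seq_length: int) -> np.ndarray:
    """Hypothetical helper: zero-pad a (n_steps, n_features) array to (max_seq_length, n_features)."""
    n_steps = seq.shape[0]
    if n_steps > max_seq_length:
        raise ValueError(
            f"sequence has {n_steps} steps, more than max_seq_length={max_seq_length}"
        )
    # Pad widths are (before, after) per axis: add rows after the last time step,
    # leave the feature axis untouched. np.pad fills with zeros by default.
    return np.pad(seq, ((0, max_seq_length - n_steps), (0, 0)))


short = np.array([[1.0, 2.0], [3.0, 4.0]])               # patient with 2 rows, 2 features
longer = np.array([[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]])  # patient with 3 rows
batch = np.stack([pad_sequence(s, 3) for s in (short, longer)])
print(batch.shape)  # (2, 3, 2)
```

If zero padding is used, a Keras Masking layer with mask_value=0.0 placed before the recurrent layer can tell it to skip the padded steps.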

For each patient, the function builds one input pattern (collected in X_list) and one target value (collected in y_list). Each input pattern is a NumPy array, and each target is a scalar.

The function returns a tuple containing (X_array, y_array). X_array is a NumPy array of input patterns, and y_array is a NumPy array of target variables.
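The steps described above (group by patient, build one input pattern and one target per patient, then stack) can be sketched end to end as follows. This is an illustrative reconstruction, not ml_grid's code, and the name convert_Xy_sketch is hypothetical. Several details are assumptions not stated in the docstring: rows are already in time order within each patient, X and y share the same index, ‘client_idcode’ is dropped from the features, a patient's target is the y value on its last row, sequences longer than max_seq_length are truncated, and shorter ones are zero-padded at the end.

```python
import numpy as np
import pandas as pd


def convert_Xy_sketch(X: pd.DataFrame, y: pd.Series, max_seq_length: int):
    """Hypothetical illustration of per-patient sequence building (see assumptions above)."""
    feature_cols = [c for c in X.columns if c != "client_idcode"]
    X_list, y_list = [], []
    # sort=False keeps patients in order of first appearance in X.
    for _, group in X.groupby("client_idcode", sort=False):
        # Assumption: truncate to max_seq_length rows, then zero-pad at the end.
        seq = group[feature_cols].to_numpy(dtype=float)[:max_seq_length]
        X_list.append(np.pad(seq, ((0, max_seq_length - seq.shape[0]), (0, 0))))
        # Assumption: one scalar target per patient, taken from the patient's last row.
        y_list.append(y.loc[group.index[-1]])
    return np.stack(X_list), np.array(y_list)


X = pd.DataFrame(
    {"client_idcode": ["a", "a", "b"], "age": [50, 51, 62], "bmi": [22.0, 22.5, 30.1]}
)
y = pd.Series([0, 1, 0])
X_array, y_array = convert_Xy_sketch(X, y, max_seq_length=3)
print(X_array.shape)  # (2, 3, 2)
print(y_array)        # [1 0]
```

The resulting X_array has shape (n_patients, max_seq_length, n_features), the 3-D layout that Keras recurrent layers such as LSTM expect.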

Args:
    X (DataFrame): Features DataFrame.
    y (Series): Target variable.
    max_seq_length (int): Maximum sequence length for each patient.

Returns: tuple: (X_array, y_array), the features and targets as NumPy arrays in a format suitable for time-series training.