Welcome to the pat2vec Wiki!

This wiki contains comprehensive documentation for pat2vec.

Overview

pat2vec is a Python-based tool designed to transform raw electronic health records (EHR) into structured, time-series feature vectors. This process makes the data suitable for machine learning tasks, particularly binary classification. It can aggregate data at the patient level or construct detailed longitudinal timelines.

Example Use Cases

1. Patient-Level Aggregation

Compute summary statistics (e.g., the mean of n variables) for each unique patient, resulting in one row per patient. This is ideal for models requiring a single representation per individual.

2. Longitudinal Time Series Construction

Generate a monthly time series for each patient that includes:

Biochemistry results
Demographic attributes
MedCat-derived clinical text annotations

The time series spans up to 25 years retrospectively, aligned to each patient’s diagnosis date, enabling a consistent retrospective view across varying start times.