Welcome to the pat2vec Wiki!

This wiki contains comprehensive documentation for pat2vec.

Overview

pat2vec is a Python-based tool designed to transform raw electronic health records (EHR) into structured, time-series feature vectors. This process makes the data suitable for machine learning tasks, particularly binary classification. It can aggregate data at the patient level or construct detailed longitudinal timelines.

Example Use Cases

1. Patient-Level Aggregation

Compute summary statistics (e.g., the mean of n variables) for each unique patient, resulting in one row per patient. This is ideal for models requiring a single representation per individual.

2. Longitudinal Time Series Construction

Generate a monthly time series for each patient that includes:

  • Biochemistry results

  • Demographic attributes

  • MedCat-derived clinical text annotations

The time series spans up to 25 years retrospectively, aligned to each patient’s diagnosis date, enabling a consistent retrospective view across varying start times.