Usage
This guide outlines the steps to run a pat2vec
analysis after completing the installation.
1. Finalize Project Setup
Before running an analysis, ensure your project directory is set up correctly. If you used the install_pat2vec.sh
script, much of this is done for you.
Populate
credentials.py
: In the parent directory of yourpat2vec
clone, editcredentials.py
with your Elasticsearch credentials.Add MedCAT Model: Copy your MedCAT model pack (
.zip
) into themedcat_models
directory.
Your final directory structure should look like this:
your_project_folder/
├── credentials.py # <-- Populated with your credentials
├── medcat_models/
│ └── your_model.zip # <-- Your MedCAT model pack
├── snomed_methods/ # <-- Cloned helper repository
└── pat2vec/ # <-- This repository
├── notebooks/
│ └── example_usage.ipynb
└── ...
2. Prepare Input Data
Create a CSV file containing your patient cohort. This file must include:
A column named
client_idcode
with unique patient identifiers.Any other relevant columns, such as a diagnosis date for aligning time series data.
Place this file in an accessible location, such as a new data
folder inside pat2vec/notebooks/
.
3. Configure and Run
The example_usage.ipynb
notebook provides a template for running the pipeline.
Open the Notebook: Navigate to
pat2vec/notebooks/
and openexample_usage.ipynb
.Select the Kernel: Ensure the
pat2vec_env
Jupyter kernel is active.Configure the Analysis: In the notebook, locate the
config_class
. This object controls all parameters for your run. You will need to set:Paths to your input cohort CSV and output directories.
The list of features to extract.
Time windows for data extraction (look-back/look-forward periods).
Run the Pipeline: Execute the cells in the notebook to process your data.
Note: When working with real patient data, ensure the
testing
flag in theconfig_class
is set toFalse
.