Usage
This guide outlines the steps to run a pat2vec analysis after completing the installation.
1. Finalize Project Setup
Before running an analysis, ensure your project directory is set up correctly. If you used the install_pat2vec.sh script, much of this is done for you.
Populate credentials.py: In the parent directory of your pat2vec clone, edit credentials.py with your Elasticsearch credentials.
Add MedCAT Model: Copy your MedCAT model pack (.zip) into the medcat_models directory.
Your final directory structure should look like this:
your_project_folder/
├── credentials.py          # <-- Populated with your credentials
├── medcat_models/
│   └── your_model.zip      # <-- Your MedCAT model pack
├── snomed_methods/         # <-- Cloned helper repository
└── pat2vec/                # <-- This repository
    ├── notebooks/
    │   └── example_usage.ipynb
    └── ...
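Before opening the notebook, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of pat2vec: the function name check_project_layout is hypothetical, and it only tests for the files and directories shown in the tree.

```python
from pathlib import Path


def check_project_layout(project_root):
    """Return a list of problems found in the expected project layout."""
    root = Path(project_root)
    problems = []

    # Files and directories named in the directory tree.
    if not (root / "credentials.py").is_file():
        problems.append("missing credentials.py")
    if not (root / "pat2vec").is_dir():
        problems.append("missing pat2vec/ directory")
    if not (root / "snomed_methods").is_dir():
        problems.append("missing snomed_methods/ directory")

    # At least one MedCAT model pack (.zip) should be present.
    models_dir = root / "medcat_models"
    if not models_dir.is_dir():
        problems.append("missing medcat_models/ directory")
    elif not any(models_dir.glob("*.zip")):
        problems.append("no .zip model pack in medcat_models/")

    return problems
```

Calling check_project_layout(".") from your_project_folder/ returns an empty list when everything is in place. The check does not verify that credentials.py has actually been populated.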
2. Prepare Input Data
Create a CSV file containing your patient cohort. This file must include:
A column named client_idcode with unique patient identifiers.
Any other relevant columns, such as a diagnosis date for aligning time series data.
Place this file in an accessible location, such as a new data folder inside pat2vec/notebooks/.
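The cohort file requirements above can be checked before a run. This is a rough sketch using only the Python standard library. The function name validate_cohort_csv is hypothetical and not part of pat2vec; the only detail taken from this guide is the client_idcode column name.

```python
import csv


def validate_cohort_csv(path, id_column="client_idcode"):
    """Check that the cohort CSV has the ID column with unique, non-empty values.

    Returns the number of patients; raises ValueError on any problem.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if id_column not in (reader.fieldnames or []):
            raise ValueError(f"column {id_column!r} not found in {path}")
        # Short rows yield None for missing fields, so normalise to "".
        ids = [(row[id_column] or "").strip() for row in reader]

    if any(not value for value in ids):
        raise ValueError(f"empty values found in {id_column!r}")
    if len(ids) != len(set(ids)):
        raise ValueError(f"duplicate values found in {id_column!r}")
    return len(ids)
```

Running this on the cohort file before starting the notebook catches a missing or misspelled ID column early, rather than partway through a long extraction.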
3. Configure and Run
The example_usage.ipynb notebook provides a template for running the pipeline.
Open the Notebook: Navigate to pat2vec/notebooks/ and open example_usage.ipynb.
Select the Kernel: Ensure the pat2vec_env Jupyter kernel is active.
Configure the Analysis: In the notebook, locate the config_class. This object controls all parameters for your run. You will need to set:
Paths to your input cohort CSV and output directories.
The list of features to extract.
Time windows for data extraction (look-back/look-forward periods).
Run the Pipeline: Execute the cells in the notebook to process your data.
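To make the look-back/look-forward idea concrete, the sketch below computes an extraction window around an anchor date such as a diagnosis date. It is illustrative only and does not use the pat2vec API: the function extraction_window and its parameters are hypothetical, and the actual parameter names are those defined on config_class in the notebook.

```python
from datetime import datetime, timedelta


def extraction_window(anchor_date, days_back, days_forward):
    """Return (start, end) of the extraction window around an anchor date.

    anchor_date is an ISO-style string (YYYY-MM-DD), e.g. a diagnosis date.
    """
    anchor = datetime.strptime(anchor_date, "%Y-%m-%d")
    start = anchor - timedelta(days=days_back)    # look-back period
    end = anchor + timedelta(days=days_forward)   # look-forward period
    return start, end


# Hypothetical example: 30 days before and 14 days after the anchor date.
start, end = extraction_window("2021-06-15", days_back=30, days_forward=14)
print(start.date(), end.date())
```

Deciding on these window lengths before editing the configuration makes it easier to check that the values set in the notebook cover the period you intend.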
Note: When working with real patient data, ensure the testing flag in the config_class is set to False.