pat2vec

Getting Started

  • Welcome to the pat2vec Wiki!
  • Installation and Setup Guide
  • Comprehensive Configuration Guide for pat2vec
  • Usage

Guides & Examples

  • Blood Test Data Search
    • Overview
    • Data Fields
    • Basic Usage
      • 1. Initialize Pat2vec
      • 2. Search Blood Test Data
    • Advanced Parameters
      • Date Range Filtering
      • Custom Search Filters
      • Alternative Field Mapping
    • Function Parameters Reference
    • Working with Results
      • Basic Data Exploration
      • Data Processing Examples
    • Error Handling
    • Integration with Other Pat2vec Methods
    • Best Practices
    • Troubleshooting

Project Information

  • 📊 Diagrams
  • Frequently Asked Questions (FAQ)
  • Contributing to pat2vec

API Reference

  • API Reference
pat2vec
  • Blood Test Data Search
  • View page source

Blood Test Data Search

The pat2vec library provides comprehensive functionality for searching and retrieving blood test data from clinical records. This section explains how to use the blood test search methods to extract laboratory results for your patient cohorts.

Overview

The blood test search functionality allows you to:

  • Retrieve laboratory results for specific patients or patient cohorts

  • Filter results by date ranges

  • Extract standardized blood test measurements

  • Access both numeric values and metadata

Data Fields

When searching for blood test data, the following fields are retrieved:

Field

Description

client_idcode

Unique patient identifier

basicobs_itemname_analysed

Name/type of the blood test (e.g., “Hemoglobin”, “Creatinine”)

basicobs_value_numeric

Numeric result value

basicobs_entered

Date and time when the result was recorded

clientvisit_serviceguid

Unique identifier for the clinical visit

updatetime

Timestamp when the record was last updated

Basic Usage

1. Initialize Pat2vec

First, set up your pat2vec configuration and initialize the main object:

import pat2vec
from pat2vec.util.config_pat2vec import config_class
from pat2vec.main_pat2vec import main

# Configure pat2vec
config_obj = config_class(
    testing=False,  # Set to False for production use
    verbosity=9,
    proj_name='blood_analysis_project'
)

# Initialize pat2vec object
pat2vec_obj = main(config_obj=config_obj, cogstack=True)

2. Search Blood Test Data

# Define patient IDs
patient_ids = ['PATIENT001', 'PATIENT002', 'PATIENT003']

# Search for blood test data
df_bloods = pat2vec.pat2vec_get_methods.get_method_bloods.search_bloods_data(
    cohort_searcher_with_terms_and_search=pat2vec_obj.cohort_searcher_with_terms_and_search,
    client_id_codes=patient_ids
)

# View the results
print(df_bloods.head())

Advanced Parameters

Date Range Filtering

You can specify custom date ranges to focus on specific time periods:

df_bloods = pat2vec.pat2vec_get_methods.get_method_bloods.search_bloods_data(
    cohort_searcher_with_terms_and_search=pat2vec_obj.cohort_searcher_with_terms_and_search,
    client_id_codes=patient_ids,
    start_year=2020,
    start_month=1,
    start_day=1,
    end_year=2023,
    end_month=12,
    end_day=31
)

Custom Search Filters

Add additional search criteria using the additional_custom_search_string parameter:

# Search for specific blood tests only
df_bloods = pat2vec.pat2vec_get_methods.get_method_bloods.search_bloods_data(
    cohort_searcher_with_terms_and_search=pat2vec_obj.cohort_searcher_with_terms_and_search,
    client_id_codes=patient_ids,
    additional_custom_search_string='AND basicobs_itemname_analysed:("Hemoglobin" OR "Creatinine")'
)

Alternative Field Mapping

If your data uses different field names, you can specify them:

df_bloods = pat2vec.pat2vec_get_methods.get_method_bloods.search_bloods_data(
    cohort_searcher_with_terms_and_search=pat2vec_obj.cohort_searcher_with_terms_and_search,
    client_id_codes=patient_ids,
    client_idcode_name="patient_id.keyword",  # Different patient ID field
    bloods_time_field="test_date"             # Different timestamp field
)

Function Parameters Reference

Parameter

Type

Default

Description

cohort_searcher_with_terms_and_search

callable

None

Required. The pat2vec search function

client_id_codes

str or list

None

Required. Patient ID(s) to search for

client_idcode_name

str

“client_idcode.keyword”

Field name for patient IDs

bloods_time_field

str

“basicobs_entered”

Field name for timestamps

start_year

int

1995

Start year for date range

start_month

int

1

Start month for date range

start_day

int

1

Start day for date range

end_year

int

2025

End year for date range

end_month

int

12

End month for date range

end_day

int

12

End day for date range

additional_custom_search_string

str

None

Additional Elasticsearch query string

Working with Results

Basic Data Exploration

# Check the shape of results
print(f"Retrieved {len(df_bloods)} blood test records")

# View unique test types
print("Available blood tests:")
print(df_bloods['basicobs_itemname_analysed'].unique())

# Check date range of results
print(f"Date range: {df_bloods['basicobs_entered'].min()} to {df_bloods['basicobs_entered'].max()}")

Data Processing Examples

# Convert date strings to datetime
df_bloods['basicobs_entered'] = pd.to_datetime(df_bloods['basicobs_entered'])

# Filter for specific test types
hemoglobin_results = df_bloods[
    df_bloods['basicobs_itemname_analysed'].str.contains('Hemoglobin', na=False)
]

# Group by patient and test type
patient_summary = df_bloods.groupby(['client_idcode', 'basicobs_itemname_analysed']).agg({
    'basicobs_value_numeric': ['count', 'mean', 'std'],
    'basicobs_entered': ['min', 'max']
}).round(2)

Error Handling

The function includes built-in validation and will raise helpful error messages:

try:
    df_bloods = pat2vec.pat2vec_get_methods.get_method_bloods.search_bloods_data(
        cohort_searcher_with_terms_and_search=pat2vec_obj.cohort_searcher_with_terms_and_search,
        client_id_codes=patient_ids
    )
except ValueError as e:
    print(f"Search error: {e}")

Integration with Other Pat2vec Methods

Blood test data can be combined with other clinical data types:

# Get multiple data types for the same patients
df_demographics = pat2vec.pat2vec_get_methods.get_method_demo.search_demographics(
    cohort_searcher_with_terms_and_search=pat2vec_obj.cohort_searcher_with_terms_and_search,
    client_id_codes=patient_ids
)

df_drugs = pat2vec.pat2vec_get_methods.get_method_drugs.search_drug_orders(
    cohort_searcher_with_terms_and_search=pat2vec_obj.cohort_searcher_with_terms_and_search,
    client_id_codes=patient_ids
)

# Combine datasets for comprehensive patient profiles
# (Further processing would depend on your specific analysis needs)

Best Practices

  1. Date Range Optimization: Use appropriate date ranges to avoid retrieving unnecessary historical data

  2. Batch Processing: For large cohorts, process patients in smaller batches to manage memory usage

  3. Data Validation: Always check the retrieved data for completeness and expected ranges

  4. Field Mapping: Verify that field names match your Elasticsearch index schema

  5. Error Handling: Implement proper error handling for production use

Troubleshooting

Common Issues:

  • No results returned: Check patient IDs exist in the database and date ranges are appropriate

  • Field not found errors: Verify field names match your Elasticsearch schema

  • Memory issues: Reduce date ranges or process smaller patient batches

  • Timeout errors: Consider implementing retry logic for large queries

Previous Next

© Copyright 2024, Samora Hunter.

Built with Sphinx using a theme provided by Read the Docs.