GRASP: Revolutionizing Disease Risk Prediction with Deep Learning

Electronic health records (EHRs) are widely used to forecast disease risk, plan screening, and design preventive measures. However, models built on structured EHR data often perform poorly when applied outside the healthcare system where they were developed. Differences in coding practices, clinical workflows, and reporting standards limit their portability, even when data are standardized to a common framework such as the OMOP Common Data Model. Mapping local data to such a model is labor-intensive and does not fully resolve semantic discrepancies.

To address these issues, a deep learning framework called Generalizable Risk Assessment with Semantic Projection (GRASP) has been developed. GRASP projects medical concepts into a unified semantic space using large language model (LLM) representations, then uses a transformer network to process longitudinal patient histories and predict multiple disease risks across healthcare systems.

Limitations of Traditional EHR Models

Traditional risk prediction models that rely on structured EHR data are heavily dependent on the coding of diagnoses, procedures, and medications. Even within a unified data model, clinical concepts may be represented by different codes, and some concepts might be rare or unique to specific settings. Models based on fixed ontologies or co-occurrence statistics capture only partial similarities and struggle to adapt as coding systems evolve.

These challenges are particularly daunting for smaller organizations that lack the resources to develop and maintain local prediction models. Comprehensive data harmonization is expensive, and retraining models for each new setting is impractical. GRASP addresses this gap by emphasizing semantic alignment rather than explicit code mapping: it represents medical concepts by their meaning rather than by their local coding frequency, which makes a trained model easier to transfer between settings.

Semantic Projection in Risk Modeling

GRASP uses semantic embeddings generated from clinical concept descriptions by an LLM, specifically OpenAI's text-embedding-3-large. These embeddings are stored in a lookup table, so patient histories can be encoded without repeated LLM calls during inference. Generating the embeddings requires only the concept descriptions, not individual-level data, so the approach can be used without sharing patient records.
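The lookup-table idea can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `toy_embed` is a deterministic stand-in for a real embedding call (it is not semantic and is not the OpenAI API), and the concept IDs and function names are hypothetical. The point is only that each description is embedded once, and encoding a history afterwards is a dictionary lookup.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def toy_embed(description: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for an LLM embedding call (deterministic, NOT semantic)."""
    vec = [0.0] * dim
    for i, ch in enumerate(description.lower()):
        vec[(i + ord(ch)) % dim] += 1.0
    norm = sum(x * x for x in vec) ** 0.5 or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


def build_lookup(descriptions: Dict[str, str],
                 embed: Callable[[str], Vector]) -> Dict[str, Vector]:
    """Embed every concept description once, up front."""
    return {concept_id: embed(text) for concept_id, text in descriptions.items()}


def encode_history(history: Sequence[str],
                   lookup: Dict[str, Vector]) -> List[Vector]:
    """Encode a patient's concept IDs by table lookup only; unknown IDs are skipped."""
    return [lookup[c] for c in history if c in lookup]


# Hypothetical concept IDs and descriptions (no patient data involved).
descriptions = {
    "C001": "Essential hypertension",
    "C002": "Type 2 diabetes mellitus",
    "C003": "Asthma",
}

calls: List[str] = []  # records how often the embedding function is invoked


def counting_embed(text: str) -> Vector:
    calls.append(text)
    return toy_embed(text)


lookup = build_lookup(descriptions, counting_embed)
encoded = encode_history(["C001", "C003", "C001", "C999"], lookup)
```

In this sketch the embedding function runs once per concept while the table is built and is not called again when a history is encoded. How a real deployment handles concepts missing from the table is not described in the source; skipping them is just the simplest choice here.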

Patient histories, represented as sequences of embedded concepts, are processed by a multi-layer transformer neural network. The model predicts time-to-event risk for 22 endpoints (21 diseases and all-cause mortality) using age, sex, and observed OMOP-mapped concepts as predictors. A two-year washout period is applied to reduce the influence of conditions closely related to the predicted outcomes.
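The source does not say how the washout is implemented. One common reading, assumed here, is a gap between the last predictor record used and the start of outcome follow-up, so that records from the two years before a prediction date are dropped. The sketch below follows that reading; `Record`, `apply_washout`, the index date, and the 730-day approximation of two years are all hypothetical choices.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class Record(NamedTuple):
    concept_id: str      # hypothetical concept identifier
    recorded_on: date    # date the concept was recorded


WASHOUT_DAYS = 2 * 365  # two-year washout from the text, approximated in days


def apply_washout(records: List[Record], index_date: date,
                  washout_days: int = WASHOUT_DAYS) -> List[Record]:
    """Keep only records dated on or before (index_date - washout).

    Assumed interpretation: predictors recorded inside the washout window,
    immediately before the prediction date, are excluded.
    """
    cutoff = index_date - timedelta(days=washout_days)
    return [r for r in records if r.recorded_on <= cutoff]


history = [
    Record("C001", date(2015, 6, 1)),   # well before the window: kept
    Record("C002", date(2018, 1, 1)),   # exactly on the cutoff: kept
    Record("C003", date(2019, 3, 1)),   # inside the washout window: dropped
]
kept = apply_washout(history, index_date=date(2020, 1, 1))
kept_ids = [r.concept_id for r in kept]
```

If the washout is instead applied on the outcome side (ignoring events in the first two years of follow-up), the filter would run over outcome dates rather than predictor dates; the structure of the function stays the same.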

Validation and Model Transfer

GRASP was developed using UK Biobank data from 391,921 individuals and evaluated externally in FinnGen and the Mount Sinai Health System dataset. In UK Biobank cross-validation, GRASP, random-embedding transformers, and XGBoost all outperformed an age-and-sex-only baseline, with GRASP showing significant advantages when transferred to external data.

When applied to external datasets without additional training, GRASP outperformed comparable transformers with random embeddings and XGBoost. Statistically significant improvements were noted for several outcomes, particularly asthma, chronic kidney disease, and heart failure.

Cross-System Transfer and Challenges

GRASP demonstrated effective transfer across coding systems without explicit ontology mapping. When trained in UK Biobank and evaluated in Mount Sinai using OMOP-mapped condition concepts, GRASP showed notable improvements over baseline models. However, when evaluated using only ICD-10-CM codes, performance was reduced for some outcomes.
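A toy example can show why description-based representations help across coding systems: a code the model never saw in training still lands near a related trained concept if their descriptions are similar. This is a sketch under stated assumptions, not GRASP's method. The bag-of-words vector is a hypothetical stand-in for an LLM embedding, the `SYS_A` and `SYS_B` codes are invented, and GRASP itself feeds embeddings to the transformer rather than doing a nearest-neighbour match; the match is used here only to make the proximity visible.

```python
import math
from collections import Counter
from typing import Dict


def toy_text_vector(description: str) -> Counter:
    """Hypothetical stand-in for an LLM embedding: bag of lowercase words."""
    return Counter(description.lower().replace(",", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_trained_concept(description: str, trained: Dict[str, str]) -> str:
    """Return the trained concept ID whose description is most similar."""
    query = toy_text_vector(description)
    return max(trained, key=lambda cid: cosine(query, toy_text_vector(trained[cid])))


# Invented codes from a "training" coding system.
trained = {
    "SYS_A:1": "Chronic kidney disease",
    "SYS_A:2": "Heart failure",
    "SYS_A:3": "Asthma",
}

# Invented code from a different system, never seen during training.
unseen_description = "Chronic kidney disease, stage 3"
match = nearest_trained_concept(unseen_description, trained)
similarity = cosine(toy_text_vector(unseen_description),
                    toy_text_vector(trained[match]))
```

No mapping table between the two systems is consulted; the shared wording of the descriptions alone places the unseen code next to the trained kidney disease concept. A real LLM embedding would also capture synonyms and paraphrases that word overlap misses.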

Further analyses confirmed GRASP's superior performance with smaller training samples and its resilience to changes in concept descriptions. Despite its strengths, GRASP faces limitations, including the absence of detailed temporal sequencing, evaluation limited to high-income settings, potential biases from LLMs, and the need for recalibration in new populations.

GRASP exemplifies how language-based semantic representations of medical concepts can enhance the portability of EHR-based risk prediction across diverse healthcare systems and coding frameworks. By minimizing reliance on explicit harmonization, it offers a practical pathway for deploying predictive models in varied clinical environments.
