Electronic health records (EHRs) contain longitudinal traces of patients’ interactions with the healthcare system. These patient trajectories (sequences of diagnoses, medications, and other events over time) offer opportunities to predict adverse outcomes early enough to intervene. In practice, however, EHR data are heterogeneous, temporally complex, and often available only in small cohorts with scarce labels. This thesis, Learning More from Less: Accurate and Trustworthy Foundation Models for Patient Trajectories, investigates how to build foundation-style models for such data.
The work is guided by the question: How can we improve prediction and provide trustworthy explanations for adverse health outcomes by modeling longitudinal EHR trajectories? It follows two tracks: (i) robust EHR-specific representation learning, and (ii) trustworthy modeling.
First, the thesis enriches self-supervised pretraining for structured EHR data. A trajectory-order objective (TOO-BERT) teaches models to distinguish the true temporal order from plausible permutations, while a source-masked objective models cross-source dependencies. These objectives exploit the structure already present in trajectories, yielding stronger representations and improved prediction of incident outcomes.
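As a rough illustration of how training examples for an order-discrimination objective could be built, the sketch below swaps two visits in a trajectory with some probability and labels the result accordingly. The function name `make_order_example`, the `swap_prob` parameter, the visit-level swap, and the example codes are all assumptions made for illustration; the actual permutation scheme used by TOO-BERT is not described in this abstract.

```python
import random


def make_order_example(visits, rng, swap_prob=0.5):
    """Return (visits, label); label 1 = original order, 0 = permuted.

    `visits` is a list of visits, each a list of code strings.
    Hypothetical helper, not the thesis's implementation.
    """
    visits = [list(v) for v in visits]  # work on a copy
    if len(visits) < 2 or rng.random() >= swap_prob:
        return visits, 1
    i, j = rng.sample(range(len(visits)), 2)
    visits[i], visits[j] = visits[j], visits[i]
    # Swapping two identical visits leaves the order unchanged.
    label = 0 if visits[i] != visits[j] else 1
    return visits, label


# Illustrative trajectory of three visits (codes are placeholders).
trajectory = [["I10"], ["E11", "A10"], ["N18"]]
example, label = make_order_example(trajectory, random.Random(0))
```

A classifier head trained on such pairs has to use temporal structure, since the permuted trajectory contains exactly the same codes as the original.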
Second, the thesis targets robust adaptation under label scarcity. Adaptive Noise-Augmented Attention (ANAA) perturbs and smoothly augments attention scores during fine-tuning, broadening overly sharp attention patterns and improving performance.
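The general idea of perturbing attention scores during fine-tuning can be sketched as follows. This is a minimal example that adds fixed-scale Gaussian noise to scaled dot-product scores before the softmax; the function `noisy_attention`, the parameter `sigma`, and the choice of Gaussian noise are assumptions, and the adaptive and smoothing components of ANAA are not reproduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def noisy_attention(q, k, v, sigma=0.1, training=True, rng=None):
    """Scaled dot-product attention with noise on the scores.

    Noise is applied only when `training` is True (assumed design);
    at evaluation time this reduces to standard attention.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if training and sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        scores = scores + sigma * rng.standard_normal(scores.shape)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))   # 4 query tokens
k = rng.standard_normal((5, 8))   # 5 key tokens
v = rng.standard_normal((5, 8))
out, weights = noisy_attention(q, k, v, sigma=0.5, rng=rng)
```

Because the noise is injected before normalization, each row of `weights` remains a valid probability distribution.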
Third, the thesis develops explanation methods tailored to multimodal transformer-based EHR models. A manifold-aware baseline for Integrated Gradients keeps attribution paths in high-density regions of the representation space, improving faithfulness. Group-Sparse IG further adjusts the path schedule to produce sparser, more concise token-level explanations. Building on these methods, the thesis also proposes an approach to aggregate individual-level attributions into population-level insights for greater actionability, and applies it to identify key drivers of longevity and early mortality in the Malmö Diet and Cancer cohort.
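For readers unfamiliar with Integrated Gradients, the following sketch shows the standard straight-line formulation on a toy logistic model. The model, weights, and inputs are illustrative assumptions; the manifold-aware baseline and the Group-Sparse path schedule proposed in the thesis are not reproduced. The `baseline` argument marks the place where a different baseline choice would enter.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Toy differentiable model: logistic regression (assumption)."""
    return sigmoid(x @ w + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Standard IG along the straight line from baseline to x,
    approximated with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    grad_sum = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)
        grad_sum += model_grad(point, w, b)
    return (x - baseline) * grad_sum / steps


w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([1.0, 0.5, -1.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to `model(x, w, b) - model(baseline, w, b)`.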
Finally, the thesis explores uncertainty estimation in small, sequence-based datasets through a Gaussian process model with a decoupled global alignment kernel for peptide permeability prediction. This demonstrates how structured sequence kernels can provide better accuracy and calibrated uncertainty when data are limited.
Overall, the thesis shows that in complex, data-scarce EHR settings, ``learning more from less'' requires making the pretraining, fine-tuning, and explanation stages explicitly reflect the structure of patient trajectories, leading to more accurate and trustworthy models for clinical risk prediction.