Learning More from Less: Accurate and Trustworthy Foundation Models for Patient Trajectories
Halmstad University, School of Information Technology. ORCID iD: 0000-0002-1999-8435
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Electronic health records (EHRs) contain longitudinal traces of patients’ interactions with the healthcare system. These patient trajectories—sequences of diagnoses, medications, and other events over time—offer opportunities to predict adverse outcomes early to intervene. In practice, however, EHR data are heterogeneous, temporally complex, and often available only in limited-sized cohorts with scarce labels. This thesis, Learning More from Less: Accurate and Trustworthy Foundation Models for Patient Trajectories, investigates how to build foundation-style models for such data.

The work is guided by the question: How can we improve prediction and provide trustworthy explanations for adverse health outcomes by modeling longitudinal EHR trajectories? It follows two tracks: (i) robust EHR-specific representation learning, and (ii) trustworthy modeling. 

First, the thesis enriches self-supervised pretraining for structured EHR data. A trajectory-order objective (TOO-BERT) teaches models to distinguish true temporal order from plausible permutations, while a source-masked objective models cross-source dependencies. These objectives exploit the structure already present in trajectories, yielding stronger representations and improved prediction of incident outcomes.

Second, the thesis targets robust adaptation under label scarcity. Adaptive Noise-Augmented Attention (ANAA) perturbs and smoothly augments attention scores during fine-tuning, broadening overly sharp attention patterns and improving performance.

Third, the thesis develops explanation methods tailored to multimodal transformer models for EHR trajectories. A manifold-aware baseline for Integrated Gradients keeps attribution paths in high-density regions of the representation space, improving faithfulness. Group-Sparse IG further adjusts the path schedule to produce sparse, token-level explanations that are more concise. Building on these methods, the thesis also proposes an approach to aggregate individual-level attributions into population-level insights for greater actionability, and applies it to identify key drivers of longevity and early mortality in the Malmö Diet and Cancer cohort.

Finally, the thesis explores uncertainty estimation in small, sequence-based datasets through a Gaussian process model with a decoupled global alignment kernel for peptide permeability prediction. This demonstrates how structured sequence kernels can provide better accuracy and calibrated uncertainty when data are limited.

Overall, the thesis shows that in complex, data-scarce EHR settings, "learning more from less" requires making the pretraining, fine-tuning, and explanation stages explicitly reflect the structure of patient trajectories, leading to more accurate and trustworthy models for clinical risk prediction.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2026, p. 46
Series
Halmstad University Dissertations ; 141
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:hh:diva-58472
ISBN: 978-91-90123-03-4 (print)
ISBN: 978-91-90123-04-1 (electronic)
OAI: oai:DiVA.org:hh-58472
DiVA, id: diva2:2040080
Public defence
2026-03-26, S3030, Kristian IV:s väg 3, Halmstad, 13:00 (English)
Funder
Swedish Research Council, 2019-00198
Knowledge Foundation, 20200208 01 H
Available from: 2026-02-26 Created: 2026-02-19 Last updated: 2026-02-26. Bibliographically approved
List of papers
1. Deep learning prediction models based on EHR trajectories: A systematic review
2023 (English). In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 144, article id 104430. Article, review/survey (Refereed). Published
Abstract [en]

Background: Electronic health records (EHRs) are generated at an ever-increasing rate. EHR trajectories, the temporal aspect of health records, facilitate predicting patients' future health-related risks. This enables healthcare systems to increase the quality of care through early identification and primary prevention. Deep learning techniques have shown great capacity for analyzing complex data and have been successful for prediction tasks using complex EHR trajectories. This systematic review aims to analyze recent studies to identify challenges, knowledge gaps, and ongoing research directions.

Methods: For this systematic review, we searched the Scopus, PubMed, IEEE Xplore, and ACM databases from January 2016 to April 2022 using search terms centered around EHR, deep learning, and trajectories. The selected papers were then analyzed according to publication characteristics, objectives, and their solutions to existing challenges, such as the model's capacity to deal with intricate data dependencies, data insufficiency, and explainability.

Results: After removing duplicates and out-of-scope papers, 63 papers were selected, showing rapid growth in the number of studies in recent years. Predicting all diseases in the next visit and the onset of cardiovascular diseases were the most common targets. Different contextual and non-contextual representation learning methods are employed to retrieve important information from the sequence of EHR trajectories. Recurrent neural networks and time-aware attention mechanisms for modeling long-term dependencies, self-attention, convolutional neural networks, graphs for representing inner-visit relations, and attention scores for explainability were frequently used among the reviewed publications.

Conclusions: This systematic review demonstrates how recent breakthroughs in deep learning methods have facilitated the modeling of EHR trajectories. Research on improving the ability of graph neural networks, attention mechanisms, and cross-modal learning to analyze intricate dependencies among EHRs has shown good progress. There is a need for more publicly available EHR trajectory datasets to allow easier comparison among different models. Also, very few of the developed models can handle all aspects of EHR trajectory data. © 2023 The Author(s)

Place, publisher, year, edition, pages
Maryland Heights, MO: Academic Press, 2023
Keywords
Deep learning, Disease prediction, EHR trajectories, Electronic health records, Systematic review
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:hh:diva-51443 (URN) 10.1016/j.jbi.2023.104430 (DOI) 001031876800001 () 37380061 (PubMedID) 2-s2.0-85164039312 (Scopus ID)
Funder
Swedish Research Council, 2019-00198
Note

Funding: This study was part of the AIR Lund (Artificially Intelligent use of Registers at Lund University) research environment and received funding from the Swedish Research Council (VR; grant no. 2019-00198).

This research is included in the CAISR Health research profile.

Available from: 2023-08-17 Created: 2023-08-17 Last updated: 2026-02-19. Bibliographically approved
2. Trajectory-Ordered Objectives for Self-Supervised Representation Learning of Temporal Healthcare Data Using Transformers: Model Development and Evaluation Study
2025 (English). In: JMIR Medical Informatics, E-ISSN 2291-9694, Vol. 13, article id e68138. Article in journal (Refereed). Published
Abstract [en]

Background: The growing availability of electronic health records (EHRs) presents an opportunity to enhance patient care by uncovering hidden health risks and improving informed decisions through advanced deep learning methods. However, modeling EHR sequential data, that is, patient trajectories, is challenging due to the evolving relationships between diagnoses and treatments over time. Significant progress has been achieved using transformers and self-supervised learning. While BERT-inspired models using masked language modeling (MLM) capture EHR context, they often struggle with the complex temporal dynamics of disease progression and interventions.

Objective: This study aims to improve the modeling of EHR sequences by addressing the limitations of traditional transformer-based approaches in capturing complex temporal dependencies.

Methods: We introduce Trajectory Order Objective BERT (Bidirectional Encoder Representations from Transformers; TOO-BERT), a transformer-based model that advances the MLM pretraining approach by integrating a novel trajectory order objective (TOO) to better learn the complex sequential dependencies between medical events. TOO-BERT enhances the context learned by MLM by pretraining the model to distinguish ordered sequences of medical codes from permuted ones in a patient trajectory. The TOO is enhanced by a conditional selection process that focuses on medical codes or visits that frequently occur together, to further improve contextual understanding and strengthen temporal awareness. We evaluate TOO-BERT on 2 extensive EHR datasets, MIMIC-IV hospitalization records and the Malmö Diet and Cancer cohort (MDC), comprising approximately 10 and 8 million medical codes, respectively. TOO-BERT is compared against conventional machine learning methods, a transformer trained from scratch, and a transformer pretrained on MLM in predicting heart failure (HF), Alzheimer disease (AD), and prolonged length of stay (PLS).

Results: TOO-BERT outperformed conventional machine learning methods and transformer-based approaches in HF, AD, and PLS prediction across both datasets. In the MDC dataset, TOO-BERT improved HF and AD prediction, increasing area under the receiver operating characteristic curve (AUC) scores from 67.7 and 69.5 with the MLM-pretrained Transformer to 73.9 and 71.9, respectively. In the MIMIC-IV dataset, TOO-BERT enhanced HF and PLS prediction, raising AUC scores from 86.2 and 60.2 with the MLM-pretrained Transformer to 89.8 and 60.4, respectively. Notably, TOO-BERT demonstrated strong performance in HF prediction even with limited fine-tuning data, achieving AUC scores of 0.877 and 0.823, compared to 0.839 and 0.799 for the MLM-pretrained Transformer, when fine-tuned on only 50% (442/884) and 20% (176/884) of the training data, respectively.

Conclusions: These findings demonstrate the effectiveness of integrating temporal ordering objectives into MLM-pretrained models, enabling deeper insights into the complex temporal relationships inherent in EHR data. Attention analysis further highlights TOO-BERT's capability to capture and represent sophisticated structural patterns within patient trajectories, offering a more nuanced understanding of disease progression.

 ©Ali Amirahmadi, Farzaneh Etminani, Jonas Björk, Olle Melander, Mattias Ohlsson.
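The core of the trajectory-order objective can be illustrated with a minimal sketch of how ordered and permuted training pairs might be built. Names and the swap strategy are illustrative only; the paper's conditional selection of frequently co-occurring codes and the BERT classification head are omitted.

```python
import random

def too_examples(trajectory, n_negatives=1, seed=0):
    """Build (sequence, label) pairs for a trajectory-order objective.

    label 1: the true temporal order of visits.
    label 0: a plausible permutation (here: two visits swapped).
    `trajectory` is a list of visits, each a list of medical codes.
    """
    rng = random.Random(seed)
    pairs = [(trajectory, 1)]
    for _ in range(n_negatives):
        permuted = list(trajectory)
        i, j = rng.sample(range(len(permuted)), 2)  # pick two visits to swap
        permuted[i], permuted[j] = permuted[j], permuted[i]
        pairs.append((permuted, 0))
    return pairs

# hypothetical ICD-10/ATC codes, one inner list per visit
visits = [["I10"], ["E11", "A10B"], ["I50"]]
pairs = too_examples(visits)
```

A binary head on top of the MLM-pretrained encoder would then be trained to separate the two labels, forcing the representation to encode temporal order.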

Place, publisher, year, edition, pages
Toronto: JMIR Publications, 2025
Keywords
BERT, Alzheimer disease, deep learning, disease prediction, effectiveness, electronic health record, heart failure, language model, masked language model, patient trajectories, prolonged length of stay, representation learning, temporal, transformer
National Category
Information Systems
Research subject
Health Innovation, IDC
Identifiers
urn:nbn:se:hh:diva-56834 (URN) 10.2196/68138 (DOI) 001519087300002 () 40465350 (PubMedID) 2-s2.0-105008277733 (Scopus ID)
Funder
Swedish Research Council, 2019-00198
Knowledge Foundation, 20200208 01 H
Available from: 2025-07-08 Created: 2025-07-08 Last updated: 2026-02-19. Bibliographically approved
3. A Masked Language Model for Multi-Source EHR Trajectories Contextual Representation Learning
2023 (English). In: Caring is Sharing – Exploiting the Value in Data for Health and Innovation: Proceedings of MIE 2023 / [ed] Maria Hägglund; Madeleine Blusi; Stefano Bonacina; Lina Nilsson; Inge Cort Madsen; Sylvia Pelayo; Anne Moen; Arriel Benis; Lars Lindsköld; Parisis Gallos, Amsterdam: IOS Press, 2023, Vol. 302, p. 609-610. Conference paper, Published paper (Refereed)
Abstract [en]

Using electronic health record data and machine learning to guide future decisions requires addressing challenges including 1) long/short-term dependencies and 2) interactions between diseases and interventions. Bidirectional transformers have effectively addressed the first challenge. Here we tackle the second by masking one source (e.g., ICD10 codes) and training the transformer to predict it using the other sources (e.g., ATC codes). © 2023 European Federation for Medical Informatics (EFMI) and IOS Press.
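The source-masking idea can be sketched in a few lines. The prefixes and helper names below are illustrative, not the paper's actual vocabulary or tokenization.

```python
MASK = "[MASK]"

def mask_source(trajectory, source_prefixes=("ICD10:",)):
    """Mask every code from one source so the model must reconstruct it
    from the remaining sources (e.g., predict ICD-10 codes from ATC codes)."""
    masked, targets = [], []
    for code in trajectory:
        if code.startswith(source_prefixes):
            masked.append(MASK)
            targets.append(code)   # prediction target for the masked source
        else:
            masked.append(code)
            targets.append(None)   # left visible, not predicted
    return masked, targets

seq = ["ICD10:I50", "ATC:C07AB", "ICD10:E11", "ATC:A10BA"]
masked, targets = mask_source(seq)
```

Unlike standard MLM, which masks random tokens, every masked position here must be reconstructed from a *different* source, so the model is forced to learn cross-source dependencies between diagnoses and treatments.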

Place, publisher, year, edition, pages
Amsterdam: IOS Press, 2023
Series
Studies in Health Technology and Informatics, ISSN 0926-9630, E-ISSN 1879-8365 ; 302
Keywords
deep learning, disease prediction, electronic health records, Masked language model, patient trajectories, representation learning
National Category
Computer Sciences
Research subject
Health Innovation, IDC
Identifiers
urn:nbn:se:hh:diva-51734 (URN) 10.3233/SHTI230217 (DOI) 37203760 (PubMedID) 2-s2.0-85159757442 (Scopus ID) 978-1-64368-389-8 (ISBN)
Conference
The 33rd Medical Informatics Europe Conference, MIE2023, Gothenburg, Sweden, 22-25 May, 2023
Available from: 2023-10-03 Created: 2023-10-03 Last updated: 2026-02-19. Bibliographically approved
4. Adaptive noise-augmented attention for enhancing Transformer fine-tuning on longitudinal medical data
2025 (English). In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 8, p. 1-12, article id 1663484. Article in journal (Refereed). Published
Abstract [en]

Transformer models pre-trained on self-supervised tasks and fine-tuned on downstream objectives have achieved remarkable results across a variety of domains. However, fine-tuning these models for clinical predictions from longitudinal medical data, such as electronic health records (EHR), remains challenging due to limited labeled data and the complex, event-driven nature of medical sequences. While self-attention mechanisms are powerful for capturing relationships within sequences, they may underperform when modeling subtle dependencies between sparse clinical events under limited supervision. We introduce a simple yet effective fine-tuning technique, Adaptive Noise-Augmented Attention (ANAA), which injects adaptive noise directly into the self-attention weights and applies a 2D Gaussian kernel to smooth the resulting attention maps. This mechanism broadens the attention distribution across tokens while refining it to emphasize more informative events. Unlike prior approaches that require expensive modifications to the architecture and pre-training phase, ANAA operates entirely during fine-tuning. Empirical results across multiple clinical prediction tasks demonstrate consistent performance improvements. Furthermore, we analyze how ANAA shapes the learned attention behavior, offering interpretable insights into the model's handling of temporal dependencies in EHR data. © 2025 Amirahmadi, Etminani and Ohlsson.
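The mechanism described above can be sketched minimally as follows. This is a simplification under stated assumptions: one attention head, a fixed 3-tap smoothing kernel, and non-adaptive noise; the paper's adaptive scheme is more involved.

```python
import numpy as np

def anaa_attention(scores, noise_scale=0.1, sigma=1.0, seed=0):
    """Sketch of noise-augmented attention for one head.

    scores: (T, T) raw attention logits. The sketch 1) injects Gaussian
    noise into the logits, 2) smooths the map with a small separable
    Gaussian kernel, 3) renormalizes with a row-wise softmax, broadening
    overly sharp attention patterns.
    """
    rng = np.random.default_rng(seed)
    noisy = scores + noise_scale * rng.standard_normal(scores.shape)

    x = np.array([-1.0, 0.0, 1.0])                 # 3-tap Gaussian kernel
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    smooth = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, noisy)
    smooth = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, smooth)

    e = np.exp(smooth - smooth.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)       # rows sum to 1

A = anaa_attention(5.0 * np.eye(4))
```

Broadening shows up directly: a sharply diagonal score matrix, whose plain softmax concentrates almost all mass on one token per row, yields a much flatter distribution after noise injection and smoothing.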

Place, publisher, year, edition, pages
Lausanne: Frontiers Media S.A., 2025
Keywords
adaptive noise, augmentation, electronic health records (EHR), fine-tuning, medical data, representation learning, self-attention, Transformer
National Category
Natural Language Processing
Research subject
Health Innovation, IDC
Identifiers
urn:nbn:se:hh:diva-57641 (URN) 10.3389/frai.2025.1663484 (DOI) 001585845000001 () 41041085 (PubMedID) 2-s2.0-105018335736 (Scopus ID)
Funder
Swedish Research Council, 2019-00198
Knowledge Foundation, 20200208 01 H
Note

This research is included in the CAISR Health research profile.

Available from: 2025-11-03 Created: 2025-11-03 Last updated: 2026-02-19. Bibliographically approved
5. Group-Sparse Manifold-Aware Integrated Gradients for Multimodal Transformers on EHR Trajectories
2025 (English). In: Proceedings of Machine Learning Research, Cambridge, MA: JMLR, 2025, Vol. 297, p. 1-19. Conference paper, Published paper (Refereed)
Abstract [en]

Integrated Gradients (IG) is a popular method for explaining clinical deep models (including widely used multimodal, pretrained Transformers), but its utility on EHR code sequences is hampered by (i) the lack of principled baselines for sequences of discrete tokens and (ii) dense, hard-to-interpret attributions. To address both, we first introduce a manifold-aware baseline: the expected value under the empirical distribution, implemented as the position-wise empirical mean of pre-Transformer token embeddings on held-out validation data, which keeps IG interpolants near the data manifold. Second, we introduce GS-IG, which preserves the straight-path geometry but re-parameterizes the schedule α(t) = t^θ and selects θ per input by minimizing a token-level ℓ2,1 (group-sparsity) objective, producing concise, practitioner-friendly explanations. On MIMIC-IV (incident heart failure) and MDC (early mortality), the manifold-aware baseline improves faithfulness (higher Comprehensiveness, lower Sufficiency), and GS-IG reduces the token-level ℓ2,1 norm by 9-18% with negligible change in those metrics when using the manifold-aware baseline. The method is lightweight and yields faithful, sparse, and actionable explanations. © 2025 A. Amirahmadi, F. Etminani & M. Ohlsson.
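The schedule re-parameterization is easy to state in code. The sketch below uses hypothetical names; the per-input search over θ and the real model gradients are stubbed out, and a toy linear model stands in for the Transformer.

```python
import numpy as np

def ig_power_schedule(f_grad, x, baseline, theta=1.0, steps=64):
    """Integrated Gradients along the straight path baseline -> x with the
    re-parameterized schedule alpha(t) = t**theta (theta=1 is standard IG).

    f_grad maps an embedding matrix (T, d) to the gradient of the model
    output w.r.t. it, same shape. In GS-IG, theta would be chosen per
    input by minimizing a token-level l2,1 (group-sparsity) objective.
    """
    alphas = np.concatenate([[0.0], (np.arange(1, steps + 1) / steps) ** theta])
    total = np.zeros_like(x)
    for a0, a1 in zip(alphas[:-1], alphas[1:]):
        point = baseline + 0.5 * (a0 + a1) * (x - baseline)  # midpoint rule
        total += (a1 - a0) * f_grad(point)
    return (x - baseline) * total   # (T, d) token-embedding attributions

# toy linear model f(e) = sum(w * e): its gradient is the constant w, so
# any schedule integrates to w and attributions are w * (x - baseline)
w = np.array([[1.0, -2.0], [0.5, 0.0]])
x = np.ones((2, 2))
baseline = np.zeros((2, 2))   # stand-in for the position-wise empirical mean
attr = ig_power_schedule(lambda e: w, x, baseline, theta=2.0)
```

For a linear model the choice of θ is irrelevant (the path integral is schedule-invariant); for nonlinear models different θ values redistribute attribution mass along the path, which is what the group-sparsity objective exploits.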

Place, publisher, year, edition, pages
Cambridge, MA: JMLR, 2025
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
Keywords
Integrated Gradients, Explainability, Multimodal Transformers, Group Sparsity, Manifold-aware, Electronic Health Records (EHR), Patient trajectories
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:hh:diva-58437 (URN)
Conference
Machine Learning for Health (ML4H) 2025, San Diego, USA, December 1-2, 2025
Funder
Swedish Research Council, 2019-00198
Knowledge Foundation, 20200208 01 H
Available from: 2026-02-16 Created: 2026-02-16 Last updated: 2026-02-19. Bibliographically approved
6. From Individual Attributions to Population Risk: Identifying Key Drivers of Longevity and Early Mortality from Longitudinal EHR Data
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Transformer models trained on longitudinal electronic health records (EHR) can achieve strong predictive performance, but their explanations are typically produced at the individual patient level. Such local attributions are useful for case review, yet they do not directly answer cohort-level questions about which factors consistently drive risk across a population. In this work, we present a framework for translating individual-level attributions into population-level insights for mortality-related prediction tasks on longitudinal EHR data. 

We introduce a robustness-oriented global explainability framework that aggregates attributions across multiple trained models and complementary aggregation schemes, including magnitude-based and direction-aware global importance measures. Features are prioritized by stability across models and aggregation strategies, yielding consensus rankings of diagnoses, medication classes, and baseline factors consistently associated with early death and long life. Experiments across multiple prediction horizons show that multimodal transformer models achieve strong and stable discrimination, while the proposed aggregation framework produces reproducible, direction-aware population-level importance profiles.

This work provides a practical bridge from black-box mortality prediction to transparent, population-level risk factor discovery, supporting hypothesis generation and interpretable use of deep learning on longitudinal EHR data.
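The two complementary aggregation views can be sketched as follows. This is a simplification with hypothetical names; the framework additionally aggregates across multiple trained models and ranks features by stability.

```python
import numpy as np

def aggregate_attributions(A):
    """Aggregate per-patient feature attributions (n_patients, n_features)
    into population-level scores, two complementary ways.

    magnitude: mean |a| -- how strongly a feature matters overall.
    direction: mean a   -- whether it consistently pushes risk up or down.
    """
    return {"magnitude": np.abs(A).mean(axis=0),
            "direction": A.mean(axis=0)}

# toy attributions: feature 0 consistently raises risk; feature 1 is
# noisy with no consistent direction
A = np.array([[0.9, -0.1],
              [0.8,  0.2],
              [1.0, -0.3]])
g = aggregate_attributions(A)
```

A feature with high magnitude but near-zero direction matters often but inconsistently; agreement between the two views is what signals a consistent population-level risk driver.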

Keywords
Longitudinal electronic health records, EHR, Mortality prediction, Longevity, Multimodal transformers, Integrated Gradients, Global aggregation of local explanations, Population-level explainability
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:hh:diva-58444 (URN)
Funder
Swedish Research Council, 2019-00198
Knowledge Foundation, 20200208 01 H
Available from: 2026-02-16 Created: 2026-02-16 Last updated: 2026-02-26. Bibliographically approved
7. A decoupled alignment kernel for peptide membrane permeability predictions
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Cyclic peptides are promising modalities for targeting intracellular sites; however, cell-membrane permeability remains a key bottleneck, exacerbated by limited public data and the need for well-calibrated uncertainty. Instead of relying on data-hungry, complex deep learning architectures, we propose a monomer-aware decoupled global alignment kernel (MD-GAK), which couples chemically meaningful residue-residue similarity with sequence alignment while decoupling local matches from gap penalties. MD-GAK is a relatively simple kernel. To further demonstrate the robustness of our framework, we also introduce a variant, PMD-GAK, which incorporates a triangular positional prior. As we show in the experimental section, PMD-GAK can offer additional advantages over MD-GAK, particularly in reducing calibration errors. Since our focus is on uncertainty estimation, we use Gaussian processes as the predictive model, as both MD-GAK and PMD-GAK can be applied directly within this framework. We demonstrate the effectiveness of our methods through an extensive set of experiments, comparing our fully reproducible approach against state-of-the-art models, and show that it outperforms them across all metrics.
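The decoupling idea can be illustrated with a toy global-alignment dynamic program. This is a sketch, not MD-GAK itself: `sim` stands in for the monomer-aware residue similarity (e.g., Tanimoto on monomer fingerprints), and a single `gap` factor replaces the kernel's actual gap handling.

```python
import numpy as np

def gak(x, y, sim, gap=0.5):
    """Toy global alignment kernel between two monomer sequences.

    Dynamic program summing over all alignments: each matched pair
    (x[i], y[j]) contributes sim(x[i], y[j]), while gap steps are
    down-weighted by a separate `gap` factor. This is the "decoupled"
    part: local similarity and gap penalty are tuned independently.
    """
    n, m = len(x), len(y)
    D = np.zeros((n + 1, m + 1))
    D[0, 0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                D[i, j] += D[i - 1, j - 1] * sim(x[i - 1], y[j - 1])  # match
            if i > 0:
                D[i, j] += D[i - 1, j] * gap                          # gap in y
            if j > 0:
                D[i, j] += D[i, j - 1] * gap                          # gap in x
    return D[n, m]

k = gak("AAG", "AAG", sim=lambda a, b: 1.0 if a == b else 0.1)
```

Evaluating this over all training pairs yields a Gram matrix that can be plugged into a Gaussian process, subject to any normalization needed for positive definiteness.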

Keywords
cyclic peptides, permeability, Gaussian processes, global alignment kernel, Tanimoto, calibration
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-58438 (URN) 10.48550/arXiv.2511.21566 (DOI)
Funder
Swedish Research Council, 2019-00198
Knowledge Foundation, 20200208 01 H
Available from: 2026-02-16 Created: 2026-02-16 Last updated: 2026-02-26. Bibliographically approved

Open Access in DiVA
Fulltext (1614 kB, PDF)