Machine Learning Survival Models: Performance and Explainability
Halmstad University, School of Information Technology. ORCID iD: 0000-0001-9416-5647
2023 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Survival analysis is an essential field of statistics and machine learning, with critical applications such as medical research and predictive maintenance. In these domains, understanding a model's predictions is paramount. While machine learning techniques are increasingly applied to improve the predictive performance of survival models, they often do so at the cost of transparency and explainability.

Unlike regression and classification models, which produce point estimates, survival models predict functions. This makes them difficult to explain with off-the-shelf machine learning explanation techniques such as Shapley values and counterfactual examples.

Censoring is also a major issue in survival analysis: the target time variable is not fully observed for all subjects. Moreover, in predictive maintenance settings, recorded events do not always correspond to actual failures, because a component may be replaced when an expert considers it faulty or about to fail. Censoring and noisy labels create modeling and evaluation problems that must be addressed during the development and evaluation of survival models.

Considering the challenges in survival modeling and the differences from regular machine learning models, this thesis aims to bridge this gap by facilitating the use of machine learning explanation methods to produce plausible and actionable explanations for survival models. It also aims to improve survival modeling and evaluation, giving better insight into the differences among the compared survival models.

In this thesis, we propose two methods for explaining survival models. Both rely on discovering survival patterns in the model's predictions that group the studied subjects into significantly different survival groups. Each pattern reflects a specific survival behavior common to all the subjects in its group. We use these patterns to explain the predictions of the studied model in two ways. In the first method, we employ a classification proxy model that captures the relationship between the descriptive features of subjects and the learned survival patterns. Explaining this proxy model using Shapley values provides insight into how each feature contributes to membership in a specific survival pattern. In the second method, we address the "what if?" question by generating plausible and actionable counterfactual examples that would change the predicted pattern of the studied subject. Such counterfactual examples provide insight into the actionable changes required to improve the survivability of subjects.

We also propose a variational-inference-based generative model for estimating the time-to-event distribution. The model relies on a regression-based loss function that can handle censored cases, and on sampling to estimate the conditional probability of event times. Moreover, we propose a decomposition of the C-index into a weighted harmonic mean of two quantities: the concordance among the observed events, and the concordance between observed and censored cases. These two quantities, weighted by a factor representing the balance between them, can reveal differences between survival models that remain hidden when only the total C-index is used. This can give insight into the performance of different models and its relation to the characteristics of the studied data.
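
The decomposition described above can be written out as a short identity. This is a sketch under assumed notation, not necessarily the notation used in the thesis: $N_{ee}$ and $N_{ec}$ denote the numbers of comparable event-event and event-censored pairs, and $N_{ee}^{+}$ and $N_{ec}^{+}$ the concordant pairs among them. With the weight $\alpha$ chosen as the share of concordant pairs that come from event-event comparisons, the total C-index is exactly a weighted harmonic mean of the two partial concordances.

```latex
\begin{align*}
C &= \frac{N_{ee}^{+} + N_{ec}^{+}}{N_{ee} + N_{ec}}, &
C_{ee} &= \frac{N_{ee}^{+}}{N_{ee}}, &
C_{ec} &= \frac{N_{ec}^{+}}{N_{ec}}, &
\alpha &= \frac{N_{ee}^{+}}{N_{ee}^{+} + N_{ec}^{+}} \\[4pt]
\frac{\alpha}{C_{ee}} + \frac{1-\alpha}{C_{ec}}
  &= \frac{N_{ee}}{N_{ee}^{+} + N_{ec}^{+}} + \frac{N_{ec}}{N_{ee}^{+} + N_{ec}^{+}}
   = \frac{1}{C}
\end{align*}
```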

Finally, as part of enhancing survival modeling, we propose an algorithm that can correct erroneous event labels in predictive maintenance time-to-event data. We adopt an expectation-maximization-like approach that uses a genetic algorithm to find better labels that maximize the survival model's performance. Over the iterations, the algorithm builds confidence about the event assignments, which improves the search in the following iterations until convergence.

We performed experiments on real and synthetic data showing that our proposed methods enhance the performance in survival modeling and can reveal the underlying factors contributing to the explainability of survival models' behavior and performance.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2023, p. 25
Series
Halmstad University Dissertations ; 108
Keywords [en]
Survival Analysis, Explainable Artificial Intelligence, Survival Patterns, Counterfactual Explanations, Evaluation Metrics, Concordance Index
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:hh:diva-52269; ISBN: 978-91-89587-30-4 (print); ISBN: 978-91-89587-29-8 (electronic); OAI: oai:DiVA.org:hh-52269; DiVA, id: diva2:1820647
Presentation
2024-01-18, Wigforss, Hus J, Kristan IV:s väg 3, Halmstad, 09:00 (English)
Available from: 2023-12-19 Created: 2023-12-18 Last updated: 2025-10-01. Bibliographically approved
List of papers
1. SurvSHAP: A Proxy-Based Algorithm for Explaining Survival Models with SHAP
2022 (English). In: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA) / [ed] Joshua Zhexue Huang; Yi Pan; Barbara Hammer; Muhammad Khurram Khan; Xing Xie; Laizhong Cui; Yulin He, Piscataway, NJ: IEEE, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Survival Analysis models usually output functions (survival or hazard functions) rather than point predictions like regression and classification models. This makes explaining such models a challenging task, especially with Shapley values. We propose SurvSHAP, a new model-agnostic algorithm to explain survival models that predict survival curves. The algorithm discovers patterns in the predicted survival curves (the output of the survival model) that identify significantly different survival behaviors, and then uses a proxy model and the SHAP method to explain these distinct survival behaviors. Experiments on synthetic and real datasets demonstrate that SurvSHAP is able to capture the underlying factors of the survival patterns. Moreover, SurvSHAP results on the Cox Proportional Hazard model are compared with the weights of the model to show that we provide faithful overall explanations, with more fine-grained explanations of the sub-populations. We also illustrate the wrong model and explanations learned by a Cox model when applied to heterogeneous sub-populations. We show that a non-linear machine learning survival model with SurvSHAP can better model the data and provide better explanations than linear models.
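
To illustrate the attribution step only, the following is a minimal sketch of exact Shapley values computed for a toy proxy. It is not the SurvSHAP implementation and does not use the SHAP library: the function `proxy_pattern_score`, its coefficients, the feature names, and the all-zero baseline are hypothetical stand-ins for a proxy classifier's score for one survival pattern.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight shared by all coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def proxy_pattern_score(features):
    """Hypothetical proxy: score for membership in a 'high-risk' survival pattern."""
    age, load, temperature = features
    return 0.02 * age + 0.5 * load + 0.1 * load * temperature


subject = [50.0, 1.0, 2.0]   # hypothetical subject (age, load, temperature)
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(proxy_pattern_score, subject, baseline)
```

The attributions sum to the difference between the proxy score of the subject and that of the baseline, and the interaction term between `load` and `temperature` is split equally between the two features.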

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2022
Keywords
SurvSHAP, Explainable AI, Survival Patterns, SHAP, Shapley values, Proxy Model, Survival Analysis, Machine Learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-49149 (URN); 10.1109/DSAA54385.2022.10032392 (DOI); 000967751000099 (); 2-s2.0-85148538187 (Scopus ID); 978-1-6654-7330-9 (ISBN); 978-1-6654-7331-6 (ISBN)
Conference
The 9th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2022), Shenzhen, China, October 13-16, 2022
Funder
Knowledge Foundation
Note

Funding: This research was funded by the CHIST-ERA grant CHIST-ERA-19-XAI-012 and CAISR+ project funded by the Swedish Knowledge Foundation.

As manuscript in thesis.

Available from: 2023-02-10 Created: 2023-02-10 Last updated: 2025-10-01. Bibliographically approved
2. Understanding Survival Models through Counterfactual Explanations
2024 (English). In: Computational Science – ICCS 2024: 24th International Conference, Malaga, Spain, July 2–4, 2024, Proceedings, Part IV / [ed] Elisa Bertino; Wen Gao; Bernhard Steffen; Moti Yung, Cham: Springer Nature, 2024, p. 310-324. Conference paper, Published paper (Other academic)
Abstract [en]

The development of black-box survival models has created a need for methods that explain their outputs, just as in the case of traditional machine learning methods. Survival models usually predict functions rather than point estimates. This special nature of their output makes it more difficult to explain their operation. We propose a method to generate plausible counterfactual explanations for survival models. The method supports two options that handle the special nature of survival models' output. One option relies on the Survival Scores, which are based on the area under the survival function, which is more suitable for proportional hazard models. The other one relies on Survival Patterns in the predictions of the survival model, which represent groups that are significantly different from the survival perspective. This guarantees an intuitive well-defined change from one risk group (Survival Pattern) to another and can handle more realistic cases where the proportional hazard assumption does not hold. The method uses a Particle Swarm Optimization algorithm to optimize a loss function to achieve four objectives: the desired change in the target, proximity to the explained example, likelihood, and the actionability of the counterfactual example. Two predictive maintenance datasets and one medical dataset are used to illustrate the results in different settings. The results show that our method produces plausible counterfactuals, which increase the understanding of black-box survival models. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
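
As a rough illustration of a score based on the area under the survival function, the sketch below integrates a step survival curve. The function name `survival_score`, the step-function convention, and the example curves are assumptions made for illustration; the paper's exact definition (for example, any normalization) may differ.

```python
def survival_score(times, survival):
    """Area under a step survival curve.

    survival[k] is assumed to hold on the interval [times[k], times[k + 1]),
    so the last value only marks the end of the observation horizon.
    """
    if len(times) != len(survival) or len(times) < 2:
        raise ValueError("need at least two matching time and survival points")
    area = 0.0
    for k in range(len(times) - 1):
        width = times[k + 1] - times[k]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        area += survival[k] * width
    return area


# Hypothetical predicted curves for a subject and for a counterfactual example.
times = [0.0, 1.0, 2.0, 4.0]
original_curve = [1.0, 0.8, 0.5, 0.2]
counterfactual_curve = [1.0, 0.9, 0.7, 0.4]

original_score = survival_score(times, original_curve)
counterfactual_score = survival_score(times, counterfactual_curve)
```

A counterfactual that improves survivability should yield a larger score than the original subject, which is the kind of target change a search procedure could optimize for.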

Place, publisher, year, edition, pages
Cham: Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14835
Keywords
Survival Analysis, Explainable Artificial Intelligence, Survival Patterns, Counterfactual Explanations
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52260 (URN); 10.1007/978-3-031-63772-8_28 (DOI); 001279326500028 (); 2-s2.0-85199557114 (Scopus ID); 978-3-031-63771-1 (ISBN)
Conference
24th International Conference on Computational Science, ICCS 2024, Malaga, Spain, July 2–4, 2024
Funder
Knowledge Foundation, 20200001
Note

As manuscript in thesis.

Available from: 2023-12-18 Created: 2023-12-18 Last updated: 2025-10-01. Bibliographically approved
3. The Concordance Index Decomposition: A Measure for a Deeper Understanding of Survival Prediction Models
2024 (English). In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 148, p. 1-10, article id 102781. Article in journal (Refereed) Published
Abstract [en]

The Concordance Index (C-index) is a commonly used metric in Survival Analysis for evaluating the performance of a prediction model. This paper proposes a decomposition of the C-index into a weighted harmonic mean of two quantities: one for ranking observed events versus other observed events, and the other for ranking observed events versus censored cases. This decomposition enables a more fine-grained analysis of the strengths and weaknesses of survival prediction methods. The usefulness of this decomposition is demonstrated through benchmark comparisons against state-of-the-art and classical models, together with a new variational generative neural-network-based method (SurVED), which is also proposed in this paper. Performance is assessed using four publicly available datasets with varying levels of censoring. The analysis using the C-index decomposition and synthetic censoring shows that deep learning models utilize the observed events more effectively than other models, allowing them to keep a stable C-index in different censoring levels. In contrast, classical machine learning models deteriorate when the censoring level decreases due to their inability to improve on ranking the events versus other events. © 2024 The Author(s)
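
The decomposition can be checked numerically on a small example. The sketch below is an assumption-laden illustration, not the paper's code: it counts comparable pairs in which the earlier subject has an observed event, ignores tied times, gives tied risk scores half credit, and uses the share of concordant pairs coming from event-event comparisons as the weight `alpha`. The data are made up.

```python
def concordance_decomposition(times, events, risks):
    """Return (c_index, c_ee, c_ec, alpha) for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]. It is an event-event ("ee") pair if j also has an
    event, otherwise an event-censored ("ec") pair.
    """
    concordant = {"ee": 0.0, "ec": 0.0}
    total = {"ee": 0, "ec": 0}
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier subject must have an observed event
        for j in range(n):
            if i == j or not times[i] < times[j]:
                continue
            kind = "ee" if events[j] else "ec"
            total[kind] += 1
            if risks[i] > risks[j]:
                concordant[kind] += 1.0   # higher risk failed earlier
            elif risks[i] == risks[j]:
                concordant[kind] += 0.5   # tied risk scores get half credit
    c_index = sum(concordant.values()) / sum(total.values())
    c_ee = concordant["ee"] / total["ee"]
    c_ec = concordant["ec"] / total["ec"]
    alpha = concordant["ee"] / sum(concordant.values())
    return c_index, c_ee, c_ec, alpha


# Made-up data: event = 1 means observed failure, 0 means censored.
times = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.5, 0.1]

c_index, c_ee, c_ec, alpha = concordance_decomposition(times, events, risks)
harmonic = 1.0 / (alpha / c_ee + (1.0 - alpha) / c_ec)
```

With this choice of weight, the weighted harmonic mean of `c_ee` and `c_ec` reproduces the total C-index, while the two partial values show where a model ranks well and where it does not. The sketch does not guard against empty pair sets or zero concordance, which a real implementation would need to handle.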

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2024
Keywords
Survival Analysis, Evaluation Metric, Concordance Index, Variational Encoder-Decoder
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52259 (URN); 10.1016/j.artmed.2024.102781 (DOI); 001171816900001 (); 38325926 (PubMedID); 2-s2.0-85184733529 (Scopus ID)
Funder
Knowledge Foundation, 20200001
Note

As manuscript in thesis.

Available from: 2023-12-18 Created: 2023-12-18 Last updated: 2025-10-01. Bibliographically approved
4. Discovering Premature Replacements in Predictive Maintenance Time-to-Event Data
2023 (English). In: Proceedings of the Asia Pacific Conference of the PHM Society 2023 / [ed] Takehisa Yairi; Samir Khan; Seiji Tsutsumi, New York: The Prognostics and Health Management Society, 2023, Vol. 4. Conference paper, Published paper (Refereed)
Abstract [en]

Time-To-Event (TTE) modeling using survival analysis in industrial settings faces the challenge of premature replacements of machine components, which leads to bias and errors in survival prediction. Typically, TTE survival data records, for each component, whether it has failed up to a certain time. For failed components, the failure time is noted, and the failure is referred to as an event. A component that has not failed is denoted as censored. In industrial settings, in contrast to medical settings, there can be considerable uncertainty in an event: a component can be replaced before it fails, either to prevent operation stops or because maintenance staff believe that the component is faulty. This shows up as “no fault found” in warranty studies, where a significant proportion of replaced components may appear fault-free when tested or inspected after replacement.

In this work, we propose an expectation-maximization-like method for discovering such premature replacements in survival data. The method is a two-phase iterative algorithm that employs a genetic algorithm in the maximization phase to learn better event assignments on a validation set. The labels learned across iterations are accumulated and averaged, and the averages are used to initialize the following expectation phase. The assumption is that the more often an event is selected, the more likely it is to be an actual failure and not a “no fault found”.
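
The accumulate-and-average step can be sketched in isolation. The code below covers only that step and omits the genetic algorithm and the survival model entirely; the function names, the 0.5 threshold, and the example history are hypothetical and not taken from the paper.

```python
def average_assignments(history):
    """Per-subject mean of the binary event labels selected in past iterations.

    history[k][i] is 1 if iteration k kept subject i as a true failure event,
    and 0 if it treated the recorded event as a premature replacement.
    """
    if not history:
        raise ValueError("history must contain at least one iteration")
    n_iterations = len(history)
    return [sum(labels) / n_iterations for labels in zip(*history)]


def suspected_premature(confidence, threshold=0.5):
    """Indices of recorded events whose averaged selection rate is below threshold.

    The threshold is a hypothetical choice for illustration.
    """
    return [i for i, c in enumerate(confidence) if c < threshold]


# Hypothetical label assignments from three iterations over four recorded events.
history = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
confidence = average_assignments(history)
flagged = suspected_premature(confidence)
```

Events that are rarely selected across iterations end up with low averaged confidence, which matches the stated assumption that frequently selected events are more likely to be actual failures.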

Experiments on synthesized and simulated data show that the proposed method can correctly detect a significant percentage of premature replacement cases.

Place, publisher, year, edition, pages
New York: The Prognostics and Health Management Society, 2023
Series
Proceedings of the Asia Pacific Conference of the PHM Society, E-ISSN 2994-7219
Keywords
Survival Analysis, Predictive Maintenance, Early Replacements, Genetic Algorithms
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52105 (URN); 10.36001/phmap.2023.v4i1.3609 (DOI)
Conference
4th Asia Pacific Conference of the Prognostics and Health Management, Tokyo, Japan, September 11-14, 2023
Funder
Knowledge Foundation, 20200001
Note

As manuscript in thesis.

Available from: 2023-11-23 Created: 2023-11-23 Last updated: 2025-10-01. Bibliographically approved

Open Access in DiVA

fulltext (1294 kB), 498 downloads
File information
File name: FULLTEXT03.pdf; File size: 1294 kB; Checksum (SHA-512):
cd7b2db356c3486656b3aa4b0f133576d62e5b886c16484129b8837182e827df94f849b198c6cf2ed64e14fae5a9a886ceddfc9b2586e6f1386edfc74826df55
Type: fulltext; Mimetype: application/pdf

Authority records

Alabdallah, Abdallah

Total: 559 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1423 hits