Towards Trustworthy Survival Analysis with Machine Learning Models
Alabdallah, Abdallah
Halmstad University, School of Information Technology. ORCID iD: 0000-0001-9416-5647
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Survival Analysis is a major subfield of statistics that studies the time until an event occurs, such as a patient's death or a machine's failure. This makes survival analysis crucial in critical applications such as medical studies and predictive maintenance. In these applications safety is critical, which creates a demand for trustworthy models. Spurred by the growing volume of collected data, machine learning and deep learning techniques have increasingly been adopted for survival analysis. While this direction promises improvements in some qualities, such as predictive performance, it also introduces new challenges in others, particularly explainability. This challenge affects machine learning in general, because most machine learning models, and deep neural networks (DNNs) in particular, are black boxes. However, survival models usually output functions rather than the point estimates produced by regression and classification models, which makes explaining them an even more challenging task.

Other challenges arise from the nature of time-to-event data, such as censoring. Censoring happens for several reasons, most commonly a limited study time, which leaves a considerable number of subjects without an observed event during the study. Moreover, in industrial settings, recorded events do not always correspond to actual failures, because companies tend to replace machine parts before they fail for safety or cost reasons. The result is noisy event labels. Censoring and noisy labels make survival models challenging to build and to evaluate.

This thesis addresses these challenges by following two tracks, one focusing on explainability and the other on improving performance. The two tracks eventually merge, yielding an explainable survival model that maintains the performance of its black-box counterpart.

In the explainability track, we propose two post-hoc explanation methods based on what we define as Survival Patterns: patterns in the predictions of a survival model that represent distinct survival behaviors in the studied population. We propose an algorithm for discovering these survival patterns, on which both post-hoc explanation methods rely. The first method, SurvSHAP, uses a proxy classification model that learns the relationship between the input space and the discovered survival patterns. The proxy model is then explained using the SHAP method, resulting in per-pattern explanations. The second method finds counterfactual explanations that would change the survival model's decision from one source survival pattern to another. It uses Particle Swarm Optimization (PSO) with a tailored objective function to guarantee explanation qualities such as plausibility and actionability.
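
Both explanation methods rely on a survival-pattern step, which can be illustrated with a small sketch. For illustration only, the sketch assumes that patterns are found by k-means clustering of predicted survival curves sampled on a shared time grid. The thesis's own discovery algorithm may differ, for example in how it decides that two patterns are significantly different. The function name `discover_survival_patterns` and the synthetic curves are hypothetical. The resulting labels are what a SurvSHAP-style proxy classifier would then learn to predict from the inputs.

```python
import math
from typing import List, Sequence, Tuple

Curve = Sequence[float]


def _sq_dist(a: Curve, b: Curve) -> float:
    """Squared Euclidean distance between two curves on the same time grid."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def discover_survival_patterns(
    curves: List[Curve], k: int, iters: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical k-means grouping of predicted survival curves into k patterns.

    Returns one pattern label per curve and the mean curve of each pattern.
    """
    # Deterministic farthest-first initialisation (avoids random seeding).
    centroids = [list(curves[0])]
    while len(centroids) < k:
        far = max(curves, key=lambda c: min(_sq_dist(c, m) for m in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each curve joins its nearest pattern centroid.
        new_labels = [
            min(range(k), key=lambda j: _sq_dist(c, centroids[j])) for c in curves
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean curve of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # keep the previous centroid if a pattern empties
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Usage: two synthetic risk groups with exponential survival curves.
grid = [float(t) for t in range(11)]
high_risk = [[math.exp(-0.5 * s * t) for t in grid] for s in (0.9, 1.0, 1.1)]
low_risk = [[math.exp(-0.05 * s * t) for t in grid] for s in (0.9, 1.0, 1.1)]
labels, centroids = discover_survival_patterns(high_risk + low_risk, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Farthest-first initialisation is used only to keep the sketch deterministic. Any k-means initialisation would serve the same illustrative purpose.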

On the performance track, we propose a Variational Encoder-Decoder model that estimates the survival function using a sampling-based approach. The model is trained with a regression-based objective function that accounts for censored instances, assisted by a differentiable lower bound of the concordance index (C-index). In the same work, we propose a decomposition of the C-index, showing that it can be expressed as a weighted harmonic mean of two quantities. One quantifies the concordance among observed event cases, and the other quantifies the concordance between observed events and censored cases. The two quantities are weighted by a factor that balances the contributions of event and censored cases to the total C-index. This decomposition uncovers hidden differences among survival models that appear equivalent according to the C-index. We also used genetic programming to search for a regression-based loss function for survival analysis with improved concordance. The search results uncovered an interesting phenomenon, based on which we propose using the continuously differentiable Softplus function instead of the sharp-cut ReLU function for handling censored cases. Lastly on the performance track, we propose an algorithm for correcting erroneous observed event labels, which can be caused by preventive maintenance activities. The algorithm adopts an iterative, expectation-maximization-like approach, using a genetic algorithm to search for event labels that maximize a surrogate survival model's performance.
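
The sampling-based estimate of the survival function can be sketched as follows, assuming a generative model that returns event-time samples for one subject. `sample_event_times` is a hypothetical stand-in for the trained Variational Encoder-Decoder (here a log-normal draw), not the thesis's model. Only the step from samples to S(t) = P(T > t) is meant to carry over.

```python
import math
import random
from bisect import bisect_right
from typing import List, Sequence


def sample_event_times(mu: float, sigma: float, n: int, seed: int = 0) -> List[float]:
    """Hypothetical stand-in for a trained encoder-decoder.

    Draws latent values z ~ N(mu, sigma) and "decodes" them to positive event
    times with exp(z). A real model would pass z through a learned decoder.
    """
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


def empirical_survival(samples: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Estimate S(t) = P(T > t) as the fraction of sampled times greater than t."""
    ordered = sorted(samples)
    n = len(ordered)
    # bisect_right counts samples <= t, so n minus that count is the number > t.
    return [(n - bisect_right(ordered, t)) / n for t in grid]


# Usage: a survival curve for one subject from 2000 sampled event times.
times = sample_event_times(mu=1.0, sigma=0.5, n=2000)
curve = empirical_survival(times, grid=[0.0, 1.0, 2.0, 4.0, 8.0])
median_time = sorted(times)[len(times) // 2]  # one possible point prediction
print([round(s, 3) for s in curve], round(median_time, 2))
```

Because the curve is read directly off the samples, it is non-increasing by construction. It also needs no parametric form for the hazard.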

Finally, the two tracks merge, and we propose CoxSE, a Cox-based deep neural network model that provides inherent explanations while maintaining the performance of its black-box counterpart. The model builds on Self-Explaining Neural Networks (SENN) and the Cox Proportional Hazards formulation. We also propose CoxSENAM, an enhancement of the Neural Additive Model (NAM) that adopts the NAM structure together with the SENN loss function and output type. CoxSENAM demonstrated better explanations than the NAM-based model, with enhanced robustness to noise.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2025, p. 29
Series
Halmstad University Dissertations ; 128
National Category
Computer Sciences; Information Systems
Identifiers
URN: urn:nbn:se:hh:diva-55202
ISBN: 978-91-89587-72-4 (electronic)
ISBN: 978-91-89587-73-1 (print)
OAI: oai:DiVA.org:hh-55202
DiVA id: diva2:1925520
Public defence
2025-01-31, S3030, Högskolan i Halmstad, Kristian IV:s väg 3, Halmstad, 09:00 (English)
Available from: 2025-01-10. Created: 2025-01-08. Last updated: 2025-01-10. Bibliographically approved
List of papers
1. SurvSHAP: A Proxy-Based Algorithm for Explaining Survival Models with SHAP
2022 (English). In: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA) / [ed] Joshua Zhexue Huang; Yi Pan; Barbara Hammer; Muhammad Khurram Khan; Xing Xie; Laizhong Cui; Yulin He, Piscataway, NJ: IEEE, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Survival Analysis models usually output functions (survival or hazard functions) rather than the point predictions of regression and classification models. This makes explaining such models a challenging task, especially with Shapley values. We propose SurvSHAP, a new model-agnostic algorithm for explaining survival models that predict survival curves. The algorithm discovers patterns in the predicted survival curves (the output of the survival model) that identify significantly different survival behaviors. It then uses a proxy model and the SHAP method to explain these distinct survival behaviors. Experiments on synthetic and real datasets demonstrate that SurvSHAP captures the underlying factors of the survival patterns. Moreover, SurvSHAP results on the Cox Proportional Hazards model are compared with the model's weights. This comparison shows that SurvSHAP provides faithful overall explanations, together with more fine-grained explanations of the sub-populations. We also illustrate the incorrect model and explanations learned by a Cox model when it is applied to heterogeneous sub-populations. We show that a non-linear machine learning survival model combined with SurvSHAP can model the data better and provide better explanations than linear models.

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2022
Keywords
SurvSHAP, Explainable AI, Survival Patterns, SHAP, Shapley values, Proxy Model, Survival Analysis, Machine Learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-49149 (URN)
10.1109/DSAA54385.2022.10032392 (DOI)
000967751000099
2-s2.0-85148538187 (Scopus ID)
978-1-6654-7330-9 (ISBN)
978-1-6654-7331-6 (ISBN)
Conference
The 9th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2022), Shenzhen, China, October 13-16, 2022
Funder
Knowledge Foundation
Note

Funding: This research was funded by the CHIST-ERA grant CHIST-ERA-19-XAI-012 and CAISR+ project funded by the Swedish Knowledge Foundation.

As manuscript in thesis.

Available from: 2023-02-10. Created: 2023-02-10. Last updated: 2025-01-09. Bibliographically approved
2. Understanding Survival Models through Counterfactual Explanations
2024 (English). In: Computational Science – ICCS 2024: 24th International Conference, Malaga, Spain, July 2–4, 2024, Proceedings, Part IV / [ed] Elisa Bertino; Wen Gao; Bernhard Steffen; Moti Yung, Cham: Springer Nature, 2024, p. 310-324. Conference paper, Published paper (Other academic)
Abstract [en]

The development of black-box survival models has created a need for methods that explain their outputs, just as for traditional machine learning methods. Survival models usually predict functions rather than point estimates, and this special nature of their output makes their operation harder to explain. We propose a method for generating plausible counterfactual explanations for survival models. The method supports two options for handling the special nature of survival models' output. One option relies on Survival Scores, which are based on the area under the survival function and are more suitable for proportional hazards models. The other relies on Survival Patterns in the survival model's predictions, which represent groups that differ significantly from a survival perspective. This option guarantees an intuitive, well-defined change from one risk group (Survival Pattern) to another. It can also handle more realistic cases where the proportional hazards assumption does not hold. The method uses a Particle Swarm Optimization algorithm to optimize a loss function with four objectives: the desired change in the target, proximity to the explained example, likelihood, and actionability of the counterfactual example. Two predictive maintenance datasets and one medical dataset are used to illustrate the results in different settings. The results show that our method produces plausible counterfactuals that increase the understanding of black-box survival models. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
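
The Survival Score option can be sketched with a plain particle swarm. The sketch assumes a toy Cox model with hypothetical weights (`WEIGHTS`, `BASE_HAZARD`) in place of a black-box survival model, and takes the survival score to be the area under the predicted survival curve. The swarm minimizes a shortfall-to-target term plus an L1 proximity term, and immutable features are held fixed for actionability. The likelihood (plausibility) objective named in the abstract is omitted. The hinge-on-target formulation is also our assumption rather than the paper's loss.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical toy model standing in for a black-box survival model:
# S(t | x) = exp(-BASE_HAZARD * t * exp(w . x)), a Cox model with constant baseline hazard.
WEIGHTS = [0.8, -0.5, 0.3]
BASE_HAZARD = 0.1
GRID = [0.5 * i for i in range(41)]  # time grid from 0 to 20


def survival_score(x: Sequence[float]) -> float:
    """Area under the predicted survival curve (trapezoid rule on GRID)."""
    log_risk = sum(w * xi for w, xi in zip(WEIGHTS, x))
    risk = math.exp(max(-50.0, min(50.0, log_risk)))  # clamp to avoid overflow
    s = [math.exp(-BASE_HAZARD * t * risk) for t in GRID]
    return sum((GRID[i + 1] - GRID[i]) * (s[i] + s[i + 1]) / 2 for i in range(len(GRID) - 1))


def objective(cand: Sequence[float], x: Sequence[float], target: float, lam: float) -> float:
    """Shortfall from the desired survival score plus an L1 proximity penalty."""
    shortfall = max(0.0, target - survival_score(cand))
    proximity = sum(abs(a - b) for a, b in zip(cand, x))
    return shortfall + lam * proximity


def pso_counterfactual(x: Sequence[float], target: float, immutable: Tuple[int, ...] = (),
                       lam: float = 0.1, n_particles: int = 30, iters: int = 100,
                       seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    dim = len(x)

    def fix(p: List[float]) -> List[float]:
        # Actionability: immutable features are reset to their original values.
        return [x[d] if d in immutable else p[d] for d in range(dim)]

    # One particle starts at x itself, so the result is never worse than x.
    pos = [list(x)] + [fix([xi + rng.uniform(-2.0, 2.0) for xi in x])
                       for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [objective(p, x, target, lam) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    inertia, c1, c2 = 0.7, 1.5, 1.5  # common PSO settings, not tuned
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            pos[i] = fix(pos[i])
            f = objective(pos[i], x, target, lam)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


# Usage: ask for 2 extra time units of survival score while feature 2 is immutable.
x0 = [1.0, 0.0, 0.5]
goal = survival_score(x0) + 2.0
cf, cf_loss = pso_counterfactual(x0, goal, immutable=(2,))
print(round(survival_score(x0), 3), [round(v, 2) for v in cf], round(survival_score(cf), 3))
```

A swarm is convenient here because it needs only black-box evaluations of the survival model and no gradients. Additional objectives can be added as extra terms in `objective`.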

Place, publisher, year, edition, pages
Cham: Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14835
Keywords
Survival Analysis, Explainable Artificial Intelligence, Survival Patterns, Counterfactual Explanations
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52260 (URN)
001279326500028
2-s2.0-85199557114 (Scopus ID)
978-3-031-63771-1 (ISBN)
Conference
ICCS 2024: 24th International Conference on Computational Science, Malaga, Spain, July 2–4, 2024
Funder
Knowledge Foundation, 20200001
Note

As manuscript in thesis

Available from: 2023-12-18. Created: 2023-12-18. Last updated: 2025-01-09. Bibliographically approved
3. The Concordance Index Decomposition: A Measure for a Deeper Understanding of Survival Prediction Models
2024 (English). In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 148, p. 1-10, article id 102781. Article in journal (Refereed), Published
Abstract [en]

The Concordance Index (C-index) is a commonly used metric in Survival Analysis for evaluating the performance of a prediction model. This paper proposes decomposing the C-index into a weighted harmonic mean of two quantities: one for ranking observed events against other observed events, and the other for ranking observed events against censored cases. This decomposition enables a more fine-grained analysis of the strengths and weaknesses of survival prediction methods. Its usefulness is demonstrated through benchmark comparisons against state-of-the-art and classical models, together with a new variational generative neural-network-based method (SurVED), also proposed in this paper. Performance is assessed on four publicly available datasets with varying levels of censoring. The analysis using the C-index decomposition and synthetic censoring shows that deep learning models use the observed events more effectively than other models. This allows them to keep a stable C-index across censoring levels. In contrast, classical machine learning models deteriorate as the censoring level decreases, because they cannot improve on ranking events against other events. © 2024 The Author(s)
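
The decomposition can be checked numerically. Write K_ee and K_ec for the concordant event-event and event-censored pairs, and N_ee and N_ec for the corresponding comparable pairs. Harrell's C-index is then C = (K_ee + K_ec) / (N_ee + N_ec), with C_ee = K_ee / N_ee and C_ec = K_ec / N_ec. A short calculation gives 1/C = α/C_ee + (1 - α)/C_ec with α = K_ee / (K_ee + K_ec), which is a weighted harmonic mean. This expression for α is our own algebra, assuming Harrell's definition and skipping tied times. The paper's notation and tie handling may differ, and the function name is hypothetical.

```python
from typing import Dict, Optional, Sequence


def c_index_decomposition(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> Dict[str, Optional[float]]:
    """Split Harrell's C-index into event-event and event-censored parts.

    A pair (i, j) is comparable when i has an observed event and times[j] > times[i].
    It is concordant when risks[i] > risks[j]; tied risks earn half credit.
    """
    k_ee = k_ec = 0.0  # concordance credit
    n_ee = n_ec = 0    # comparable pairs
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] <= times[i]:
                continue  # j must outlive i; tied times are skipped for simplicity
            credit = 1.0 if risks[i] > risks[j] else 0.5 if risks[i] == risks[j] else 0.0
            if events[j]:
                n_ee, k_ee = n_ee + 1, k_ee + credit
            else:
                n_ec, k_ec = n_ec + 1, k_ec + credit
    if n_ee + n_ec == 0:
        raise ValueError("no comparable pairs")
    c = (k_ee + k_ec) / (n_ee + n_ec)
    c_ee = k_ee / n_ee if n_ee else None
    c_ec = k_ec / n_ec if n_ec else None
    alpha = harmonic = None
    if c_ee and c_ec:  # the harmonic form needs both parts to be positive
        alpha = k_ee / (k_ee + k_ec)
        harmonic = 1.0 / (alpha / c_ee + (1.0 - alpha) / c_ec)
    return {"c_index": c, "c_ee": c_ee, "c_ec": c_ec, "alpha": alpha, "harmonic": harmonic}


# Usage: 3 event-event pairs (all concordant) and 5 event-censored pairs (4 concordant).
res = c_index_decomposition(
    times=[2, 4, 5, 7, 9], events=[1, 1, 0, 1, 0], risks=[0.9, 0.6, 0.7, 0.3, 0.1]
)
print(res)  # c_index 0.875 equals the harmonic mean of c_ee 1.0 and c_ec 0.8 with alpha 3/7
```

The example shows what the single number hides. A C-index of 0.875 here combines perfect event-event ranking with weaker event-censored ranking.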

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2024
Keywords
Survival Analysis, Evaluation Metric, Concordance Index, Variational Encoder-Decoder
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52259 (URN)
10.1016/j.artmed.2024.102781 (DOI)
001171816900001
38325926 (PubMedID)
2-s2.0-85184733529 (Scopus ID)
Funder
Knowledge Foundation, 20200001
Note

As manuscript in thesis

Available from: 2023-12-18. Created: 2023-12-18. Last updated: 2025-01-09. Bibliographically approved
4. Improving Concordance Index in Regression-based Survival Analysis: Discovery of Loss Function for Neural Networks
2024 (English). In: GECCO '24 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion, New York: Association for Computing Machinery (ACM), 2024, p. 1863-1869. Conference paper, Published paper (Other academic)
Abstract [en]

In this work, we use an Evolutionary Algorithm (EA) to discover a novel Neural Network (NN) regression-based survival loss function with the aim of improving C-index performance. Our contribution is threefold. First, we propose SAGA$_{loss}$, an evolutionary meta-learning algorithm for optimizing a neural-network regression-based loss function that maximizes the C-index. Our algorithm consistently discovers specialized loss functions that outperform MSCE, the Mean Squared Error for censored cases. Second, our analysis of the evolutionary search results yields a non-intuitive insight: a non-zero gradient in the censored-case part of the loss function is important, a property shown to be useful for improving concordance. Finally, based on this insight, we propose MSCE$_{Sp}$, a novel survival regression loss function that can be used off the shelf and generally performs better than the Mean Squared Error for censored cases. We performed extensive experiments on 19 benchmark datasets to validate our findings. © 2024 Copyright held by the owner/author(s).
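
The gradient insight can be illustrated with a small sketch. It assumes that MSCE means squared error on observed events plus a squared hinge max(0, t - ŷ)^2 on censored cases, and that MSCE$_{Sp}$ replaces the hinge (a ReLU) with softplus. The paper's exact formulation, including scaling and averaging, may differ. For a censored case whose prediction already exceeds the censoring time, ReLU gives exactly zero gradient. Softplus gives a small negative gradient, so gradient descent keeps nudging that prediction later.

```python
import math
from typing import Callable, Sequence


def relu(u: float) -> float:
    return max(0.0, u)


def relu_grad(u: float) -> float:
    return 1.0 if u > 0 else 0.0


def softplus(u: float) -> float:
    # Numerically stable log(1 + e^u).
    return max(0.0, u) + math.log1p(math.exp(-abs(u)))


def sigmoid(u: float) -> float:
    # Derivative of softplus, computed without overflow for large |u|.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def survival_regression_loss(preds: Sequence[float], times: Sequence[float],
                             events: Sequence[int], link: Callable[[float], float]) -> float:
    """Squared error for events; link(t - y_hat)^2 for censored cases (penalises y_hat < t)."""
    total = 0.0
    for y_hat, t, e in zip(preds, times, events):
        total += (t - y_hat) ** 2 if e else link(t - y_hat) ** 2
    return total / len(preds)


def censored_grad(y_hat: float, t: float, link: Callable[[float], float],
                  link_grad: Callable[[float], float]) -> float:
    """d/dy_hat of link(t - y_hat)^2 = -2 * link(t - y_hat) * link'(t - y_hat)."""
    u = t - y_hat
    return -2.0 * link(u) * link_grad(u)


# Usage: a censored case (censoring time 5) already predicted beyond it (8).
g_relu = censored_grad(8.0, 5.0, relu, relu_grad)          # ReLU: zero gradient
g_softplus = censored_grad(8.0, 5.0, softplus, sigmoid)   # Softplus: small but non-zero
loss_relu = survival_regression_loss([4.0, 8.0], [5.0, 5.0], [1, 0], relu)
loss_sp = survival_regression_loss([4.0, 8.0], [5.0, 5.0], [1, 0], softplus)
print(g_relu, g_softplus, loss_relu, loss_sp)
```

Softplus approaches ReLU when t - ŷ is large and positive, so the penalty for under-predicting censored cases is nearly unchanged. The main difference is the non-vanishing gradient on the other side of the censoring time.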

Place, publisher, year, edition, pages
New York: Association for Computing Machinery (ACM), 2024
Keywords
evolutionary meta-learning, loss function, neural networks, survival analysis, regression
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-52468 (URN)
10.1145/3638530.3664129 (DOI)
2-s2.0-85200800944 (Scopus ID)
979-8-4007-0495-6 (ISBN)
Conference
The Genetic and Evolutionary Computation Conference, Melbourne, Australia, July 14-18, 2024
Note

As manuscript in thesis

Available from: 2024-01-24. Created: 2024-01-24. Last updated: 2025-01-09. Bibliographically approved
5. Discovering Premature Replacements in Predictive Maintenance Time-to-Event Data
2023 (English). In: Proceedings of the Asia Pacific Conference of the PHM Society 2023 / [ed] Takehisa Yairi; Samir Khan; Seiji Tsutsumi, New York: The Prognostics and Health Management Society, 2023, Vol. 4. Conference paper, Published paper (Refereed)
Abstract [en]

Time-to-event (TTE) modeling using survival analysis in industrial settings faces the challenge of premature replacements of machine components, which leads to bias and errors in survival prediction. Typically, TTE survival data contains information about components and whether they had failed up to a certain time. For failed components, the time is recorded, and the failure is referred to as an event. A component that has not failed is denoted as censored. In industrial settings, unlike medical settings, there can be considerable uncertainty in an event. A component can be replaced before it fails, either to prevent operational stops or because maintenance staff believe that the component is faulty. This shows up as “no fault found” in warranty studies, where a significant proportion of replaced components may appear fault-free when tested or inspected after replacement.

In this work, we propose an expectation-maximization-like method for discovering such premature replacements in survival data. The method is a two-phase iterative algorithm that employs a genetic algorithm in the maximization phase to learn better event assignments on a validation set. The labels learned across iterations are accumulated and averaged to initialize the following expectation phase. The assumption is that the more often an event is selected, the more likely it is to be an actual failure rather than a “no fault found” case.
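
The iterate-and-average loop can be sketched as follows, under several labelled assumptions:

- Event labels are bit masks over the recorded events, where 1 keeps an event as a failure and 0 treats it as a premature replacement (censored).
- `fitness` is a hypothetical callable standing in for the surrogate survival model's validation performance.
- The genetic algorithm uses tournament selection, uniform crossover, bit-flip mutation and elitism. These are common choices, not necessarily the paper's.
- The expectation-like phase is simplified to seeding the next population from the running average of the selected masks.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = keep the recorded event as a failure, 0 = treat as premature (censored)


def genetic_search(fitness: Callable[[Mask], float], init_probs: Sequence[float],
                   rng: random.Random, pop_size: int = 20, generations: int = 30,
                   p_mut: float = 0.05) -> Mask:
    """Maximization-like phase: evolve event masks that score well on `fitness`."""
    n = len(init_probs)
    # Seed with the observed labels (all events kept) plus masks drawn from init_probs.
    pop = [[1] * n] + [[int(rng.random() < p) for p in init_probs] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        children = [ranked[0][:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a = max(rng.sample(ranked, 3), key=fitness)  # tournament selection
            b = max(rng.sample(ranked, 3), key=fitness)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]  # mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)


def premature_replacement_scores(fitness: Callable[[Mask], float], n_events: int,
                                 rounds: int = 5, seed: int = 0) -> List[float]:
    """Fraction of rounds in which each recorded event was kept as a genuine failure."""
    rng = random.Random(seed)
    probs = [0.5] * n_events  # uninformative starting point (an assumption)
    counts = [0] * n_events
    for r in range(1, rounds + 1):
        best = genetic_search(fitness, probs, rng)
        counts = [c + b for c, b in zip(counts, best)]
        probs = [c / r for c in counts]  # expectation-like phase: running average of labels
    return probs


# Usage with a hypothetical fitness: agreement with a hidden "true" failure mask.
# In practice, fitness would be a surrogate survival model's validation score.
truth = [1, 1, 0, 1, 0, 1, 1, 0]


def fitness(mask: Mask) -> float:
    return -sum(m != t for m, t in zip(mask, truth))


scores = premature_replacement_scores(fitness, len(truth))
print([round(s, 2) for s in scores])  # low scores flag likely premature replacements
```

Elitism guarantees that each round returns a mask at least as fit as the observed labels. The averaged scores then rank events by how consistently the search keeps them as failures.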

Experiments on synthesized and simulated data show that the proposed method can correctly detect a significant percentage of premature replacement cases.

Place, publisher, year, edition, pages
New York: The Prognostics and Health Management Society, 2023
Series
Proceedings of the Asia Pacific Conference of the PHM Society, E-ISSN 2994-7219
Keywords
Survival Analysis, Predictive Maintenance, Early Replacements, Genetic Algorithms
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52105 (URN)
10.36001/phmap.2023.v4i1.3609 (DOI)
Conference
4th Asia Pacific Conference of the Prognostics and Health Management, Tokyo, Japan, September 11-14, 2023
Funder
Knowledge Foundation, 20200001
Note

As manuscript in thesis.

Available from: 2023-11-23. Created: 2023-11-23. Last updated: 2025-01-09. Bibliographically approved
6. CoxSE: Exploring the Potential of Self-Explaining Neural Networks with Cox Proportional Hazards Model for Survival Analysis
(English). Manuscript (preprint) (Other academic)
Abstract [en]

The Cox Proportional Hazards (CPH) model has long been the preferred survival model because of its explainability. However, to increase its predictive power beyond a linear log-risk, it was extended to use deep neural networks, sacrificing its explainability. In this work, we explore the potential of self-explaining neural networks (SENN) for survival analysis. We propose a new locally explainable Cox proportional hazards model, named CoxSE, which estimates a locally linear log-hazard function using SENN. We also propose a hybrid of Neural Additive Models (NAM) and SENN, named CoxSENAM, which enables control over the stability and consistency of the generated explanations.
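
The locally linear log-hazard can be sketched in a few lines. A small network θ(x) outputs one coefficient per feature, and the log-hazard is θ(x) · x, so the coefficients double as a local explanation. The tiny tanh network and its weights (`W_HIDDEN`, `W_OUT`) are hypothetical and fixed rather than trained. SENN's stability regularizer is not shown. The loss is the standard negative log Cox partial likelihood, assuming no tied event times, and CoxSE's exact training objective may include further terms.

```python
import math
from typing import List, Sequence

# Hypothetical weights of a tiny parameterizer network theta(x) for 2 input features.
W_HIDDEN = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]   # 3 hidden units x 2 inputs
W_OUT = [[0.6, -0.1, 0.2], [0.0, 0.5, -0.4]]        # 2 coefficients x 3 hidden units


def theta(x: Sequence[float]) -> List[float]:
    """Input-dependent coefficients, one per feature: the local explanation."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]
    return [sum(w * hj for w, hj in zip(row, h)) for row in W_OUT]


def log_hazard(x: Sequence[float]) -> float:
    """Locally linear log-hazard theta(x) . x (the baseline hazard cancels in the loss)."""
    return sum(t * xi for t, xi in zip(theta(x), x))


def cox_loss(X: Sequence[Sequence[float]], times: Sequence[float],
             events: Sequence[int]) -> float:
    """Mean negative log Cox partial likelihood (assumes no tied event times)."""
    risks = [log_hazard(x) for x in X]
    total, n_events = 0.0, 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue
        at_risk = [risks[j] for j in range(len(X)) if times[j] >= t_i]
        m = max(at_risk)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += log_denominator - risks[i]
        n_events += 1
    return total / max(n_events, 1)


# Usage: per-feature contributions theta_j(x) * x_j sum to the log-hazard.
X = [[1.0, 0.5], [0.2, -1.0], [-0.5, 0.3]]
coefs = theta(X[0])
contributions = [c * xi for c, xi in zip(coefs, X[0])]
print(coefs, contributions, log_hazard(X[0]), cox_loss(X, [2.0, 5.0, 7.0], [1, 1, 0]))
```

With a constant θ, this reduces to the linear CPH model. Letting θ depend on x is what restores non-linear capacity while keeping a per-instance, additive explanation.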

Several experiments on synthetic and real datasets are presented, benchmarking CoxSE and CoxSENAM against a NAM-based model, a DeepSurv model explained with SHAP, and a linear CPH model. The results show that, unlike the NAM-based model, the SENN-based model can provide more stable and consistent explanations while maintaining the predictive power of the black-box model. The results also show that, owing to their structural design, NAM-based models are more robust to non-informative features. Among all the models, the hybrid model exhibits the best robustness.

Keywords
Self-Explaining Neural Networks, Cox Proportional Hazards, Survival Analysis, Interpretability, XAI, Neural Additive Models
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-55201 (URN)
10.48550/arXiv.2407.13849 (DOI)
Note

As manuscript in thesis

Available from: 2025-01-08. Created: 2025-01-08. Last updated: 2025-01-09. Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT02.pdf (1103 kB, application/pdf)
Checksum (SHA-512): 6ae11b6b51c473edc0533da149cbe57cfb4de84578457f8407a7093d56a3ee1485e4e7e0e791d1cf9eecd11cff34d0bafc3aa54a678aa4bd6a51af2f5bb7e20c

Authority records

Alabdallah, Abdallah
