Analysis of Statistical Data Heterogeneity in Federated Fault Identification
Halmstad University, School of Information Technology, Center for Applied Intelligent Systems Research (CAISR).ORCID iD: 0000-0002-1759-8593
Halmstad University, School of Information Technology, Center for Applied Intelligent Systems Research (CAISR).ORCID iD: 0000-0002-7796-5201
Halmstad University, School of Information Technology, Center for Applied Intelligent Systems Research (CAISR).ORCID iD: 0000-0003-3272-4145
2023 (English) In: Proceedings of the Asia Pacific Conference of the PHM Society 2023 / [ed] Takehisa Yairi; Samir Khan; Seiji Tsutsumi, New York: The Prognostics and Health Management Society, 2023, Vol. 4. Conference paper, Published paper (Refereed)
Abstract [en]

Federated Learning (FL) is a setting where different clients collaboratively train a Machine Learning model in a privacy-preserving manner, i.e., without the requirement to share data. Given the importance of security and privacy in real-world applications, FL is gaining popularity in many areas, including predictive maintenance. For example, it allows independent companies to construct a model collaboratively. However, since different companies operate in different environments, their working conditions may differ, resulting in heterogeneity among their data distributions. This paper considers the fault identification problem and simulates different scenarios of data heterogeneity. Such a setting remains challenging for popular FL algorithms, and thus we demonstrate the considerations to be taken into account when designing federated predictive maintenance solutions.  
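A common way to simulate the kind of statistical heterogeneity described above is to partition one dataset across clients with Dirichlet-distributed label proportions, where a smaller concentration parameter `alpha` produces stronger label skew. The sketch below is an illustration of that general technique under assumed names and parameters, not the paper's actual experimental setup:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices among clients with Dirichlet-distributed
    label proportions; smaller alpha -> stronger label skew."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of class c assigned to each client
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ci) for ci in client_indices]

# Hypothetical example: 4 fault classes, 250 samples each, 5 clients
labels = np.repeat(np.arange(4), 250)
parts = dirichlet_partition(labels, n_clients=5, alpha=0.5)
```

With `alpha` near zero, each client ends up dominated by a few classes; with large `alpha`, the split approaches an IID partition.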

Place, publisher, year, edition, pages
New York: The Prognostics and Health Management Society, 2023. Vol. 4
Series
Proceedings of the Asia Pacific Conference of the PHM Society, E-ISSN 2994-7219
Keywords [en]
Predictive Maintenance, Federated Learning, Statistical Heterogeneity
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-52478
DOI: 10.36001/phmap.2023.v4i1.3708
OAI: oai:DiVA.org:hh-52478
DiVA, id: diva2:1831308
Conference
4th Asia Pacific Conference of the Prognostics and Health Management, Tokyo, Japan, September 11-14, 2023
Funder
Vinnova
Available from: 2024-01-25 Created: 2024-01-25 Last updated: 2024-01-31
Bibliographically approved
In thesis
1. From Domain Adaptation to Federated Learning
2024 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Data-driven methods have been gaining increasing attention; however, along with the benefits they offer, they also present several challenges, particularly concerning data availability, accessibility, and heterogeneity, the three factors that have shaped the development of this thesis. Data availability is the primary consideration when employing data-driven methodologies. Suppose we consider a system for which we aim to develop a Machine Learning (ML) model. Gathering labeled samples, particularly in the context of real-world problem-solving, consistently poses challenges. While collecting raw data may be feasible in certain situations, the process of labeling it is often difficult, leading to a shortage of labeled data. However, historical (outdated) data or labeled data may occasionally be available from different yet related systems. A feasible approach would be to leverage data from different but related sources to assist in situations in which data is scarce. The challenge with this approach is that data collected from various sources may exhibit statistical differences even if they share the same features, i.e., data heterogeneity. Data heterogeneity impacts the performance of ML models. This issue arises because conventional machine learning algorithms rely on the IID (Independently and Identically Distributed) assumption: training and test data come from the same underlying distribution and are sampled independently. The IID assumption may not hold when data comes from different sources, and a trained model may then perform less effectively when used in another system or context. In such situations, Domain Adaptation (DA) is a solution. DA enhances the performance of ML models by minimizing the distribution distance between samples originating from diverse sources. Several factors come into play within the DA context, each necessitating distinct DA methods.
In this thesis, we conduct an investigation and propose DA methods while considering various factors, including the number of domains involved, the quantity of data available (both labeled and unlabeled) within these domains, the task at hand (classification or regression), and the nature of statistical heterogeneity among samples from different domains, such as covariate shift or concept shift. It is crucial to emphasize that DA techniques assume we can access the data from the different sources; that is, the data may be owned by different owners who are willing to share it. This data accessibility enables us to adapt data and optimize models accordingly. However, privacy concerns become a significant issue when addressing real-world problems, for example, where the data owners are from industry sectors. These privacy considerations necessitate the development of privacy-preserving techniques, such as Federated Learning (FL). FL is a privacy-preserving machine learning technique that enables different data owners to collaborate without sharing raw data samples. Instead, they share their ML models or model updates. Through this collaborative process, a global machine learning model is constructed, which can generalize and perform well across all participating domains. This approach addresses privacy concerns by keeping individual data localized while benefiting from collective knowledge to improve the global model. Among the most widely accepted FL methods is Federated Averaging (FedAvg). In this method, all clients connect with a central server. The server then computes the global model by aggregating the local models from each client, typically by calculating their average. Similar to DA, FL encounters issues when data from different domains exhibit statistical differences, i.e., heterogeneity, which can negatively affect the performance of the global model. A specialized branch known as Heterogeneous FL has emerged to tackle this situation.
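The FedAvg aggregation step described above can be sketched as a data-size-weighted average of client model parameters. This is an illustrative sketch of the standard algorithm, not code from the thesis; the layer shapes and client sizes are assumed for the example:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate per-client parameter lists into a global model by a
    weighted average proportional to each client's number of samples."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    global_weights = []
    for layer in range(n_layers):
        # Weight each client's layer by its share of the total data
        stacked = np.stack([w[layer] * (n / total)
                            for w, n in zip(client_weights, client_sizes)])
        global_weights.append(stacked.sum(axis=0))
    return global_weights

# Hypothetical round: two clients, one 2x2 parameter matrix each
w1 = [np.ones((2, 2))]
w2 = [np.zeros((2, 2))]
g = fedavg([w1, w2], client_sizes=[3, 1])
# Every entry of g[0] is 0.75: the larger client dominates the average
```

Under heterogeneity, the locally optimal parameters of different clients diverge, so this plain average can drift away from any single client's optimum, which is the failure mode that Heterogeneous FL methods target.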
This thesis, alongside DA, considers the heterogeneous FL problem. This thesis examines FL scenarios where all clients possess labeled data. We begin by conducting experimental investigations to illustrate the impact of various types of heterogeneity on the outcomes of FL. Afterward, we perform a theoretical analysis and establish an upper bound for the risk of the global model for each client. Accordingly, we see that minimizing heterogeneity between the clients minimizes this upper bound. Building upon this insight, we develop a method aimed at minimizing this heterogeneity to personalize the global model for the clients, thereby enhancing the performance of the federated system. This thesis focuses on two practical applications that highlight the relevant challenges: Predictive Maintenance and Network Security. In predictive maintenance, the focus is on fault identification using both DA and FL. Additionally, the thesis investigates predicting the state of health of electric bus batteries using DA. Regarding network security applications, the thesis addresses network traffic classification and intrusion detection, employing DA. ©Zahra Taghiyarrenani.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2024. p. 37
Series
Halmstad University Dissertations ; 107
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52510 (URN)
978-91-89587-28-1 (ISBN)
978-91-89587-27-4 (ISBN)
Public defence
2024-02-22, Wigforss, Kristian IV:s väg 3, Halmstad, 10:00 (English)
Available from: 2024-02-01 Created: 2024-01-31 Last updated: 2024-02-01
Bibliographically approved

Open Access in DiVA

fulltext (825 kB)

Authority records

Taghiyarrenani, Zahra; Nowaczyk, Sławomir; Pashami, Sepideh
