No free lunch but a cheaper supper: A general framework for streaming anomaly detection
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), CAISR - Center for Applied Intelligent Systems Research. ORCID iD: 0000-0002-6249-4144
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), CAISR - Center for Applied Intelligent Systems Research. ORCID iD: 0000-0002-7796-5201
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), CAISR - Center for Applied Intelligent Systems Research. ORCID iD: 0000-0002-3495-2961
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), CAISR - Center for Applied Intelligent Systems Research.
2020 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 155, article id 113453. Article in journal (Refereed). Published.
Abstract [en]

In recent years, there has been increased research interest in detecting anomalies in temporal streaming data. A variety of algorithms have been developed in the data mining community, which can be divided into two categories: general and ad hoc. In most cases, general approaches assume a one-size-fits-all solution model in which a single anomaly detector can detect all anomalies in any domain. To date, no single general method has been shown to outperform the others across different anomaly types, use cases and datasets. On the other hand, ad hoc approaches that are designed for a specific application lack flexibility. Adapting an existing algorithm is not straightforward if the specific constraints or requirements of the task change. In this paper, we propose SAFARI, a general framework formulated by abstracting and unifying the fundamental tasks in streaming anomaly detection, which provides a flexible and extensible anomaly detection procedure. SAFARI facilitates more elaborate algorithm comparisons by allowing us to isolate the effects of shared and unique characteristics of different algorithms on detection performance. Using SAFARI, we have implemented various anomaly detectors and identified a research gap that motivates us to propose a novel learning strategy in this work. We conducted an extensive evaluation study of 20 detectors composed using SAFARI and compared their performance on real-world benchmark datasets with different properties. The results indicate that there is no single superior detector that works well for every case, supporting our hypothesis that "there is no free lunch" in the streaming anomaly detection world. Finally, we discuss the benefits and drawbacks of each method in depth and draw a set of conclusions to guide future users of SAFARI.
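
As an illustration of what composing a streaming detector from interchangeable parts might look like, the sketch below pairs a reservoir sample (one of the record's keywords) with a pluggable scoring function. It is a minimal sketch under stated assumptions, not SAFARI's actual design or API: the names `ReservoirSample`, `nearest_distance`, `StreamingDetector`, and `run_demo`, the one-dimensional data, and the fixed threshold are all hypothetical choices made for this example.

```python
import random
from typing import Callable, List


class ReservoirSample:
    """Fixed-size uniform sample of a stream (classic Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[float] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def update(self, x: float) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(x)
            return
        # Keep the new item with probability capacity / seen.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = x


def nearest_distance(x: float, reference: List[float]) -> float:
    """Hypothetical scoring function: distance to the closest reference point."""
    return min(abs(x - r) for r in reference)


class StreamingDetector:
    """Hypothetical composition: a reference set, a scoring function, a threshold."""

    def __init__(
        self,
        reference: ReservoirSample,
        measure: Callable[[float, List[float]], float],
        threshold: float,
    ) -> None:
        self.reference = reference
        self.measure = measure
        self.threshold = threshold

    def process(self, x: float) -> bool:
        """Score x against the current reference set, then learn from it."""
        if not self.reference.items:
            self.reference.update(x)
            return False  # nothing to compare against yet
        score = self.measure(x, self.reference.items)
        self.reference.update(x)
        return score > self.threshold


def run_demo() -> List[bool]:
    detector = StreamingDetector(
        ReservoirSample(capacity=10), nearest_distance, threshold=5.0
    )
    stream = [0.0, 0.1, -0.1, 0.2, 100.0, 0.05]
    return [detector.process(x) for x in stream]
```

Swapping `nearest_distance` for another scoring function, or `ReservoirSample` for a sliding window, changes one component while leaving the rest fixed, which is the kind of isolated comparison the abstract describes. Note that this sketch also adds flagged points to the reference set; whether to do so is a design choice the abstract does not specify.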

Place, publisher, year, edition, pages
Oxford: Elsevier, 2020. Vol. 155, article id 113453
Keywords [en]
Anomaly detection, Stream mining, Reservoir sampling, Online learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-41420; DOI: 10.1016/j.eswa.2020.113453; ISI: 000542127900005; Scopus ID: 2-s2.0-85084107998; OAI: oai:DiVA.org:hh-41420; DiVA, id: diva2:1389296
Funder
Knowledge Foundation, 20160103
Available from: 2020-01-29 Created: 2020-01-29 Last updated: 2022-02-22 Bibliographically approved
In thesis
1. Self-Monitoring using Joint Human-Machine Learning: Algorithms and Applications
2020 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The ability to diagnose deviations and predict faults effectively is an important task in various industrial domains for minimizing costs and productivity loss and also conserving environmental resources. However, the majority of the efforts for diagnostics are still carried out by human experts in a time-consuming and expensive manner. Automated data-driven solutions are needed for continuous monitoring of complex systems over time. On the other hand, domain expertise plays a significant role in developing, evaluating, and improving diagnostics and monitoring functions. Therefore, automatically derived solutions must be able to interact with domain experts by taking advantage of available a priori knowledge and by incorporating their feedback into the learning process.

This thesis and appended papers tackle the problem of generating a real-world self-monitoring system for continuous monitoring of machines and operations by developing algorithms that can learn from data streams and their relations over time and detect anomalies using joint human-machine learning. Throughout this thesis, we have described a number of different approaches, each designed for the needs of a self-monitoring system, and have composed these methods into a coherent framework. More specifically, we presented a two-layer meta-framework, in which the first layer was concerned with learning appropriate data representations and detecting anomalies in an unsupervised fashion, and the second layer aimed at interactively exploiting available expert knowledge in a joint human-machine learning fashion.

Furthermore, district heating has been the focus of this thesis as the application domain with the goal of automatically detecting faults and anomalies by comparing heat demands among different groups of customers. We applied and enriched different methods on this domain, which then contributed to the development and improvement of the meta-framework. The contributions that result from the studies included in this work can be summarized into four categories: (1) exploring different data representations that are suitable for the self-monitoring task based on data characteristics and domain knowledge, (2) discovering patterns and groups in data that describe normal behavior of the monitored system/systems, (3) implementing methods to successfully discriminate anomalies from the normal behavior, and (4) incorporating domain knowledge and expert feedback into self-monitoring.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2020. p. 45
Series
Halmstad University Dissertations ; 69
Keywords
self-monitoring, anomaly detection, machine learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-41421 (URN); 978-91-88749-47-5 (ISBN); 978-91-88749-46-8 (ISBN)
Presentation
2020-02-25, J102 Wigforssalen, Kristian IV:s väg 3, Halmstad, 13:00 (English)
Funder
Knowledge Foundation, 20160103
Available from: 2020-01-31 Created: 2020-01-29 Last updated: 2020-01-31 Bibliographically approved
2. Together We Learn More: Algorithms and Applications for User-Centric Anomaly Detection
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Anomaly detection is the problem of identifying data points or patterns that do not conform to normal behavior. Anomalies in data often correspond to important and actionable information such as fraud in financial applications, faults in production units, intrusions in computer systems, and serious diseases in patient records. One of the fundamental challenges of anomaly detection is that the exact notion of an anomaly is subjective and varies greatly across applications and domains. This makes it difficult to distinguish anomalies that match the end user's expectations from other observations. As a result, anomaly detectors produce many false alarms that do not correspond to semantically meaningful anomalies for the analyst.

Humans can help, in different ways, to bridge this gap between detected anomalies and "anomalies-of-interest": by giving clues about features more likely to reveal interesting anomalies or by providing feedback to separate them from irrelevant ones. However, it is not realistic to expect a human to provide feedback easily without an explanation of why the algorithm classifies a certain sample as an anomaly. Interpretability of results is crucial for an analyst to be able to investigate the candidate anomaly and decide whether it is actually interesting or not.

In this thesis, we take a step forward to improve the practical use of anomaly detection in real life by leveraging human-algorithm collaboration. This thesis and appended papers study the problem of formulating and implementing algorithms for user-centric anomaly detection, a setting in which people analyze, interpret, and learn from the detector's results, as well as provide domain knowledge or feedback. Throughout this thesis, we have described a number of diverse approaches, each addressing different challenges and needs of user-centric anomaly detection in the real world, and combined these methods into a coherent framework. By conducting different studies, this thesis finds that a comprehensive approach incorporating human knowledge and providing interpretable results can lead to more effective and practical anomaly detection and more successful real-world applications. The major contributions that result from the studies included in this work and led to the above conclusion can be summarized into five categories: (1) exploring different data representations that are suitable for anomaly detection based on data characteristics and domain knowledge, (2) discovering patterns and groups in data that describe normal behavior in the current application, (3) implementing a generic and extensible framework enabling use-case-specific detectors suitable for different scenarios, (4) incorporating domain knowledge and expert feedback into anomaly detection, and (5) producing interpretable detection results that support end-users in understanding and validating the anomalies.

Place, publisher, year, edition, pages
Halmstad University Press, 2022. p. 211
Series
Halmstad University Dissertations ; 9
Keywords
data mining, machine learning, anomaly detection
National Category
Computer Sciences
Research subject
Smart Cities and Communities
Identifiers
urn:nbn:se:hh:diva-46404 (URN); 978-91-88749-87-1 (ISBN); 978-91-88749-88-8 (ISBN)
Public defence
2022-03-22, Visionen (Halda), Kristian IV:s väg 3, Halmstad, 13:00 (English)
Available from: 2022-02-25 Created: 2022-02-22 Last updated: 2022-02-25 Bibliographically approved


Authority records

Calikus, Ece; Nowaczyk, Sławomir; Pinheiro Sant'Anna, Anita; Dikmen, Onur
