Together We Learn More: Algorithms and Applications for User-Centric Anomaly Detection
Halmstad University, School of Information Technology. ORCID iD: 0000-0002-6249-4144
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Anomaly detection is the problem of identifying data points or patterns that do not conform to normal behavior. Anomalies in data often correspond to important and actionable information such as fraud in financial applications, faults in production units, intrusions in computer systems, and serious diseases in patient records. One of the fundamental challenges of anomaly detection is that the exact notion of an anomaly is subjective and varies greatly across applications and domains. This makes it difficult to distinguish anomalies that match the end-user's expectations from other observations. As a result, anomaly detectors produce many false alarms that do not correspond to semantically meaningful anomalies for the analyst.

Humans can help to bridge this gap between detected anomalies and "anomalies-of-interest" in different ways: by giving clues about features more likely to reveal interesting anomalies, or by providing feedback that separates them from irrelevant ones. However, it is unrealistic to expect a human to provide feedback easily without an explanation of why the algorithm classifies a certain sample as an anomaly. Interpretability of results is crucial for an analyst to be able to investigate a candidate anomaly and decide whether it is actually interesting.

In this thesis, we take a step toward improving the practical use of anomaly detection in real life by leveraging human-algorithm collaboration. This thesis and the appended papers study the problem of formulating and implementing algorithms for user-centric anomaly detection, a setting in which people analyze, interpret, and learn from the detector's results, as well as provide domain knowledge or feedback. Throughout this thesis, we describe a number of diverse approaches, each addressing different challenges and needs of user-centric anomaly detection in the real world, and combine these methods into a coherent framework. By conducting different studies, this thesis finds that a comprehensive approach incorporating human knowledge and providing interpretable results can lead to more effective and practical anomaly detection and more successful real-world applications. The major contributions that result from the studies included in this work and led to the above conclusion can be summarized in five categories: (1) exploring different data representations that are suitable for anomaly detection based on data characteristics and domain knowledge, (2) discovering patterns and groups in data that describe normal behavior in the current application, (3) implementing a generic and extensible framework enabling use-case-specific detectors suitable for different scenarios, (4) incorporating domain knowledge and expert feedback into anomaly detection, and (5) producing interpretable detection results that support end-users in understanding and validating the anomalies.

Place, publisher, year, edition, pages
Halmstad University Press, 2022, p. 211
Series
Halmstad University Dissertations ; 9
Keywords [en]
data mining, machine learning, anomaly detection
National Category
Computer Sciences
Research subject
Smart Cities and Communities
Identifiers
URN: urn:nbn:se:hh:diva-46404
ISBN: 978-91-88749-87-1 (print)
ISBN: 978-91-88749-88-8 (electronic)
OAI: oai:DiVA.org:hh-46404
DiVA, id: diva2:1639875
Public defence
2022-03-22, Visionen (Halda), Kristian IV:s väg 3, Halmstad, 13:00 (English)
Available from: 2022-02-25 Created: 2022-02-22 Last updated: 2022-02-25 Bibliographically approved
List of papers
1. Ranking Abnormal Substations by Power Signature Dispersion
2018 (English) In: Energy Procedia, ISSN 1876-6102, Vol. 149, p. 345-353. Article in journal (Refereed) Published
Abstract [en]

The relation between heat demand and outdoor temperature (heat power signature) is a typical feature used to diagnose abnormal heat demand. Prior work is mainly based on setting thresholds, either statistically or manually, in order to identify outliers in the power signature. However, setting the correct threshold is a difficult task since heat demand is unique for each building. Too loose thresholds may allow outliers to go unspotted, while too tight thresholds can cause too many false alarms.

Moreover, the number of outliers alone does not reflect the dispersion level in the power signature, yet high dispersion is often caused by faults or configuration problems and should be considered when modeling abnormal heat demand.

In this work, we present a novel method for ranking substations by measuring both dispersion and outliers in the power signature. We use robust regression to estimate a linear regression model. Observations that fall outside of the threshold in this model are considered outliers. Dispersion is measured using the coefficient of determination R², a statistical measure of how close the data are to the fitted regression line.

Our method first produces two different lists by ranking substations by number of outliers and by dispersion separately. Then, we merge the two lists into one using the Borda count method. Substations near the top of the list should indicate higher abnormality in heat demand than those near the bottom. We have applied our model to data from substations connected to two district heating networks in the south of Sweden. Three approaches (outlier-based, dispersion-based, and aggregated) are compared against rankings based on return temperatures. The results show that our method significantly outperforms the state-of-the-art outlier-based method. © 2018 The Authors. Published by Elsevier Ltd.
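
The ranking pipeline described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the robust regression estimator or the outlier threshold, so the Theil-Sen estimator, the cutoff of `k` scaled median absolute deviations, and all function names below are assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Robust line fit (assumed estimator): median pairwise slope, median intercept."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def outliers_and_r2(x, y, k=3.0):
    """Count residuals beyond k scaled MADs (assumed threshold) and compute R^2."""
    slope, intercept = theil_sen(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals) or 1e-12
    n_outliers = sum(abs(r - center) > k * 1.4826 * mad for r in residuals)
    mean_y = sum(y) / len(y)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # assumes y is not constant
    return n_outliers, 1.0 - ss_res / ss_tot


def borda_merge(rankings):
    """Merge rankings (most abnormal first) by summing Borda points."""
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            points[item] = points.get(item, 0) + (n - 1 - position)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(points, key=lambda item: (-points[item], item))


def rank_substations(signatures):
    """signatures: {name: (outdoor_temperatures, heat_demands)}."""
    stats = {name: outliers_and_r2(x, y) for name, (x, y) in signatures.items()}
    by_outliers = sorted(stats, key=lambda s: (-stats[s][0], s))
    by_dispersion = sorted(stats, key=lambda s: (stats[s][1], s))  # low R^2 first
    return borda_merge([by_outliers, by_dispersion])
```

Calling `rank_substations` with a dictionary of power signatures returns substation names ordered from most to least abnormal under these assumptions. A substation with many outliers or a low R² rises toward the top of the merged list.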

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2018
Keywords
abnormal heat demand, district heating, anomaly detection, fault detection, power signature
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:hh:diva-38253 (URN); 10.1016/j.egypro.2018.08.198 (DOI); 000482873900036 (); 2-s2.0-85054100441 (Scopus ID)
Conference
16th International Symposium on District Heating and Cooling, DHC2018, Hamburg, Germany, 9-12 September, 2018
Funder
Knowledge Foundation, 20160103
Available from: 2018-11-04 Created: 2018-11-04 Last updated: 2023-08-28 Bibliographically approved
2. A data-driven approach for discovering heat load patterns in district heating
2019 (English) In: Applied Energy, ISSN 0306-2619, E-ISSN 1872-9118, Vol. 252, article id 113409. Article in journal (Refereed) Published
Abstract [en]

Understanding the heat usage of customers is crucial for effective district heating operations and management. Unfortunately, existing knowledge about customers and their heat load behaviors is quite scarce. Most previous studies are limited to small-scale analyses that are not representative enough to understand the behavior of the overall network. In this work, we propose a data-driven approach that enables large-scale automatic analysis of heat load patterns in district heating networks without requiring prior knowledge. Our method clusters the customer profiles into different groups, extracts their representative patterns, and detects unusual customers whose profiles deviate significantly from the rest of their group. Using our approach, we present the first large-scale, comprehensive analysis of the heat load patterns by conducting a case study on many buildings in six different customer categories connected to two district heating networks in the south of Sweden. The 1222 buildings had a total floor space of 3.4 million square meters and used 1540 TJ heat during 2016. The results show that the proposed method has a high potential to be deployed and used in practice to analyze and understand customers’ heat-use habits. © 2019 Calikus et al. Published by Elsevier Ltd.
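
The three steps named in this abstract (cluster the profiles, extract a representative pattern per group, flag customers that deviate from their group) can be sketched as follows. This is an illustration under assumptions, not the paper's method: plain k-means with farthest-point initialisation, the element-wise median as the representative pattern, and the `factor` deviation rule are all stand-ins chosen here.

```python
from statistics import mean, median


def distance(a, b):
    """Euclidean distance between two load profiles of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def kmeans(profiles, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(profiles[0])]
    while len(centers) < k:
        farthest = max(profiles, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(profiles)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: distance(p, centers[c])) for p in profiles]
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = [mean(values) for values in zip(*members)]
    return labels


def representative_pattern(members):
    """Element-wise median profile of a group (an assumed choice of representative)."""
    return [median(values) for values in zip(*members)]


def unusual_customers(profiles, labels, factor=3.0):
    """Flag profiles lying more than `factor` times the group's median distance
    from the group's representative pattern (hypothetical rule)."""
    flagged = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        pattern = representative_pattern([profiles[i] for i in members])
        dists = [distance(profiles[i], pattern) for i in members]
        typical = median(dists)
        flagged += [i for i, d in zip(members, dists) if d > factor * typical]
    return sorted(flagged)
```

Using the median rather than the mean for the representative pattern keeps a single unusual customer from pulling the pattern toward itself, which matters when groups are small.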

Place, publisher, year, edition, pages
Oxford: Elsevier, 2019
Keywords
District heating, Energy efficiency, Heat load patterns, Clustering, Abnormal heat use
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:hh:diva-40907 (URN); 10.1016/j.apenergy.2019.113409 (DOI); 000497968000013 (); 2-s2.0-85066961984 (Scopus ID)
Funder
Knowledge Foundation, 20160103
Available from: 2019-11-12 Created: 2019-11-12 Last updated: 2022-02-22 Bibliographically approved
3. No free lunch but a cheaper supper: A general framework for streaming anomaly detection
2020 (English) In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 155, article id 113453. Article in journal (Refereed) Published
Abstract [en]

In recent years, there has been increased research interest in detecting anomalies in temporal streaming data. A variety of algorithms have been developed in the data mining community, which can be divided into two categories (i.e., general and ad hoc). In most cases, general approaches assume a one-size-fits-all solution model where a single anomaly detector can detect all anomalies in any domain. To date, there exists no single general method that has been shown to outperform the others across different anomaly types, use cases and datasets. On the other hand, ad hoc approaches that are designed for a specific application lack flexibility. Adapting an existing algorithm is not straightforward if the specific constraints or requirements for the existing task change. In this paper, we propose SAFARI, a general framework formulated by abstracting and unifying the fundamental tasks in streaming anomaly detection, which provides a flexible and extensible anomaly detection procedure. SAFARI helps to facilitate more elaborate algorithm comparisons by allowing us to isolate the effects of shared and unique characteristics of different algorithms on detection performance. Using SAFARI, we have implemented various anomaly detectors and identified a research gap that motivates us to propose a novel learning strategy in this work. We conducted an extensive evaluation study of 20 detectors that are composed using SAFARI and compared their performances using real-world benchmark datasets with different properties. The results indicate that there is no single superior detector that works well for every case, supporting our hypothesis that "there is no free lunch" in the streaming anomaly detection world. Finally, we discuss the benefits and drawbacks of each method in depth and draw a set of conclusions to guide future users of SAFARI.
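
The idea of composing detectors from interchangeable parts can be sketched as below. This is not SAFARI's actual API: the abstract does not list the framework's components, so the split into a "learning strategy" (how the reference group is maintained) and a "nonconformity measure" (how strange a new value is relative to that group), along with every class and function name, is an assumption made for illustration. Reservoir sampling appears among the paper's keywords and is implemented here as the classic Algorithm R.

```python
import random
from collections import deque
from statistics import mean, median


class SlidingWindow:
    """Learning strategy: keep only the most recent `size` observations."""

    def __init__(self, size):
        self.items = deque(maxlen=size)

    def update(self, x):
        self.items.append(x)


class Reservoir:
    """Learning strategy: uniform sample over the whole stream (Algorithm R)."""

    def __init__(self, size, seed=0):
        self.size, self.items, self.seen = size, [], 0
        self.rng = random.Random(seed)

    def update(self, x):
        self.seen += 1
        if len(self.items) < self.size:
            self.items.append(x)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen - 1]
            if j < self.size:
                self.items[j] = x


def distance_to_mean(x, reference):
    return abs(x - mean(reference))


def distance_to_median(x, reference):
    return abs(x - median(reference))


class StreamingDetector:
    """Compose one learning strategy with one nonconformity measure."""

    def __init__(self, strategy, nonconformity, warmup=5):
        self.strategy = strategy
        self.nonconformity = nonconformity
        self.warmup = warmup

    def score(self, x):
        """Score x against the current reference group, then learn from it."""
        reference = list(self.strategy.items)
        s = self.nonconformity(x, reference) if len(reference) >= self.warmup else 0.0
        self.strategy.update(x)
        return s
```

Two strategies and two measures already give four detectors, for example `StreamingDetector(Reservoir(100), distance_to_median)`. Because the parts are swapped independently, a difference in detection performance can be attributed to the one part that changed.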

Place, publisher, year, edition, pages
Oxford: Elsevier, 2020
Keywords
Anomaly detection, Stream mining, Reservoir sampling, Online learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-41420 (URN); 10.1016/j.eswa.2020.113453 (DOI); 000542127900005 (); 2-s2.0-85084107998 (Scopus ID)
Funder
Knowledge Foundation, 20160103
Available from: 2020-01-29 Created: 2020-01-29 Last updated: 2022-02-22 Bibliographically approved
4. Wisdom of the contexts: active ensemble learning for contextual anomaly detection
2022 (English) In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 36, p. 2410-2458. Article in journal (Refereed) Published
Abstract [en]

In contextual anomaly detection, an object is only considered anomalous within a specific context. Most existing methods use a single context based on a set of user-specified contextual features. However, identifying the right context can be very challenging in practice, especially in datasets with a large number of attributes. Furthermore, in real-world systems, there might be multiple anomalies that occur in different contexts and, therefore, require a combination of several "useful" contexts to unveil them. In this work, we propose a novel approach, called WisCon (Wisdom of the Contexts), to effectively detect complex contextual anomalies in situations where the true contextual and behavioral attributes are unknown. Our method constructs an ensemble of multiple contexts, with varying importance scores, based on the assumption that not all useful contexts are equally so. We estimate the importance of each context using an active learning approach with a novel query strategy. Experiments show that WisCon significantly outperforms existing baselines in different categories (i.e., active classifiers, unsupervised contextual, and non-contextual anomaly detectors) on 18 datasets. Furthermore, the results support our initial hypothesis that there is no single perfect context that successfully uncovers all kinds of contextual anomalies, and leveraging the "wisdom" of multiple contexts is necessary. © 2022, The Author(s).
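
A weighted ensemble over several contexts can be sketched as follows. This is a simplified illustration rather than WisCon itself: the per-context scorer (deviation from the k nearest neighbours in context space) is an assumed base detector, and the importance weights are passed in by hand, whereas the paper estimates them with active learning. All names and the attribute keys are hypothetical.

```python
def context_score(rows, target, context, behavior, k=3):
    """Deviation of the target's behavioral value from the mean of its
    k nearest neighbours, where nearness is measured on `context` only."""
    others = [r for r in rows if r is not target]
    others.sort(key=lambda r: sum((r[a] - target[a]) ** 2 for a in context))
    neighbours = others[:k]
    expected = sum(r[behavior] for r in neighbours) / len(neighbours)
    return abs(target[behavior] - expected)


def ensemble_scores(rows, contexts, weights, behavior, k=3):
    """Importance-weighted average of the per-context anomaly scores."""
    total = sum(weights)
    scores = []
    for row in rows:
        per_context = [context_score(rows, row, c, behavior, k) for c in contexts]
        scores.append(sum(w * s for w, s in zip(weights, per_context)) / total)
    return scores


# Toy data: heat demand follows outdoor temperature; the last row does not.
temps = [-10, -9, -8, 0, 1, 2, 10, 11, 12, 10.5]
sizes = [100, 200, 300, 150, 250, 350, 120, 220, 320, 180]
heats = [140, 136, 132, 100, 96, 92, 60, 56, 52, 140]
rows = [{"temp": t, "size": s, "heat": h} for t, s, h in zip(temps, sizes, heats)]

# Two candidate contexts; the temperature context is given more importance.
scores = ensemble_scores(rows, [("temp",), ("size",)], [0.8, 0.2], "heat")
```

In the toy data the last row has an ordinary heat value overall and looks anomalous only when compared with buildings at a similar outdoor temperature, so a context that carries no signal (here `size`) should receive a low weight.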

Place, publisher, year, edition, pages
New York: Springer-Verlag New York, 2022
Keywords
Anomaly detection, Active learning, Contextual anomaly detection, Ensemble learning
National Category
Computer Sciences
Research subject
Smart Cities and Communities
Identifiers
urn:nbn:se:hh:diva-46401 (URN); 10.1007/s10618-022-00868-7 (DOI); 000864233400001 (); 2-s2.0-85139454448 (Scopus ID)
Funder
Knowledge Foundation, 20160103
Note

As manuscript in thesis

Available from: 2022-02-22 Created: 2022-02-22 Last updated: 2023-01-12 Bibliographically approved
5. Context Discovery for Anomaly Detection
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Contextual anomaly detection aims at identifying objects that are anomalous only within specific contexts. Most existing methods are limited to a single context defined by user-specified features. While identifying the right context is not trivial in practice, there is often more than just one context in real-world systems under which different anomalies naturally occur. In this work, we introduce ConQuest, a new unsupervised contextual anomaly detection approach that automatically discovers and incorporates multiple contexts useful for revealing contextual anomalies. In ConQuest, we search for relevant contexts by optimizing an unsupervised multi-objective function, where each objective is derived from desired properties of contextual anomaly detection. To effectively balance such (often competing) properties, we use a multi-objective genetic algorithm that returns a Pareto front comprising diverse, non-dominated solutions. Through experiments on various datasets, we show ConQuest outperforms state-of-the-art methods. Further, we showcase the advantage of using multiple objectives over single-objective context discovery strategies and demonstrate the interpretability aspect of ConQuest.
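
The notion of a Pareto front of non-dominated solutions can be made concrete with a short sketch. It shows only the dominance filter, not the genetic search or ConQuest's actual objectives. The candidate contexts and their two objective values below are invented for illustration, and both objectives are assumed to be maximised.

```python
def dominates(a, b):
    """a dominates b if it is at least as good on every objective
    and strictly better on at least one (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """candidates: {name: tuple of objective values}. Returns non-dominated names."""
    return [name for name, objectives in candidates.items()
            if not any(dominates(other, objectives) for other in candidates.values())]


# Hypothetical contexts scored on two made-up objectives.
candidates = {
    "context_a": (0.9, 0.2),
    "context_b": (0.5, 0.5),
    "context_c": (0.2, 0.9),
    "context_d": (0.4, 0.4),  # dominated by context_b
    "context_e": (0.9, 0.1),  # dominated by context_a
}
front = pareto_front(candidates)
```

Here `front` keeps `context_a`, `context_b`, and `context_c`: none of them is beaten on both objectives by another candidate, so each represents a different trade-off between the competing properties.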

Keywords
anomaly detection, contextual anomaly detection
National Category
Computer Sciences
Research subject
Smart Cities and Communities
Identifiers
urn:nbn:se:hh:diva-46402 (URN)
Funder
Knowledge Foundation, 20160103
Note

As manuscript in thesis

Available from: 2022-02-22 Created: 2022-02-22 Last updated: 2023-02-27 Bibliographically approved

Authority records

Calikus, Ece
