Estimating p-Values for Deviation Detection
2014 (English). In: Proceedings: 2014 IEEE Eighth International Conference on Self-Adaptive and Self-Organizing Systems SASO 2014 / [ed] Randall Bilof, Los Alamitos, CA: IEEE Computer Society, 2014, p. 100-109. Conference paper, Published paper (Refereed)
Abstract [en]
Deviation detection is important for self-monitoring systems. Performing deviation detection well requires methods that, given only "normal" data from a distribution of unknown parametric form, can produce a reliable statistic for rejecting the null hypothesis, i.e. evidence for deviating data. One measure of the strength of this evidence based on the data is the p-value, but few deviation detection methods utilize p-value estimation. We compare three methods that can be used to produce p-values: one-class support vector machine (OCSVM), conformal anomaly detection (CAD), and a simple "most central pattern" (MCP) algorithm. The OCSVM and the CAD method should be able to handle a distribution of any shape. The methods are evaluated on synthetic data sets to test and illustrate their strengths and weaknesses, and on data from a real-life self-monitoring scenario with a city bus fleet in normal traffic. The OCSVM has a Gaussian kernel for the synthetic data and a Hellinger kernel for the empirical data. The MCP method uses the Mahalanobis metric for the synthetic data and the Hellinger metric for the empirical data. The CAD uses the same metrics as the MCP method and has a k-nearest neighbour (kNN) non-conformity measure for both sets. The conclusion is that all three methods give reasonable, and quite similar, results on the real-life data set but that they have clear strengths and weaknesses on the synthetic data sets. The MCP algorithm is quick and accurate when the "normal" data distribution is unimodal and symmetric (with the chosen metric) but not otherwise. The OCSVM is a bit cumbersome to use to create (quantized) p-values but is accurate and reliable when the data distribution is multimodal and asymmetric. The CAD is also accurate for multimodal and asymmetric distributions. 
The experiment on the vehicle data illustrates how algorithms like these can be used in a self-monitoring system that uses a fleet of vehicles to conduct deviation detection without supervision and without prior knowledge about what is being monitored. © 2014 IEEE.
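As an illustration of how a conformal p-value with a kNN non-conformity measure can be computed, the following is a minimal sketch and not the paper's implementation. It assumes a plain Euclidean metric (the abstract names Mahalanobis and Hellinger metrics instead), takes the non-conformity score to be the sum of distances to the k nearest neighbours, and uses the hypothetical function names `knn_score` and `conformal_p_value`.

```python
import math


def knn_score(point, others, k):
    """Non-conformity score: sum of distances from `point` to its
    k nearest neighbours among `others` (Euclidean metric, an assumption)."""
    dists = sorted(math.dist(point, o) for o in others)
    return sum(dists[:k])


def conformal_p_value(normal, test_point, k=3):
    """Fraction of samples (test point included) whose non-conformity
    score is at least as large as that of the test point."""
    bag = list(normal) + [test_point]
    scores = [
        knn_score(z, bag[:i] + bag[i + 1:], k)
        for i, z in enumerate(bag)
    ]
    test_score = scores[-1]
    return sum(s >= test_score for s in scores) / len(bag)
```

With n "normal" samples, the smallest p-value this sketch can return is 1/(n+1), which a test point far from all normal data receives; a test point in a dense region of the normal data receives a p-value close to 1.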
Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE Computer Society, 2014. p. 100-109
Series
International Conference on Self-Adaptive and Self-Organizing Systems : [proceedings], ISSN 1949-3673
Keywords [en]
Training, Kernel, Vehicles, Conferences, Histograms, Design automation, Measurement
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:hh:diva-26151
DOI: 10.1109/SASO.2014.22
ISI: 000361021200011
Scopus ID: 2-s2.0-84936889577
ISBN: 978-1-4799-5367-7 (electronic)
ISBN: 978-1-4799-5368-4 (print)
OAI: oai:DiVA.org:hh-26151
DiVA, id: diva2:734143
Conference
SASO 2014 - Eighth IEEE International Conference on Self-Adaptive and Self-Organizing Systems, Imperial College, London, United Kingdom, September 8-12, 2014
Funder
VINNOVA
Note
Funding: Vinnova & Volvo AB
2014-07-15, 2014-07-15, 2021-05-11. Bibliographically approved