Bouguelia, Mohamed-Rafik (ORCID iD: orcid.org/0000-0002-2859-6155)
Publications (5 of 5)
Bouguelia, M.-R., Nowaczyk, S., Santosh, K. C. & Verikas, A. (2018). Agreeing to disagree: active learning with noisy labels without crowdsourcing. International Journal of Machine Learning and Cybernetics, 9(8), 1307-1319
2018 (English). In: International Journal of Machine Learning and Cybernetics, ISSN 1868-8071, E-ISSN 1868-808X, Vol. 9, no 8, p. 1307-1319. Article in journal (Refereed), Published
Abstract [en]

We propose a new active learning method for classification that handles label noise without relying on multiple oracles (i.e., crowdsourcing). The first strategy selects for labeling the instances that have a high influence on the learned model. An instance x is said to have a high influence on the model h if training h on x (with label y = h(x)) would result in a model that greatly disagrees with h on the labels of other instances. The second strategy selects for labeling the instances that are highly influenced by changes in the learned model. An instance x is said to be highly influenced if training h with a set of instances would result in a committee of models that agree on a common label for x but disagree with h(x). We compare the two strategies and show, on several publicly available datasets, that selecting instances according to the first strategy while eliminating noisy labels according to the second greatly improves accuracy compared with several benchmark methods, even when a significant number of instances are mislabeled. © Springer-Verlag Berlin Heidelberg 2017
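
The notion of a high-influence instance in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the nearest-centroid classifier, the function names (fit_centroids, predict, influence), and the use of a plain disagreement fraction over a pool are all assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def fit_centroids(X: Sequence[Point], y: Sequence[int]) -> Dict[int, Point]:
    """Train a toy nearest-centroid model (a stand-in for the model h)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def predict(centroids: Dict[int, Point], x: Point) -> int:
    """Return the label of the centroid closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)),
    )


def influence(
    X_lab: Sequence[Point], y_lab: Sequence[int], x: Point, pool: Sequence[Point]
) -> float:
    """Fraction of pool instances whose label changes after adding (x, h(x))."""
    h = fit_centroids(X_lab, y_lab)
    y_hat = predict(h, x)  # the model's own label for x, y = h(x)
    h_x = fit_centroids(list(X_lab) + [x], list(y_lab) + [y_hat])
    disagreements = sum(predict(h, z) != predict(h_x, z) for z in pool)
    return disagreements / len(pool)


# Hypothetical 1-D data: one labeled instance per class.
X_lab = [(0.0,), (4.0,)]
y_lab = [0, 1]
pool = [(1.0,), (2.3,), (3.0,), (3.5,)]
near_boundary = influence(X_lab, y_lab, (1.9,), pool)
far_from_boundary = influence(X_lab, y_lab, (0.1,), pool)
```

In this toy setting the candidate close to the decision boundary shifts the boundary enough to flip one pool prediction, while the candidate far from it changes nothing, so the first would be preferred for labeling under an influence-based criterion.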

Place, publisher, year, edition, pages
Heidelberg: Springer, 2018
Keywords
Active learning, Classification, Label noise, Mislabeling, Interactive learning, Machine learning, Data mining
National Category
Signal Processing; Computer Systems; Computer Sciences
Identifiers
urn:nbn:se:hh:diva-33365 (URN); 10.1007/s13042-017-0645-0 (DOI)
Available from: 2017-02-27. Created: 2017-02-27. Last updated: 2018-07-23. Bibliographically approved
Bouguelia, M.-R., Nowaczyk, S. & Payberah, A. H. (2018). An adaptive algorithm for anomaly and novelty detection in evolving data streams. Data mining and knowledge discovery, 32(6), 1597-1633
2018 (English). In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 32, no 6, p. 1597-1633. Article in journal (Refereed), Published
Abstract [en]

In the era of big data, considerable research effort is devoted to designing efficient algorithms capable of learning and extracting high-level knowledge from ubiquitous data streams in an online fashion. While most existing algorithms assume that data samples are drawn from a stationary distribution, several complex environments deal with data streams that are subject to change over time. Taking this aspect into consideration is an important step towards building truly aware and intelligent systems. In this paper, we propose GNG-A, an adaptive method for incremental unsupervised learning from evolving data streams experiencing various types of change. The proposed method maintains a continuously updated network (graph) of neurons by extending the Growing Neural Gas algorithm with three complementary mechanisms, allowing it to closely track both gradual and sudden changes in the data distribution. First, an adaptation mechanism handles local changes where the distribution is non-stationary only in some regions of the feature space. Second, an adaptive forgetting mechanism identifies and removes neurons that become irrelevant due to the evolving nature of the stream. Finally, a probabilistic evolution mechanism creates new neurons when there is a need to represent data in new regions of the feature space. The proposed method is demonstrated for anomaly and novelty detection in non-stationary environments. Results show that the method handles different data distributions and efficiently reacts to various types of change. © 2018 The Author(s)

Place, publisher, year, edition, pages
New York: Springer, 2018
Keywords
Data stream, Growing neural gas, Change detection, Non-stationary environments, Anomaly and novelty detection
National Category
Signal Processing
Identifiers
urn:nbn:se:hh:diva-36752 (URN); 10.1007/s10618-018-0571-0 (DOI); 2-s2.0-85046792304 (Scopus ID)
Projects
BIDAF
Available from: 2018-05-13. Created: 2018-05-13. Last updated: 2018-09-20. Bibliographically approved
Bouguelia, M.-R., Karlsson, A., Pashami, S., Nowaczyk, S. & Holst, A. (2018). Mode tracking using multiple data streams. Information Fusion, 43, 33-46
2018 (English). In: Information Fusion, ISSN 1566-2535, E-ISSN 1872-6305, Vol. 43, p. 33-46. Article in journal (Refereed), Published
Abstract [en]

Most existing work in information fusion focuses on combining information with well-defined meaning towards a concrete, pre-specified goal. In contrast, we aim for autonomous discovery of high-level knowledge from ubiquitous data streams. This paper introduces a method for recognition and tracking of hidden conceptual modes, which are essential to fully understand the operation of complex environments. We consider a scenario of analyzing usage of a fleet of city buses, where the objective is to automatically discover and track modes such as highway route, heavy traffic, or aggressive driver, based on available on-board signals. The method we propose is based on aggregating the data over time, since the high-level modes are only apparent over a longer time perspective. We search through different features and subsets of the data and identify those that lead to good clusterings, interpreting those clusters as initial, rough models of the prospective modes. We use Bayesian tracking to continuously improve the parameters of those models based on new data, while at the same time following how the modes evolve over time. Experiments with artificial data of varying degrees of complexity, as well as on real-world datasets, demonstrate the effectiveness of the proposed method in accurately discovering the modes and in identifying which one best explains the current observations from multiple data streams. © 2017 Elsevier B.V. All rights reserved.
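
The aggregation step mentioned in the abstract above can be sketched as follows. The abstract does not specify which features are computed, so the per-window mean and standard deviation, the fixed non-overlapping window, and the function name aggregate_windows are assumptions made purely for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def aggregate_windows(
    signal: Sequence[float], window: int
) -> List[Tuple[float, float]]:
    """Summarise a signal as (mean, std) per non-overlapping window.

    Trailing samples that do not fill a complete window are dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    features = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        features.append((mean(chunk), pstdev(chunk)))
    return features


# Hypothetical on-board signal: a steady segment, then an oscillating one.
signal = [1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 0.0, 2.0, 5.0]
features = aggregate_windows(signal, window=4)
```

Here both windows have the same mean but different variability, a difference that no single sample reveals; feature vectors of this kind are what a clustering step could then group into candidate modes.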

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2018
Keywords
Mode tracking, Clustering, Data streams, Time series, Knowledge discovery
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-35729 (URN); 10.1016/j.inffus.2017.11.011 (DOI); 2-s2.0-85037072003 (Scopus ID)
Projects
BIDAF
Available from: 2017-12-01. Created: 2017-12-01. Last updated: 2019-04-12. Bibliographically approved
Bouguelia, M.-R., Pashami, S. & Nowaczyk, S. (2017). Multi-Task Representation Learning. In: Niklas Lavesson (Ed.), 30th Annual Workshop of the Swedish Artificial Intelligence Society SAIS 2017: May 15–16, 2017, Karlskrona, Sweden. Paper presented at 30th Annual Workshop of the Swedish Artificial Intelligence Society SAIS 2017, May 15–16, 2017, Karlskrona, Sweden (pp. 53-59). Linköping: Linköping University Electronic Press
2017 (English). In: 30th Annual Workshop of the Swedish Artificial Intelligence Society SAIS 2017: May 15–16, 2017, Karlskrona, Sweden / [ed] Niklas Lavesson, Linköping: Linköping University Electronic Press, 2017, p. 53-59. Conference paper, Published paper (Refereed)
Abstract [en]

The majority of existing machine learning algorithms assume that training examples are already represented with sufficiently good features, in practice features that are designed manually. This traditional way of preprocessing the data is not only tedious and time-consuming, but also insufficient to capture all the different aspects of the available information. With the big data phenomenon, this issue is only going to grow, as data is rarely collected and analyzed with a specific purpose in mind, and is more often re-used for solving different problems. Moreover, the expert knowledge that allows good representations to be designed for one problem does not necessarily generalize to other tasks. Therefore, much focus has been put on designing methods that can automatically learn features or representations of the data instead of learning from handcrafted features. However, much of this work has used ad hoc methods, and the theoretical understanding in this area is lacking.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2017
Series
Linköping Electronic Conference Proceedings, ISSN 1650-3686, E-ISSN 1650-3740 ; 137
Keywords
Representation Learning, Multi-Task Learning, Machine Learning, Supervised Learning, Feature Learning
National Category
Signal Processing
Identifiers
urn:nbn:se:hh:diva-36755 (URN); 978-91-7685-496-9 (ISBN)
Conference
30th Annual Workshop of the Swedish Artificial Intelligence Society SAIS 2017, May 15–16, 2017, Karlskrona, Sweden
Available from: 2018-05-14. Created: 2018-05-14. Last updated: 2019-04-12. Bibliographically approved
Bouguelia, M.-R., Gonzalez, R., Iagnemma, K. & Byttner, S. (2017). Unsupervised classification of slip events for planetary exploration rovers. Journal of terramechanics, 73, 95-106
2017 (English). In: Journal of terramechanics, ISSN 0022-4898, E-ISSN 1879-1204, Vol. 73, p. 95-106. Article in journal (Refereed), Published
Abstract [en]

This paper introduces an unsupervised method for the classification of discrete slip events of rovers based on proprioceptive signals. In particular, the method is able to automatically discover and track various degrees of slip (i.e., low slip, moderate slip, high slip). The proposed method is based on aggregating the data over time, since high-level concepts, such as high and low slip, depend on longer time perspectives. Features and subsets of the data that lead to a proper clustering are identified, and the resulting clusters are interpreted as initial models of the prospective concepts. Bayesian tracking is used to continuously improve the parameters of these models based on new data. Two real datasets are used to validate the proposed approach in comparison to other known unsupervised and supervised machine learning methods. The first dataset was collected by a single-wheel testbed available at MIT. The second dataset was collected by means of a planetary exploration rover in real off-road conditions. Experiments show that the proposed method is more accurate (up to 86% accuracy vs. 80% for K-means) in discovering various levels of slip while being fully unsupervised (no need for hand-labeled data for training). © 2017 ISTVS

Place, publisher, year, edition, pages
Doetinchem: Elsevier, 2017
Keywords
Unsupervised learning, Clustering, Data-driven modeling, Slip, MSL rover, LATUV rover
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-35169 (URN); 10.1016/j.jterra.2017.09.001 (DOI); 2-s2.0-85029811187 (Scopus ID)
Available from: 2017-10-09. Created: 2017-10-09. Last updated: 2018-01-13. Bibliographically approved