hh.se Publications
Bouguelia, Mohamed-Rafik (ORCID iD: orcid.org/0000-0002-2859-6155)
Publications (10 of 12)
Soliman, A., Girdzijauskas, S., Bouguelia, M.-R., Pashami, S. & Nowaczyk, S. (2020). Decentralized and Adaptive K-Means Clustering for Non-IID Data using HyperLogLog Counters. Paper presented at The 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2020), Singapore, May 11-14, 2020. Singapore
Decentralized and Adaptive K-Means Clustering for Non-IID Data using HyperLogLog Counters
2020 (English) Conference paper, Published paper (Refereed)
Abstract [en]

The data shared over the Internet tends to originate from ubiquitous and autonomous sources such as mobile phones, fitness trackers, and IoT devices. Centralized and federated machine learning solutions are the predominant way of providing smart services to users. However, moving data to a central location for analysis causes not only privacy concerns but also communication overhead. Therefore, in certain situations, machine learning models need to be trained in a collaborative and decentralized manner, mirroring the way the data is originally generated, without requiring any central authority for data or model aggregation. This paper presents a decentralized and adaptive k-means algorithm that clusters data from multiple sources organized in peer-to-peer networks. Our algorithm allows peers to reach an approximation of the global model without sharing any raw data. Most importantly, we address the challenge of decentralized clustering with skewed non-IID data and asynchronous computations by integrating HyperLogLog counters with the k-means algorithm. Furthermore, our clustering algorithm allows nodes to individually determine the number of clusters that fits their local data. Results on synthetic and real-world datasets show that our algorithm outperforms state-of-the-art decentralized k-means algorithms, with an accuracy gain of up to 36%.
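The abstract credits HyperLogLog counters with handling skewed non-IID data and asynchronous computation. The sketch below is a generic HyperLogLog cardinality counter (after Flajolet et al.), not the authors' algorithm; the precision `p = 12`, the SHA-1-based 64-bit hash, and the class and method names are illustrative assumptions. One property that plausibly makes such counters attractive between peers is that two sketches merge by an element-wise maximum, which is idempotent and commutative: a peer that receives the same counter twice, or in a different order, does not inflate its estimate.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative, not the paper's code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash from the first 8 bytes of SHA-1 (an arbitrary but stable choice).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Union of two sketches: element-wise max of the registers."""
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias constant for m >= 128
        e = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:                # small-range correction (linear counting)
            e = self.m * math.log(self.m / zeros)
        return e
```

For example, merging a sketch of items 0 to 5999 with a sketch of items 4000 to 9999 estimates roughly 10,000 distinct items, even though the 2,000 shared items were counted by both peers.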

Place, publisher, year, edition, pages
Singapore, 2020
Keywords
Decentralized Clustering, K-Means, HyperLogLog Counters, Distributed Machine Learning, Decentralized Machine Learning, Non-IID Data
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-42014 (URN)
Conference
The 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2020), Singapore, May 11-14, 2020
Funder
Knowledge Foundation
Available from: 2020-05-06 Created: 2020-05-06 Last updated: 2020-05-07
Ali Hamad, R., Salguero Hidalgo, A., Bouguelia, M.-R., Estevez, M. E. & Quero, J. M. (2020). Efficient Activity Recognition in Smart Homes Using Delayed Fuzzy Temporal Windows on Binary Sensors. IEEE Journal of Biomedical and Health Informatics, 24(2), 387-395
Efficient Activity Recognition in Smart Homes Using Delayed Fuzzy Temporal Windows on Binary Sensors
2020 (English) In: IEEE Journal of Biomedical and Health Informatics, ISSN 2168-2194, E-ISSN 2168-2208, Vol. 24, no 2, p. 387-395. Article in journal (Refereed) Published
Abstract [en]

Human activity recognition has become an active research field over the past few years due to its wide application in fields such as health care, smart home monitoring, and surveillance. Existing approaches to activity recognition in smart homes have achieved promising results. Most of these approaches evaluate real-time recognition of activities using only the sensor activations that precede the evaluation time (where the decision is made). However, in several critical situations, such as diagnosing people with dementia, “preceding sensor activations” are not always sufficient to accurately recognize the inhabitant's daily activities at each evaluated time. To improve performance, we propose a method that delays the recognition process in order to include some sensor activations that occur after the point in time where the decision needs to be made. To this end, the proposed method uses multiple incremental fuzzy temporal windows to extract features from both preceding and some subsequent sensor activations. The proposed method is evaluated with two temporal deep learning models (convolutional neural network and long short-term memory) on a binary sensor dataset of real daily living activities. The experimental evaluation shows that the proposed method achieves significantly better results than the real-time approach, and that the representation with fuzzy temporal windows enhances the performance of the deep learning models. © Copyright 2020 IEEE
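The effect of delaying the decision can be illustrated with trapezoidal fuzzy windows placed relative to the evaluation time t. This is a minimal sketch under stated assumptions: the window offsets, the use of the maximum membership as each feature, and the function names are hypothetical rather than the paper's exact feature extraction. Windows with negative offsets reproduce the real-time setting; appending a window with positive offsets lets activations shortly after t contribute.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear ramps on (a, b) and (c, d), 0 elsewhere."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical windows as (a, b, c, d) offsets in seconds from the evaluation time t.
# Negative offsets look back in time; positive offsets look ahead (the "delay").
PAST_WINDOWS = [(-60, -30, 0, 0), (-300, -120, -60, -30)]
DELAYED_WINDOWS = PAST_WINDOWS + [(0, 0, 30, 60)]


def fuzzy_features(activations, sensors, t, windows):
    """Feature (sensor, w) = max membership of that sensor's activations in window w.

    activations: list of (timestamp_seconds, sensor_id) pairs from binary sensors.
    """
    return {
        (s, w): max((trapezoid(ts - t, *win) for ts, sid in activations if sid == s),
                    default=0.0)
        for s in sensors
        for w, win in enumerate(windows)
    }
```

For an activation of a hypothetical `fridge` sensor at t + 20 s, every window in `PAST_WINDOWS` yields 0.0, while the third window in `DELAYED_WINDOWS` yields 1.0.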

Place, publisher, year, edition, pages
Piscataway: Institute of Electrical and Electronics Engineers (IEEE), 2020
Keywords
Activity recognition, fuzzy temporal windows, deep learning, temporal evaluation
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-41633 (URN); 10.1109/JBHI.2019.2918412 (DOI); 2-s2.0-85079094027 (Scopus ID)
Funder
EU, Horizon 2020
Note

Other funding: Marie Sklodowska-Curie EU Framework for Research

Available from: 2020-02-10 Created: 2020-02-10 Last updated: 2020-03-04. Bibliographically approved
Bae, J., Helldin, T., Riveiro, M., Nowaczyk, S., Bouguelia, M.-R. & Falkman, G. (2020). Interactive Clustering: A Comprehensive Review. ACM Computing Surveys, 53(1), Article ID 1.
Interactive Clustering: A Comprehensive Review
2020 (English) In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 53, no 1, article id 1. Article in journal (Refereed) Published
Abstract [en]

In this survey, 105 papers related to interactive clustering were reviewed from seven perspectives: (1) at what level the interaction happens, (2) which interactive operations are involved, (3) how user feedback is incorporated, (4) how interactive clustering is evaluated, (5) which data and (6) which clustering methods have been used, and (7) which challenges have been outlined. This article serves as a comprehensive overview of the field: it outlines the state of the art within the area and identifies challenges and future research needs. © 2020 Copyright held by the owner/author(s).

Place, publisher, year, edition, pages
New York, NY: ACM Digital Library, 2020
Keywords
Clustering, Interactive, Interaction, User, Evaluation, Feedback, Survey, Machine Learning, Data Mining
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-41634 (URN)10.1145/3340960 (DOI)
Funder
Knowledge Foundation, BIDAF 20140221; Swedish Research Council, EXPLAIN VR 2018-03622
Available from: 2020-02-10 Created: 2020-02-10 Last updated: 2020-02-18. Bibliographically approved
Farouq, S., Byttner, S., Bouguelia, M.-R., Nord, N. & Gadd, H. (2020). Large-scale monitoring of operationally diverse district heating substations: A reference-group based approach. Engineering Applications of Artificial Intelligence, 90, Article ID 103492.
Large-scale monitoring of operationally diverse district heating substations: A reference-group based approach
2020 (English) In: Engineering Applications of Artificial Intelligence, ISSN 0952-1976, E-ISSN 1873-6769, Vol. 90, article id 103492. Article in journal (Refereed) Published
Abstract [en]

A typical district heating (DH) network consists of hundreds, sometimes thousands, of substations. In the absence of a well-understood prior model or data labels for each substation, monitoring such a large number of substations can be challenging. To overcome this challenge, an approach was formulated in which each substation is monitored collectively by a local group (i.e., the reference-group) of other similar substations in the network. If a substation of interest (i.e., the target) starts to behave differently from those in its reference-group, it is designated as an outlier. The approach was demonstrated by monitoring the return temperature variable for atypical and faulty operational behavior in 778 substations associated with multi-dwelling buildings. The choice of an appropriate similarity measure and of the reference-group size k were the two important factors that enable a reference-group to effectively detect an outlier target. Thus, different similarity measures and sizes k for the construction of the reference-groups were investigated, which led to the selection of the Euclidean distance with k = 80. This setup resulted in the detection of 77 target substations that were outliers, i.e., the behavior of their return temperature changed in comparison to the majority of those in their respective reference-groups. Of these, 44 were detected due to the local construction of the reference-groups. In addition, six frequent patterns of deviating behavior in the return temperature of the substations were identified using the reference-group based approach, and these were further corroborated by feedback from a DH domain expert. © 2020 Elsevier Ltd
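The reference-group idea can be sketched as follows, assuming each substation is summarized by a feature vector (for example, a return-temperature profile over a reference period) and that a target is flagged when its current value lies more than `z` standard deviations from its group's mean. The deviation rule, `z = 3`, and the function names are assumptions for illustration; the abstract only fixes the similarity measure (Euclidean distance) and reports k = 80 for 778 substations.

```python
import math
import statistics


def reference_group(target, profiles, k):
    """Indices of the k substations closest to `target` in Euclidean distance (target excluded)."""
    ranked = sorted((math.dist(profiles[target], p), j)
                    for j, p in enumerate(profiles) if j != target)
    return [j for _, j in ranked[:k]]


def is_outlier(target, profiles, current, k, z=3.0):
    """Flag `target` if its current value deviates from its reference-group by more than z sigma.

    profiles: per-substation feature vectors used to build the groups (hypothetical).
    current:  per-substation value being monitored, e.g. a return temperature.
    """
    group = reference_group(target, profiles, k)
    values = [current[j] for j in group]
    mu, sd = statistics.mean(values), statistics.stdev(values)  # stdev needs k >= 2
    return abs(current[target] - mu) > z * sd
```

Building each group from the target's own nearest neighbours, rather than from the whole network, means every substation is compared only with peers whose operating profiles resemble its own.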

Place, publisher, year, edition, pages
Oxford: Elsevier, 2020
Keywords
District heating substations, Return temperature, Reference-group based operational monitoring, Fault detection, Outlier detection
National Category
Other Engineering and Technologies not elsewhere specified
Identifiers
urn:nbn:se:hh:diva-40962 (URN); 10.1016/j.engappai.2020.103492 (DOI); 2-s2.0-85078822459 (Scopus ID)
Funder
Knowledge Foundation, 20160103
Available from: 2019-11-16 Created: 2019-11-16 Last updated: 2020-03-24. Bibliographically approved
Holst, A., Bouguelia, M.-R., Görnerup, O., Pashami, S., Al-Shishtawy, A., Falkman, G., . . . Soliman, A. (2019). Eliciting Structure in Data. In: Christoph Trattner, Denis Parra & Nathalie Riche (Eds.), Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019. Paper presented at ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019. Aachen: Rheinisch-Westfaelische Technische Hochschule Aachen
Eliciting Structure in Data
2019 (English) In: Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019 / [ed] Christoph Trattner, Denis Parra & Nathalie Riche, Aachen: Rheinisch-Westfaelische Technische Hochschule Aachen, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

This paper demonstrates how to explore and visualize different types of structure in data, including clusters, anomalies, causal relations, and higher-order relations. The methods are developed with the goal of being as automatic as possible and applicable to massive, streaming, and distributed data. Finally, a decentralized learning scheme is discussed that makes it possible to find structure in the data without collecting the data centrally.

Place, publisher, year, edition, pages
Aachen: Rheinisch-Westfaelische Technische Hochschule Aachen, 2019
Series
CEUR Workshop Proceedings, E-ISSN 1613-0073; 2327
Keywords
Information Visualization, Clustering, Anomaly Detection, Causal Inference, Higher-Order Structure, Distributed Analytics
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-41837 (URN)
Conference
ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019
Projects
BIDAF
Funder
Knowledge Foundation
Available from: 2020-03-30 Created: 2020-03-30 Last updated: 2020-04-01. Bibliographically approved
Holst, A., Karlsson, A., Bae, J. & Bouguelia, M.-R. (2019). Interactive clustering for exploring multiple data streams at different time scales and granularity. In: Proceedings of the Workshop on Interactive Data Mining, WIDM 2019. Paper presented at 1st Workshop on Interactive Data Mining, WIDM 2019, co-located with 12th ACM International Conference on Web Search and Data Mining, WSDM 2019, 15 February 2019. Association for Computing Machinery (ACM)
Interactive clustering for exploring multiple data streams at different time scales and granularity
2019 (English) In: Proceedings of the Workshop on Interactive Data Mining, WIDM 2019, Association for Computing Machinery (ACM), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

We approach the problem of identifying and interpreting clusters over different time scales and granularities in multivariate time series data. We extract statistical features over a sliding window of each time series, and then use a Gaussian mixture model to identify clusters, which are then projected back onto the data streams. The human analyst can then further analyze this projection and adjust the size of the sliding window and the number of clusters in order to capture the different types of clusters over different time scales. We demonstrate the effectiveness of our approach in two application scenarios, (1) fleet management and (2) district heating, where in each scenario several different types of meaningful clusters can be identified when varying these dimensions. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
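The pipeline in this abstract (sliding-window statistics followed by a Gaussian mixture model) can be sketched in plain Python. This is an illustration under assumptions: the features are only the window mean and standard deviation of a single series, the mixture uses diagonal covariances fitted by a fixed number of EM iterations with a deterministic initialization, and the names `window_features` and `gmm_diag` are hypothetical. In practice a library implementation such as scikit-learn's `GaussianMixture` would normally be used.

```python
import math
import statistics


def window_features(series, width, step):
    """Mean and standard deviation of each sliding window (hypothetical feature set)."""
    return [[statistics.fmean(series[s:s + width]), statistics.pstdev(series[s:s + width])]
            for s in range(0, len(series) - width + 1, step)]


def gmm_diag(X, k, iters=50, reg=1e-3):
    """Fit a diagonal-covariance Gaussian mixture with EM and return hard cluster labels."""
    n, d = len(X), len(X[0])
    # Deterministic init: means at evenly spaced samples, variances = global variance.
    mus = [list(X[round(i * (n - 1) / max(k - 1, 1))]) for i in range(k)]
    gvar = [statistics.pvariance([x[j] for x in X]) + reg for j in range(d)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability.
        resp = []
        for x in X:
            logp = [math.log(weights[c]) + sum(
                -0.5 * math.log(2 * math.pi * variances[c][j])
                - (x[j] - mus[c][j]) ** 2 / (2 * variances[c][j]) for j in range(d))
                for c in range(k)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate mixing weights, means and per-dimension variances.
        for c in range(k):
            nk = sum(r[c] for r in resp) + 1e-12
            weights[c] = nk / n
            mus[c] = [sum(r[c] * x[j] for r, x in zip(resp, X)) / nk for j in range(d)]
            variances[c] = [sum(r[c] * (x[j] - mus[c][j]) ** 2 for r, x in zip(resp, X)) / nk + reg
                            for j in range(d)]
    return [max(range(k), key=lambda c: r[c]) for r in resp]
```

Re-running `gmm_diag(window_features(series, width, step), k)` with a different `width` or `k` mirrors the analyst's adjustment of the window size and number of clusters; mapping each label back to its window's start index (`i * step`) projects the clusters onto the stream.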

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Clustering, Interaction, Time scales, Time series, Fleet operations, Gaussian distribution, Time measurement, Application scenario, Different time scale, Gaussian Mixture Model, Multiple data streams, Multivariate time series, Time-scales, Data mining
National Category
Other Computer and Information Science; Computer Systems
Identifiers
urn:nbn:se:hh:diva-41537 (URN); 10.1145/3304079.3310286 (DOI); 2-s2.0-85069762696 (Scopus ID); 9781450362962 (ISBN)
Conference
1st Workshop on Interactive Data Mining, WIDM 2019, co-located with 12th ACM International Conference on Web Search and Data Mining, WSDM 2019, 15 February 2019
Available from: 2020-02-04 Created: 2020-02-04 Last updated: 2020-02-04. Bibliographically approved
Bouguelia, M.-R., Nowaczyk, S., Santosh, K. C. & Verikas, A. (2018). Agreeing to disagree: active learning with noisy labels without crowdsourcing. International Journal of Machine Learning and Cybernetics, 9(8), 1307-1319
Agreeing to disagree: active learning with noisy labels without crowdsourcing
2018 (English) In: International Journal of Machine Learning and Cybernetics, ISSN 1868-8071, E-ISSN 1868-808X, Vol. 9, no 8, p. 1307-1319. Article in journal (Refereed) Published
Abstract [en]

We propose a new active learning method for classification that handles label noise without relying on multiple oracles (i.e., crowdsourcing). We propose a strategy that selects (for labeling) instances with a high influence on the learned model. An instance x is said to have a high influence on the model h if training h on x (with label y = h(x)) would result in a model that greatly disagrees with h on labeling other instances. Then, we propose another strategy that selects (for labeling) instances that are highly influenced by changes in the learned model. An instance x is said to be highly influenced if training h with a set of instances would result in a committee of models that agree on a common label for x but disagree with h(x). We compare the two strategies and show, on different publicly available datasets, that selecting instances according to the first strategy while eliminating noisy labels according to the second strategy greatly improves the accuracy compared to several benchmarking methods, even when a significant number of instances are mislabeled. © Springer-Verlag Berlin Heidelberg 2017
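The first selection criterion, influence on the model, can be made concrete with a simple classifier. This sketch assumes a nearest-centroid classifier and measures the influence of x as the fraction of an unlabeled pool whose predicted label changes after retraining with (x, h(x)); the classifier, the pool-based disagreement measure, and the function names are assumptions, and the second (noise-filtering) strategy is not shown.

```python
import math


def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def predict(centroids, x):
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def influence(X_lab, y_lab, x, pool):
    """Fraction of pool instances whose prediction changes after training on (x, h(x))."""
    h = fit_centroids(X_lab, y_lab)
    h_plus = fit_centroids(X_lab + [x], y_lab + [predict(h, x)])
    return sum(predict(h, z) != predict(h_plus, z) for z in pool) / len(pool)


def select_query(X_lab, y_lab, pool):
    """Pick the pool instance with the highest influence (first one wins on ties)."""
    return max(pool, key=lambda x: influence(X_lab, y_lab, x, pool))
```

Because the candidate is labeled with the model's own prediction, a high influence signals a point near the current decision boundary whose inclusion would reshape it.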

Place, publisher, year, edition, pages
Heidelberg: Springer, 2018
Keywords
Active learning, Classification, Label noise, Mislabeling, Interactive learning, Machine learning, Data mining
National Category
Signal Processing; Computer Systems; Computer Sciences
Identifiers
urn:nbn:se:hh:diva-33365 (URN); 10.1007/s13042-017-0645-0 (DOI); 000438855100006 (ISI); 2-s2.0-85050140726 (Scopus ID)
Available from: 2017-02-27 Created: 2017-02-27 Last updated: 2020-02-03. Bibliographically approved
Bouguelia, M.-R., Nowaczyk, S. & Payberah, A. H. (2018). An adaptive algorithm for anomaly and novelty detection in evolving data streams. Data Mining and Knowledge Discovery, 32(6), 1597-1633
An adaptive algorithm for anomaly and novelty detection in evolving data streams
2018 (English) In: Data Mining and Knowledge Discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 32, no 6, p. 1597-1633. Article in journal (Refereed) Published
Abstract [en]

In the era of big data, considerable research focus is being put on designing efficient algorithms capable of learning and extracting high-level knowledge from ubiquitous data streams in an online fashion. While most existing algorithms assume that data samples are drawn from a stationary distribution, several complex environments produce data streams that are subject to change over time. Taking this aspect into consideration is an important step towards building truly aware and intelligent systems. In this paper, we propose GNG-A, an adaptive method for incremental unsupervised learning from evolving data streams experiencing various types of change. The proposed method maintains a continuously updated network (graph) of neurons by extending the Growing Neural Gas algorithm with three complementary mechanisms, allowing it to closely track both gradual and sudden changes in the data distribution. First, an adaptation mechanism handles local changes where the distribution is non-stationary only in some regions of the feature space. Second, an adaptive forgetting mechanism identifies and removes neurons that become irrelevant due to the evolving nature of the stream. Finally, a probabilistic evolution mechanism creates new neurons when data need to be represented in new regions of the feature space. The proposed method is demonstrated for anomaly and novelty detection in non-stationary environments. Results show that the method handles different data distributions and reacts efficiently to various types of change. © 2018 The Author(s)
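GNG-A builds on the Growing Neural Gas algorithm. As a point of reference, the sketch below implements the plain Growing Neural Gas of Fritzke and scores a sample by its distance to the nearest neuron; it does not include GNG-A's adaptation, adaptive forgetting, or probabilistic evolution mechanisms. The parameter values, the distance-based anomaly score, and the class name are illustrative assumptions.

```python
import math
import random


class GrowingNeuralGas:
    """Plain Growing Neural Gas; parameter defaults are illustrative, not tuned."""

    def __init__(self, dim, eps_b=0.2, eps_n=0.006, max_age=50, lam=100,
                 alpha=0.5, decay=0.995, max_nodes=50, seed=0):
        rng = random.Random(seed)
        self.w = {i: [rng.random() for _ in range(dim)] for i in (0, 1)}  # id -> weight vector
        self.err = {0: 0.0, 1: 0.0}                                       # id -> accumulated error
        self.edges = {}                                                   # frozenset({a, b}) -> age
        self.eps_b, self.eps_n, self.max_age, self.lam = eps_b, eps_n, max_age, lam
        self.alpha, self.decay, self.max_nodes = alpha, decay, max_nodes
        self.next_id, self.steps = 2, 0

    def neighbors(self, n):
        return [m for e in self.edges if n in e for m in e if m != n]

    def score(self, x):
        """Hypothetical anomaly score: distance from x to the nearest neuron."""
        return min(math.dist(w, x) for w in self.w.values())

    def fit_one(self, x):
        s1, s2 = sorted(self.w, key=lambda n: math.dist(self.w[n], x))[:2]
        for e in self.edges:                      # age every edge of the winner
            if s1 in e:
                self.edges[e] += 1
        self.err[s1] += math.dist(self.w[s1], x) ** 2
        self.w[s1] = [wi + self.eps_b * (xi - wi) for wi, xi in zip(self.w[s1], x)]
        for n in self.neighbors(s1):              # drag topological neighbours along
            self.w[n] = [wi + self.eps_n * (xi - wi) for wi, xi in zip(self.w[n], x)]
        self.edges[frozenset((s1, s2))] = 0       # create or refresh the winner/runner-up edge
        self.edges = {e: a for e, a in self.edges.items() if a <= self.max_age}
        connected = {n for e in self.edges for n in e}
        for n in [n for n in self.w if n not in connected]:  # drop isolated neurons
            del self.w[n], self.err[n]
        self.steps += 1
        if self.steps % self.lam == 0 and len(self.w) < self.max_nodes:
            q = max(self.err, key=self.err.get)              # neuron with the largest error
            f = max(self.neighbors(q), key=self.err.get)     # its neighbour with the largest error
            r, self.next_id = self.next_id, self.next_id + 1
            self.w[r] = [(a + b) / 2 for a, b in zip(self.w[q], self.w[f])]
            del self.edges[frozenset((q, f))]
            self.edges[frozenset((q, r))] = 0
            self.edges[frozenset((r, f))] = 0
            self.err[q] *= self.alpha
            self.err[f] *= self.alpha
            self.err[r] = self.err[q]
        for n in self.err:                        # global error decay
            self.err[n] *= self.decay
```

Under this scoring, a sample far from every neuron receives a high score, which is one simple way to turn a GNG model into an anomaly detector.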

Place, publisher, year, edition, pages
New York: Springer, 2018
Keywords
Data stream, Growing neural gas, Change detection, Non-stationary environments, Anomaly and novelty detection
National Category
Signal Processing
Identifiers
urn:nbn:se:hh:diva-36752 (URN); 10.1007/s10618-018-0571-0 (DOI); 000444383000003 (ISI); 2-s2.0-85046792304 (Scopus ID)
Projects
BIDAF
Available from: 2018-05-13 Created: 2018-05-13 Last updated: 2020-02-03. Bibliographically approved
Bouguelia, M.-R., Karlsson, A., Pashami, S., Nowaczyk, S. & Holst, A. (2018). Mode tracking using multiple data streams. Information Fusion, 43, 33-46
Mode tracking using multiple data streams
2018 (English) In: Information Fusion, ISSN 1566-2535, E-ISSN 1872-6305, Vol. 43, p. 33-46. Article in journal (Refereed) Published
Abstract [en]

Most existing work in information fusion focuses on combining information with well-defined meaning towards a concrete, pre-specified goal. In contrast, we aim for autonomous discovery of high-level knowledge from ubiquitous data streams. This paper introduces a method for recognizing and tracking hidden conceptual modes, which are essential to fully understanding the operation of complex environments. We consider a scenario of analyzing the usage of a fleet of city buses, where the objective is to automatically discover and track modes such as highway route, heavy traffic, or aggressive driver, based on available on-board signals. The method we propose is based on aggregating the data over time, since the high-level modes are only apparent in the longer perspective. We search through different features and subsets of the data, and identify those that lead to good clusterings, interpreting those clusters as initial, rough models of the prospective modes. We use Bayesian tracking to continuously improve the parameters of those models based on new data, while at the same time following how the modes evolve over time. Experiments with artificial data of varying degrees of complexity, as well as with real-world datasets, demonstrate the effectiveness of the proposed method in accurately discovering the modes and in identifying which one best explains the current observations from multiple data streams. © 2017 Elsevier B.V. All rights reserved.
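Once clustering has produced rough mode models, tracking which mode explains the current observation can be sketched as a Bayesian forward filter. This assumes one-dimensional Gaussian mode models, a "sticky" transition model in which the current mode persists with probability `stay` (at least two modes), and a simple responsibility-weighted update of each mode's mean with rate `lr`; these choices and the function names are hypothetical and not the paper's exact formulation.

```python
import math


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def track_modes(observations, modes, stay=0.95, lr=0.05):
    """Forward-filter a posterior over modes while nudging each mode's mean toward the data.

    modes: list of (mean, std) pairs, e.g. from an initial clustering step (needs >= 2 modes).
    Returns one posterior vector per observation.
    """
    modes = [list(m) for m in modes]   # copy so the caller's models are not mutated
    k = len(modes)
    belief = [1.0 / k] * k
    history = []
    for x in observations:
        # Predict: keep the current mode with probability `stay`, else switch uniformly.
        prior = [stay * b + (1 - stay) * (1 - b) / (k - 1) for b in belief]
        # Update: weight each mode by how well its Gaussian explains x.
        post = [p * gaussian_pdf(x, mu, sd) for p, (mu, sd) in zip(prior, modes)]
        total = sum(post)
        belief = [v / total for v in post] if total > 0 else prior
        history.append(belief)
        # Refine each mode's mean in proportion to its responsibility for x.
        for m, r in zip(modes, belief):
            m[0] += lr * r * (x - m[0])
    return history
```

The sticky prior keeps the estimate from flickering on a single noisy sample, while a run of observations that one mode explains much better still moves the posterior to that mode.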

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2018
Keywords
Mode tracking, Clustering, Data streams, Time series, Knowledge discovery
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-35729 (URN); 10.1016/j.inffus.2017.11.011 (DOI); 000430032000004 (ISI); 2-s2.0-85037072003 (Scopus ID)
Projects
BIDAF
Available from: 2017-12-01 Created: 2017-12-01 Last updated: 2020-02-03. Bibliographically approved
Farouq, S., Byttner, S. & Bouguelia, M.-R. (2018). On monitoring heat-pumps with a group-based conformal anomaly detection approach. In: Robert Stahlbock, Gary M. Weiss & Mahmoud Abou-Nasr (Eds.), ICDATA'18: Proceedings of the 2018 International Conference on Data Science. Paper presented at 2018 International Conference on Data Science (ICDATA’18), Las Vegas, NV, USA (pp. 63-69). CSREA Press
On monitoring heat-pumps with a group-based conformal anomaly detection approach
2018 (English) In: ICDATA'18: Proceedings of the 2018 International Conference on Data Science / [ed] Robert Stahlbock, Gary M. Weiss, Mahmoud Abou-Nasr, CSREA Press, 2018, p. 63-69. Conference paper, Published paper (Refereed)
Abstract [en]

The ever-increasing complexity of modern systems and equipment makes monitoring their health quite challenging. Traditional methods such as expert-defined thresholds, physics-based models, and process-history-based techniques have certain drawbacks. Thresholds defined by experts require deep knowledge about the system and are often too conservative. Physics-driven approaches are costly to develop and maintain. Finally, process-history-based models require a large amount of data that may not be available at the design time of a system. Moreover, these traditional approaches have been system-specific. Hence, when industrial systems are deployed on a large scale, their monitoring becomes a new challenge. Under these conditions, this paper demonstrates the use of a group-based self-monitoring approach that learns over time from similar systems subject to similar conditions. The approach is based on conformal anomaly detection coupled with an exchangeability test that uses martingales. This allows a threshold value to be set on sound theoretical grounds. A hypothesis test based on this threshold is used to decide whether a system has deviated from its group. We demonstrate the feasibility of this approach through a real case study of monitoring a group of heat pumps, where it can detect a faulty hot-water switch valve and a broken outdoor temperature sensor without having previously observed these faults.
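The group-based test can be sketched with one conformal p-value per time step and a power martingale over time. The nonconformity measure (absolute deviation from the group median), the non-smoothed p-value, `eps = 0.5`, the threshold 20, and the function names are assumptions for illustration. The threshold has a standard justification: by Ville's inequality, a nonnegative supermartingale starting at 1 ever reaches 20 with probability at most 1/20, provided the p-values are valid and independent over time while the target remains exchangeable with its group.

```python
import statistics


def nonconformity(values):
    """Hypothetical NCM: absolute deviation of each unit from the group median."""
    med = statistics.median(values)
    return [abs(v - med) for v in values]


def p_value(scores, idx):
    """Conformal p-value of unit `idx`: share of units at least as strange (non-smoothed)."""
    return sum(s >= scores[idx] for s in scores) / len(scores)


def power_martingale(p_values, eps=0.5):
    """Running values of M_n = prod(eps * p_i ** (eps - 1)); large values reject exchangeability."""
    m, path = 1.0, []
    for p in p_values:
        m *= eps * p ** (eps - 1)
        path.append(m)
    return path


def first_alarm(snapshots, target, eps=0.5, threshold=20.0):
    """Index of the first snapshot where the martingale reaches `threshold`, else None.

    snapshots: one list of monitored values per time step, one value per unit in the group.
    """
    ps = [p_value(nonconformity(values), target) for values in snapshots]
    return next((t for t, m in enumerate(power_martingale(ps, eps)) if m >= threshold), None)
```

With a group of 21 units the smallest possible p-value is 1/21, so each strongly deviating snapshot multiplies the martingale by 0.5 * sqrt(21) ≈ 2.29, while a typical snapshot (p = 1) halves it. In the sketch, a target that behaves typically for three snapshots and then deviates raises an alarm on the seventh deviating snapshot.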

Place, publisher, year, edition, pages
CSREA Press, 2018
Keywords
group-based monitoring, nonconformity measure (NCM), martingale test
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:hh:diva-40961 (URN); 1-60132-481-2 (ISBN); 9781601324818 (ISBN)
Conference
2018 International Conference on Data Science (ICDATA’18), Las Vegas, NV, USA
Available from: 2019-11-16 Created: 2019-11-16 Last updated: 2019-11-18. Bibliographically approved