hh.se Publications
Publications (10 of 61)
Bae, J., Helldin, T., Riveiro, M., Nowaczyk, S., Bouguelia, M.-R. & Falkman, G. (2020). Interactive Clustering: A Comprehensive Review. ACM Computing Surveys, 53(1), Article ID 1.
2020 (English). In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 53, no 1, article id 1. Article in journal (Refereed), Published
Abstract [en]

In this survey, 105 papers related to interactive clustering were reviewed according to seven perspectives: (1) on what level is the interaction happening, (2) which interactive operations are involved, (3) how user feedback is incorporated, (4) how interactive clustering is evaluated, (5) which data and (6) which clustering methods have been used, and (7) what outlined challenges there are. This article serves as a comprehensive overview of the field and outlines the state of the art within the area as well as identifies challenges and future research needs. © 2020 Copyright held by the owner/author(s).

Place, publisher, year, edition, pages
New York, NY: ACM Digital Library, 2020
Keywords
Clustering, Interactive, Interaction, User, Evaluation, Feedback, Survey, Machine Learning, Data Mining
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-41634 (URN); 10.1145/3340960 (DOI)
Funder
Knowledge Foundation, BIDAF 20140221; Swedish Research Council, EXPLAIN VR 2018-03622
Available from: 2020-02-10 Created: 2020-02-10 Last updated: 2020-02-18. Bibliographically approved
Calikus, E., Nowaczyk, S., Pinheiro Sant'Anna, A. & Dikmen, O. (2020). No Free Lunch But A Cheaper Supper: A General Framework for Streaming Anomaly Detection. Expert systems with applications
2020 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793. Article in journal (Refereed), Submitted
Abstract [en]

In recent years, there has been increased research interest in detecting anomalies in temporal streaming data. A variety of algorithms have been developed in the data mining community, which can be divided into two categories (i.e., general and ad hoc). In most cases, general approaches assume a one-size-fits-all solution model where a single anomaly detector can detect all anomalies in any domain. To date, there exists no single general method that has been shown to outperform the others across different anomaly types, use cases and datasets. On the other hand, ad hoc approaches that are designed for a specific application lack flexibility. Adapting an existing algorithm is not straightforward if the specific constraints or requirements for the existing task change. In this paper, we propose SAFARI, a general framework formulated by abstracting and unifying the fundamental tasks in streaming anomaly detection, which provides a flexible and extensible anomaly detection procedure. SAFARI helps to facilitate more elaborate algorithm comparisons by allowing us to isolate the effects of shared and unique characteristics of different algorithms on detection performance. Using SAFARI, we have implemented various anomaly detectors and identified a research gap that motivates us to propose a novel learning strategy in this work. We conducted an extensive evaluation study of 20 detectors that are composed using SAFARI and compared their performances using real-world benchmark datasets with different properties. The results indicate that there is no single superior detector that works well for every case, proving our hypothesis that "there is no free lunch" in the streaming anomaly detection world. Finally, we discuss the benefits and drawbacks of each method in depth and draw a set of conclusions to guide future users of SAFARI.

Place, publisher, year, edition, pages
Oxford: Elsevier, 2020
Keywords
anomaly detection
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-41420 (URN)
Funder
Knowledge Foundation, 20160103
Available from: 2020-01-29 Created: 2020-01-29 Last updated: 2020-02-18
Galozy, A., Nowaczyk, S., Pinheiro Sant'Anna, A., Ohlsson, M. & Lingman, M. (2020). Pitfalls of medication adherence approximation through EHR and pharmacy records: Definitions, data and computation. International Journal of Medical Informatics, 136, Article ID 104092.
2020 (English). In: International Journal of Medical Informatics, ISSN 1386-5056, E-ISSN 1872-8243, Vol. 136, article id 104092. Article in journal (Refereed), Published
Abstract [en]

Background and purpose: Patients’ adherence to medication is a complex, multidimensional phenomenon. Dispensation data and electronic health records are used to approximate medication-taking through refill adherence. In-depth discussions on the adverse effects of data quality and computational differences are rare. The purpose of this article is to evaluate the impact of common pitfalls when computing medication adherence using electronic health records.

Procedures: We point out common pitfalls associated with the data and operationalization of adherence measures. We provide operational definitions of refill adherence and conduct experiments to determine the effect of the pitfalls on adherence estimations. We performed statistical significance testing on the impact of common pitfalls using a baseline scenario as reference.

Findings: Slight changes in definition can significantly skew refill adherence estimates. Pickup patterns cause significant disagreement between measures and the commonly used proportion of days covered. Common data-related issues had a small but statistically significant (p < 0.05) impact at the population level and a significant effect on individual cases.

Conclusion: Data-related issues encountered in real-world administrative databases, which affect various operational definitions of refill adherence differently, can significantly skew refill adherence values, leading to false conclusions about adherence, particularly when estimating adherence for individuals. © 2020 The Authors. Published by Elsevier B.V. 
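
The sensitivity to definitions noted in the findings can be illustrated with a minimal sketch. It computes the proportion of days covered (PDC) as the share of days in an observation window on which dispensed supply is available, once ignoring overlapping supply and once carrying it forward. The function name `pdc`, the `carry_over` flag and the refill data are illustrative assumptions, not the operational definitions used in the article.

```python
from datetime import date, timedelta


def pdc(fills, start, end, carry_over=False):
    """Proportion of days covered in the window [start, end], inclusive.

    fills: list of (dispense_date, days_supply) tuples (hypothetical data).
    carry_over: if True, supply from an early refill is shifted forward so
    that days overlapping the previous supply are not lost.
    """
    covered = set()
    next_free = date.min  # first day not yet covered by earlier supply
    for dispensed, supply in sorted(fills):
        first = max(dispensed, next_free) if carry_over else dispensed
        for offset in range(supply):
            day = first + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
        next_free = first + timedelta(days=supply)
    total_days = (end - start).days + 1
    return len(covered) / total_days


# Hypothetical patient: the second refill is picked up ten days early.
start, end = date(2020, 1, 1), date(2020, 3, 30)  # 90-day window
fills = [(date(2020, 1, 1), 30), (date(2020, 1, 21), 30), (date(2020, 3, 1), 30)]

print(pdc(fills, start, end))                   # overlap discarded
print(pdc(fills, start, end, carry_over=True))  # overlap carried forward
```

With these made-up refills, discarding the overlap leaves 80 of 90 days covered, while carrying it forward covers all 90, so the same dispensation record yields two different adherence estimates.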

Place, publisher, year, edition, pages
Shannon: Elsevier, 2020
Keywords
Medication refill adherence, Electronic health records, Data quality, Pitfalls
National Category
Other Medical Engineering
Identifiers
urn:nbn:se:hh:diva-41712 (URN); 10.1016/j.ijmedinf.2020.104092 (DOI); 32062562 (PubMedID); 2-s2.0-85079281579 (Scopus ID)
Funder
Vinnova, 2017-04617
Note

Other funding: Health Technology Center and CAISR at Halmstad University and Halland's Hospital

Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2020-03-10. Bibliographically approved
Sheikholharam Mashhadi, P., Nowaczyk, S. & Pashami, S. (2020). Stacked Ensemble of Recurrent Neural Networks for Predicting Turbocharger Remaining Useful Life. Applied Sciences, 10(1), Article ID 69.
2020 (English). In: Applied Sciences, E-ISSN 2076-3417, Vol. 10, no 1, article id 69. Article in journal (Refereed), Published
Abstract [en]

Predictive Maintenance (PM) is a proactive maintenance strategy that tries to minimize a system’s downtime by predicting failures before they happen. It uses data from sensors to measure the component’s state of health and make forecasts about its future degradation. However, existing PM methods typically focus on individual measurements, while it is natural to assume that a history of measurements carries more information than a single one. This paper aims at incorporating such information into PM models. In practice, especially in the automotive domain, diagnostic models have low performance, due to a large amount of noise in the data and limited sensing capability. To address this issue, this paper proposes to use a specific type of ensemble learning known as Stacked Ensemble. The idea is to aggregate predictions of multiple models, consisting of Long Short-Term Memory (LSTM) and Convolutional-LSTM, via a meta model, in order to boost performance. A Stacked Ensemble model performs well when its base models are as diverse as possible. To this end, each such model is trained using a specific combination of the following three aspects: feature subsets, past dependency horizon, and model architectures. Experimental results demonstrate the benefits of the proposed approach on a case study of heavy-duty truck turbochargers. © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Place, publisher, year, edition, pages
Basel: MDPI, 2020
Keywords
predictive maintenance, remaining useful life, recurrent neural networks, LSTM, Stacked Ensemble
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:hh:diva-41314 (URN); 10.3390/app10010069 (DOI)
Projects
HEALTH-VINNOVA
Funder
Vinnova
Available from: 2019-12-29 Created: 2019-12-29 Last updated: 2020-01-21. Bibliographically approved
Dahl, O., Johansson, F., Khoshkangini, R., Pashami, S., Nowaczyk, S. & Pihl, C. (2020). Understanding Association Between Logged Vehicle Data and Vehicle Marketing Parameters - Using Clustering and Rule-Based Machine Learning. Paper presented at The 3rd International Conference on Information Management and Processing (ICIMP 2020), Portsmouth, United Kingdom, June 11-13, 2020.
2020 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Trucks are designed, configured and marketed for various working environments. There is a concern about whether trucks are used as intended by the manufacturer, as usage may impact the longevity, efficiency and productivity of the trucks.

In this paper we propose a framework that aims to extract customers' vehicle behaviors from logged vehicle data (LVD) in order to evaluate whether they align with vehicle configurations, so-called GTA parameters. Gaussian mixture models (GMMs) are employed to cluster and classify various vehicle behaviors from the LVD. Rule-based machine learning (RBML) was applied on the clusters to examine whether vehicle behaviors follow the GTA configuration. In particular, we propose an approach based on studying associations that is able to extract insights on whether the trucks are used as intended. Experimental results show that while the vast majority of the trucks' behaviors seemingly follow their GTA configuration, there are also interesting outliers that warrant further analysis.

Keywords
Machine Learning, Clustering, Usage Behaviors, Association Rule Mining, Gaussian Mixture Models.
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-41214 (URN)
Conference
The 3rd International Conference on Information Management and Processing (ICIMP 2020), Portsmouth, United Kingdom, June 11-13, 2020
Available from: 2019-12-10 Created: 2019-12-10 Last updated: 2020-02-17
Calikus, E., Nowaczyk, S., Pinheiro Sant'Anna, A., Gadd, H. & Werner, S. (2019). A data-driven approach for discovering heat load patterns in district heating. Applied Energy, 252, Article ID 113409.
2019 (English). In: Applied Energy, ISSN 0306-2619, E-ISSN 1872-9118, Vol. 252, article id 113409. Article in journal (Refereed), Published
Abstract [en]

Understanding the heat usage of customers is crucial for effective district heating operations and management. Unfortunately, existing knowledge about customers and their heat load behaviors is quite scarce. Most previous studies are limited to small-scale analyses that are not representative enough to understand the behavior of the overall network. In this work, we propose a data-driven approach that enables large-scale automatic analysis of heat load patterns in district heating networks without requiring prior knowledge. Our method clusters the customer profiles into different groups, extracts their representative patterns, and detects unusual customers whose profiles deviate significantly from the rest of their group. Using our approach, we present the first large-scale, comprehensive analysis of the heat load patterns by conducting a case study on many buildings in six different customer categories connected to two district heating networks in the south of Sweden. The 1222 buildings had a total floor space of 3.4 million square meters and used 1540 TJ heat during 2016. The results show that the proposed method has a high potential to be deployed and used in practice to analyze and understand customers’ heat-use habits. © 2019 Calikus et al. Published by Elsevier Ltd.

Place, publisher, year, edition, pages
Oxford: Elsevier, 2019
Keywords
District heating, Energy efficiency, Heat load patterns, Clustering, Abnormal heat use
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:hh:diva-40907 (URN); 10.1016/j.apenergy.2019.113409 (DOI); 000497968000013 (); 2-s2.0-85066961984 (Scopus ID)
Funder
Knowledge Foundation, 20160103
Available from: 2019-11-12 Created: 2019-11-12 Last updated: 2020-01-30. Bibliographically approved
Marcos, M., Juarez, J. M., Lenz, R., Nalepa, G. J., Nowaczyk, S., Peleg, M., . . . Stiglic, G. (Eds.). (2019). Artificial Intelligence in Medicine: Knowledge Representation and Transparent and Explainable Systems. Paper presented at 7th Conference on Artificial Intelligence in Medicine International Workshops (AIME 2019), KR4HC/ProHealth and TEAAM, Poznan, Poland, June 26–29, 2019. Heidelberg: Springer
2019 (English). Conference proceedings (editor) (Refereed)
Abstract [en]

This book constitutes revised selected papers from the AIME 2019 workshops KR4HC/ProHealth 2019, the Workshop on Knowledge Representation for Health Care and Process-Oriented Information Systems in Health Care, and TEAAM 2019, the Workshop on Transparent, Explainable and Affective AI in Medical Systems.

The volume contains 5 full papers from KR4HC/ProHealth, which were selected out of 13 submissions. For TEAAM 8 papers out of 10 submissions were accepted for publication. © 2019 Springer Nature Switzerland AG. Part of Springer Nature.

Place, publisher, year, edition, pages
Heidelberg: Springer, 2019. p. 174
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 11979
Keywords
artificial intelligence, classification, computer hardware, computer systems, computer vision, data mining, databases, education, engineering, image analysis, image processing, internet, learning, linguistics, machine learning, mathematics, pattern recognition, semantics, signal processing
National Category
Signal Processing
Identifiers
urn:nbn:se:hh:diva-41481 (URN); 10.1007/978-3-030-37446-4 (DOI); 978-3-030-37446-4 (ISBN)
Conference
7th Conference on Artificial Intelligence in Medicine International Workshops (AIME 2019), KR4HC/ProHealth and TEAAM, Poznan, Poland, June 26–29, 2019
Note

Also part of the Lecture Notes in Artificial Intelligence book sub series (LNAI, volume 11979).

Available from: 2020-02-01 Created: 2020-02-01 Last updated: 2020-03-10. Bibliographically approved
Khan, T., Lundgren, L., Anderson, D. G., Novak, I., Dougherty, M., Verikas, A., . . . Aharonson, V. (2019). Assessing Parkinson's disease severity using speech analysis in non-native speakers. Computer speech & language (Print), 61, Article ID 101047.
2019 (English). In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 61, article id 101047. Article in journal (Refereed), Published
Abstract [en]

Background: Speech disorder is a common manifestation of Parkinson's disease with two main symptoms, dysprosody and dysphonia. Previous research studying objective measures of speech symptoms involved patients and examiners who were native language speakers. Measures such as cepstral separation difference (CSD) features to quantify dysphonia and dysprosody accurately distinguish the severity of speech impairment. Importantly, CSD, together with other speech features, including Mel-frequency coefficients, fundamental-frequency variation, and spectral dynamics, characterizes speech intelligibility in PD. However, non-native language speakers transfer phonological rules of their mother language that tamper with speech assessment.

Objectives: This paper explores CSD's capability: first, to quantify dysprosody and dysphonia of non-native language speakers, Parkinson patients and controls, and secondly, to characterize the severity of speech impairment when Parkinson's dysprosody accompanies non-native linguistic dysprosody.

Methods: CSD features were extracted from 168 speech samples recorded from 19 healthy controls, 15 rehabilitated and 23 not-rehabilitated Parkinson patients in three different clinical speech tests based on Unified Parkinson's disease rating scale motor-speech examination. Statistical analyses were performed to compare groups using analysis of variance, intraclass correlation, and Guttman correlation coefficient µ2. Random forests were trained to classify the severity of speech impairment using CSD and the other speech features. Feature importance in classification was determined using permutation importance score.

Results: Results showed that the CSD feature describing dysphonia was uninfluenced by non-native accents, strongly correlated with the clinical examination (µ2>0.5), and significantly discriminated between the healthy, rehabilitated, and not-rehabilitated patient groups based on the severity of speech symptoms. However, the feature describing dysprosody did not correlate with the clinical examination but significantly distinguished the groups. The classification model based on random forests and selected features characterized the severity of speech impairment of non-native language speakers with high accuracy. Importantly, the permutation importance score of the CSD feature representing dysphonia was the highest compared to other features. Results showed a strong negative correlation (µ2<-0.5) between L-dopa administration and the CSD features.

Conclusions: Although non-native accents reduce speech intelligibility, the CSD features can accurately characterize speech impairment, which is not always possible in the clinical examination. Findings support using CSD for monitoring Parkinson's disease.

© 2019 Elsevier Ltd. All rights reserved.
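
Permutation importance, the score the Methods paragraph above names for ranking features, can be sketched in a few lines: shuffle one feature column, re-score the model, and record the drop in accuracy. The toy threshold rule, the data and the function names below are assumptions for illustration; the study itself trained random forests on CSD and other speech features.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier that looks only at the first feature."""
    return int(row[0] > 0.5)


# Made-up data: feature 0 determines the label, feature 1 is noise.
rows = [(0.1, 5.0), (0.2, 1.0), (0.3, 4.0), (0.4, 2.0),
        (0.6, 3.0), (0.7, 5.0), (0.8, 1.0), (0.9, 2.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(permutation_importance(toy_model, rows, labels))
```

Because the toy classifier ignores the second feature, shuffling it never changes a prediction and its importance is exactly zero, whereas shuffling the first feature lowers accuracy.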

Place, publisher, year, edition, pages
London, UK: Academic Press, 2019
Keywords
Dysphonia, Dysprosody, Parkinson's disease, Speech processing, Tele-monitoring
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:hh:diva-41003 (URN); 10.1016/j.csl.2019.101047 (DOI); 2-s2.0-85075748795 (Scopus ID)
Note

Funding: Promobilia Foundation, Sweden

Available from: 2019-11-21 Created: 2019-11-21 Last updated: 2020-02-17. Bibliographically approved
Pirasteh, P., Nowaczyk, S., Pashami, S., Löwenadler, M., Thunberg, K., Ydreskog, H. & Berck, P. (2019). Interactive feature extraction for diagnostic trouble codes in predictive maintenance: A case study from automotive domain. In: Proceedings of the Workshop on Interactive Data Mining. Paper presented at WSDM 2019: The 12th ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11-15 February, 2019. New York, NY: Association for Computing Machinery (ACM), Article ID 4.
2019 (English). In: Proceedings of the Workshop on Interactive Data Mining, New York, NY: Association for Computing Machinery (ACM), 2019, article id 4. Conference paper, Published paper (Refereed)
Abstract [en]

Predicting future maintenance needs of equipment can be addressed in a variety of ways. Methods based on machine learning approaches provide an interesting platform for mining large data sets to find patterns that might correlate with a given fault. In this paper, we approach predictive maintenance as a classification problem and use Random Forest to separate data readouts within a particular time window into those corresponding to faulty and non-faulty component categories. We utilize diagnostic trouble codes (DTCs) as an example of event-based data, and propose four categories of features that can be derived from DTCs as a predictive maintenance framework. We test the approach using large-scale data from a fleet of heavy duty trucks, and show that DTCs can be used within our framework as indicators of imminent failures in different components.

Place, publisher, year, edition, pages
New York, NY: Association for Computing Machinery (ACM), 2019
Keywords
Predictive maintenance, failure detection, diagnostic trouble codes, feature extraction
National Category
Signal Processing
Identifiers
urn:nbn:se:hh:diva-40184 (URN); 10.1145/3304079.3310288 (DOI); 2-s2.0-85069771384 (Scopus ID); 978-1-4503-6296-2 (ISBN)
Conference
WSDM 2019: The 12th ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11-15 February, 2019
Available from: 2019-07-07 Created: 2019-07-07 Last updated: 2020-02-03. Bibliographically approved
Calikus, E., Fan, Y., Nowaczyk, S. & Pinheiro Sant'Anna, A. (2019). Interactive-cosmo: Consensus self-organized models for fault detection with expert feedback. In: Proceedings of the Workshop on Interactive Data Mining, WIDM 2019. Paper presented at 1st Workshop on Interactive Data Mining, WIDM 2019, co-located with 12th ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, Australia; 15 February, 2019 (pp. 1-9). New York: Association for Computing Machinery (ACM)
2019 (English). In: Proceedings of the Workshop on Interactive Data Mining, WIDM 2019, New York: Association for Computing Machinery (ACM), 2019, p. 1-9. Conference paper, Published paper (Refereed)
Abstract [en]

Diagnosing deviations and predicting faults is an important task, especially given recent advances related to the Internet of Things. However, the majority of the efforts for diagnostics are still carried out by human experts in a time-consuming and expensive manner. One promising approach towards self-monitoring systems is based on the "wisdom of the crowd" idea, where malfunctioning equipment is detected by understanding the similarities and differences in the operation of several similar systems.

Fully autonomous fault detection, however, is not possible, since not all deviations or anomalies correspond to faulty behaviors; many can be explained by atypical usage or varying external conditions. In this work, we propose a method which gradually incorporates expert-provided feedback for more accurate self-monitoring. Our idea is to support model adaptation while allowing human feedback to persist over changes in data distribution, such as concept drift. © 2019 Association for Computing Machinery.

Place, publisher, year, edition, pages
New York: Association for Computing Machinery (ACM), 2019
Keywords
Anomaly Detection, Self-Monitoring, Active Learning, Human-in-the-loop
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:hh:diva-41365 (URN); 10.1145/3304079.3310289 (DOI); 2-s2.0-85069779014 (Scopus ID); 978-1-4503-6296-2 (ISBN)
Conference
1st Workshop on Interactive Data Mining, WIDM 2019, co-located with 12th ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, Australia; 15 February, 2019
Available from: 2020-01-10 Created: 2020-01-10 Last updated: 2020-01-30. Bibliographically approved
Projects
iMedA: Improving MEDication Adherence through Person Centered Care and Adaptive Interventions [2017-04617_Vinnova]; Halmstad University
Identifiers
ORCID iD: orcid.org/0000-0002-7796-5201
