Human activity recognition has become an active research field over the past few years due to its wide application in various fields such as healthcare, smart home monitoring, and surveillance. Existing approaches for activity recognition in smart homes have achieved promising results. Most of these approaches evaluate real-time recognition of activities using only sensor activations that precede the evaluation time (where the decision is made). However, in several critical situations, such as diagnosing people with dementia, “preceding sensor activations” are not always sufficient to accurately recognize the inhabitant's daily activities at each evaluated time point. To improve performance, we propose a method that delays the recognition process in order to include some sensor activations that occur after the point in time where the decision needs to be made. For this, the proposed method uses multiple incremental fuzzy temporal windows to extract features from both preceding and some oncoming sensor activations. The proposed method is evaluated with two temporal deep learning models (convolutional neural network and long short-term memory) on a binary sensor dataset of real daily living activities. The experimental evaluation shows that the proposed method achieves significantly better results than the real-time approach, and that the representation with fuzzy temporal windows enhances the performance of the deep learning models. © Copyright 2020 IEEE
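To make the fuzzy temporal window idea concrete, the following is a minimal sketch of how binary sensor activations before (and, with a delay, after) the evaluation time could be aggregated into fuzzy-weighted features. The triangular membership shape, window edges, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuzzy_window_features(events, t_eval, edges, delay=0.0):
    """Aggregate binary sensor activations with incremental fuzzy temporal windows.

    events : array of shape (n_events, 2) -> (timestamp, sensor_id)
    t_eval : time at which the activity decision is evaluated
    edges  : increasing window sizes (in seconds) defining the fuzzy windows
    delay  : how far past t_eval we are allowed to look (0 = real-time)
    """
    n_sensors = int(events[:, 1].max()) + 1
    feats = np.zeros((len(edges), n_sensors))
    for ts, sensor in events:
        offset = t_eval + delay - ts            # age of the activation
        if offset < 0:
            continue                            # activation outside the allowed horizon
        for i, edge in enumerate(edges):
            prev = edges[i - 1] if i > 0 else 0.0
            # triangular membership: full weight up to the previous edge, fading to 0 at `edge`
            if offset <= prev:
                mu = 1.0
            elif offset < edge:
                mu = (edge - offset) / (edge - prev)
            else:
                mu = 0.0
            feats[i, int(sensor)] = max(feats[i, int(sensor)], mu)
    return feats.ravel()   # one fuzzy-weighted feature per (window, sensor) pair

# toy usage: three activations of two sensors, evaluated at t=100 with a 10 s delay
events = np.array([[95.0, 0], [99.0, 1], [104.0, 0]])
x = fuzzy_window_features(events, t_eval=100.0, edges=[5.0, 15.0, 60.0], delay=10.0)
```

The resulting feature vector (one entry per window/sensor pair) is what a CNN or LSTM would consume at each evaluation point.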
In this survey, 105 papers related to interactive clustering were reviewed according to seven perspectives: (1) on what level the interaction happens, (2) which interactive operations are involved, (3) how user feedback is incorporated, (4) how interactive clustering is evaluated, (5) which data and (6) which clustering methods have been used, and (7) what challenges are outlined. This article serves as a comprehensive overview of the field, outlines the state of the art within the area, and identifies challenges and future research needs. © 2020 Copyright held by the owner/author(s).
This paper introduces an unsupervised method for the classification of discrete rovers' slip events based on proprioceptive signals. In particular, the method is able to automatically discover and track various degrees of slip (i.e., low slip, moderate slip, high slip). The proposed method is based on aggregating the data over time, since high-level concepts, such as high and low slip, depend on a longer time perspective. Different features and subsets of the data are identified that lead to a proper clustering, and those clusters are interpreted as initial models of the prospective concepts. Bayesian tracking is then used to continuously improve the parameters of these models based on the new data. Two real datasets are used to validate the proposed approach in comparison to other known unsupervised and supervised machine learning methods. The first dataset is collected by a single-wheel testbed available at MIT. The second dataset was collected by means of a planetary exploration rover in real off-road conditions. Experiments show that the proposed method is more accurate (up to 86% accuracy vs. 80% for K-means) in discovering various levels of slip while being fully unsupervised (no need for hand-labeled data for training). © 2017 ISTVS
Most existing work in information fusion focuses on combining information with well-defined meaning towards a concrete, pre-specified goal. In contradistinction, we aim for autonomous discovery of high-level knowledge from ubiquitous data streams. This paper introduces a method for recognition and tracking of hidden conceptual modes, which are essential to fully understand the operation of complex environments. We consider a scenario of analyzing usage of a fleet of city buses, where the objective is to automatically discover and track modes such as highway route, heavy traffic, or aggressive driver, based on available on-board signals. The method we propose is based on aggregating the data over time, since the high-level modes are only apparent in the longer perspective. We search through different features and subsets of the data, and identify those that lead to good clusterings, interpreting those clusters as initial, rough models of the prospective modes. We utilize Bayesian tracking in order to continuously improve the parameters of those models based on the new data, while at the same time following how the modes evolve over time. Experiments with artificial data of varying degrees of complexity, as well as on real-world datasets, demonstrate the effectiveness of the proposed method in accurately discovering the modes and in identifying which one best explains the current observations from multiple data streams. © 2017 Elsevier B.V. All rights reserved.
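The two abstracts above share the same aggregate-cluster-track pipeline. Below is a minimal sketch of that pipeline, assuming diagonal Gaussian mode models initialized by k-means and updated with a conjugate (Kalman-style) mean update; the concrete model family and update rule are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# 1) Aggregate raw streams over time windows and cluster to get initial mode models.
def initial_modes(windows, k):
    km = KMeans(n_clusters=k, n_init=10).fit(windows)
    modes = []
    for c in range(k):
        members = windows[km.labels_ == c]
        modes.append({"mean": members.mean(axis=0),
                      "var": members.var(axis=0) + 1e-6})
    return modes

# 2) Bayesian tracking: treat each mode mean as a Gaussian belief and refine it
#    with every new aggregated window assigned to that mode.
def update_mode(mode, x, obs_var=1.0):
    prior_var = mode["var"]
    gain = prior_var / (prior_var + obs_var)          # Kalman-style gain
    mode["mean"] = mode["mean"] + gain * (x - mode["mean"])
    mode["var"] = (1.0 - gain) * prior_var
    return mode

def most_likely_mode(modes, x):
    # log-likelihood under diagonal Gaussians identifies which mode explains x best
    lls = [-0.5 * np.sum((x - m["mean"]) ** 2 / m["var"] + np.log(m["var"]))
           for m in modes]
    return int(np.argmax(lls))

# toy usage
rng = np.random.default_rng(0)
windows = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
modes = initial_modes(windows, k=2)
for x in rng.normal(5, 1, (20, 3)):                   # new data from the second mode
    i = most_likely_mode(modes, x)
    modes[i] = update_mode(modes[i], x)
```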
In the era of big data, considerable research focus is being put on designing efficient algorithms capable of learning and extracting high-level knowledge from ubiquitous data streams in an online fashion. While most existing algorithms assume that data samples are drawn from a stationary distribution, several complex environments deal with data streams that are subject to change over time. Taking this aspect into consideration is an important step towards building truly aware and intelligent systems. In this paper, we propose GNG-A, an adaptive method for incremental unsupervised learning from evolving data streams experiencing various types of change. The proposed method maintains a continuously updated network (graph) of neurons by extending the Growing Neural Gas algorithm with three complementary mechanisms, allowing it to closely track both gradual and sudden changes in the data distribution. First, an adaptation mechanism handles local changes where the distribution is only non-stationary in some regions of the feature space. Second, an adaptive forgetting mechanism identifies and removes neurons that become irrelevant due to the evolving nature of the stream. Finally, a probabilistic evolution mechanism creates new neurons when there is a need to represent data in new regions of the feature space. The proposed method is demonstrated for anomaly and novelty detection in non-stationary environments. Results show that the method handles different data distributions and efficiently reacts to various types of change. © 2018 The Author(s)
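The following toy sketch illustrates the interplay of the three mechanisms (adaptation, forgetting, probabilistic evolution) in a heavily simplified form; it is not the full GNG-A algorithm (no edges, error counters, or neighborhood updates), and all thresholds and names are assumptions.

```python
import numpy as np

class TinyGNGA:
    """Toy sketch of the adaptation / forgetting / evolution ideas (not full GNG-A)."""

    def __init__(self, lr=0.1, decay=0.99, min_relevance=0.05, new_dist=3.0):
        self.lr, self.decay = lr, decay
        self.min_relevance, self.new_dist = min_relevance, new_dist
        self.neurons, self.relevance = [], []

    def partial_fit(self, x):
        x = np.asarray(x, dtype=float)
        if not self.neurons:
            self.neurons.append(x.copy())
            self.relevance.append(1.0)
            return
        d = [np.linalg.norm(x - w) for w in self.neurons]
        winner = int(np.argmin(d))
        # adaptation: move the winner towards the sample (tracks gradual, local drift)
        self.neurons[winner] += self.lr * (x - self.neurons[winner])
        # forgetting: decay every neuron's relevance, reinforce the winner
        self.relevance = [r * self.decay for r in self.relevance]
        self.relevance[winner] = 1.0
        # evolution: probabilistically create a neuron in an unrepresented region
        if d[winner] > self.new_dist and np.random.rand() < 0.5:
            self.neurons.append(x.copy())
            self.relevance.append(1.0)
        # remove neurons that became irrelevant as the stream evolved
        keep = [i for i, r in enumerate(self.relevance) if r >= self.min_relevance]
        self.neurons = [self.neurons[i] for i in keep]
        self.relevance = [self.relevance[i] for i in keep]
```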
We propose a new active learning method for classification, which handles label noise without relying on multiple oracles (i.e., crowdsourcing). We propose a strategy that selects (for labeling) instances with a high influence on the learned model. An instance x is said to have a high influence on the model h if training h on x (with label y = h(x)) would result in a model that greatly disagrees with h on labeling other instances. Then, we propose another strategy that selects (for labeling) instances that are highly influenced by changes in the learned model. An instance x is said to be highly influenced if training h with a set of instances would result in a committee of models that agree on a common label for x but disagree with h(x). We compare the two strategies and show, on different publicly available datasets, that selecting instances according to the first strategy, while eliminating noisy labels according to the second strategy, greatly improves accuracy compared to several benchmark methods, even when a significant number of instances are mislabeled. © Springer-Verlag Berlin Heidelberg 2017
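A minimal sketch of the first (influence-based) query strategy, assuming a scikit-learn classifier: each candidate is temporarily added with its predicted label, and the disagreement of the retrained model on a reference pool is used as the influence score. The retraining-from-scratch loop and disagreement measure are illustrative simplifications.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def influence_scores(model, X_labeled, y_labeled, X_pool, X_ref):
    """Score each pool instance by how much adding (x, model.predict(x)) changes
    the model's predictions on a reference set (higher = more influential)."""
    base_pred = model.predict(X_ref)
    scores = []
    for x in X_pool:
        y_hat = model.predict(x.reshape(1, -1))
        m = clone(model).fit(np.vstack([X_labeled, x]),
                             np.concatenate([y_labeled, y_hat]))
        scores.append(np.mean(m.predict(X_ref) != base_pred))  # disagreement rate
    return np.array(scores)

# usage: query the most influential instance for labeling
# model = LogisticRegression().fit(X_labeled, y_labeled)
# query_idx = influence_scores(model, X_labeled, y_labeled, X_pool, X_ref).argmax()
```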
The majority of existing machine learning algorithms assume that training examples are already represented with sufficiently good features, in practice features that are designed manually. This traditional way of preprocessing the data is not only tedious and time-consuming, but also insufficient to capture all the different aspects of the available information. With the big data phenomenon, this issue is only going to grow, as data is rarely collected and analyzed with a specific purpose in mind, and is more often re-used for solving different problems. Moreover, the expert knowledge about the problem that allows practitioners to come up with good representations does not necessarily generalize to other tasks. Therefore, much focus has been put on designing methods that can automatically learn features or representations of the data instead of learning from handcrafted features. However, much of this work has used ad hoc methods, and the theoretical understanding in this area is lacking.
In contextual anomaly detection, an object is only considered anomalous within a specific context. Most existing methods use a single context based on a set of user-specified contextual features. However, identifying the right context can be very challenging in practice, especially in datasets with a large number of attributes. Furthermore, in real-world systems, there might be multiple anomalies that occur in different contexts and, therefore, require a combination of several "useful" contexts to unveil them. In this work, we propose a novel approach, called WisCon (Wisdom of the Contexts), to effectively detect complex contextual anomalies in situations where the true contextual and behavioral attributes are unknown. Our method constructs an ensemble of multiple contexts, with varying importance scores, based on the assumption that not all useful contexts are equally so. We estimate the importance of each context using an active learning approach with a novel query strategy. Experiments show that WisCon significantly outperforms existing baselines in different categories (i.e., active classifiers, unsupervised contextual, and non-contextual anomaly detectors) on 18 datasets. Furthermore, the results support our initial hypothesis that there is no single perfect context that successfully uncovers all kinds of contextual anomalies, and leveraging the "wisdom" of multiple contexts is necessary. © 2022, The Author(s).
In several applications, when anomalies are detected, human experts have to investigate or verify them one by one. As they investigate, they unwittingly produce a label - true positive (TP) or false positive (FP). In this paper, we propose a method (called OMD-Clustering) that exploits this label feedback to minimize the FP rate and detect more relevant anomalies, while minimizing the expert effort required to investigate them. The OMD-Clustering method iteratively suggests the top-1 anomalous instance to a human expert and receives feedback. Before suggesting the next anomaly, the method re-ranks instances so that the top anomalous instances are similar to the TP instances and dissimilar to the FP instances. This is achieved by learning to score anomalies differently in various regions of the feature space. An experimental evaluation on several real-world datasets is conducted. The results show that OMD-Clustering achieves significant improvement in both detection precision and expert effort compared to state-of-the-art interactive anomaly detection methods.
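A minimal sketch of such a feedback loop: anomaly scores are adjusted so that instances similar to confirmed TPs move up the ranking and instances similar to FPs move down. The Gaussian similarity kernel and additive update are illustrative assumptions, not the exact OMD-Clustering update rule.

```python
import numpy as np

def rerank(scores, X, tp_idx, fp_idx, bandwidth=1.0, alpha=0.5):
    """Adjust anomaly scores using expert feedback (lists of TP / FP indices)."""
    def kernel(idx):
        if not idx:
            return np.zeros(len(X))
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        return np.exp(-(d ** 2) / (2 * bandwidth ** 2)).mean(axis=1)
    # boost regions similar to true positives, penalize regions similar to false positives
    return scores + alpha * (kernel(tp_idx) - kernel(fp_idx))

# interactive loop (sketch): present the top-ranked instance, collect feedback, re-rank
# top = np.argsort(-scores)[0]; (tp_idx if expert_says_anomaly else fp_idx).append(top)
# scores = rerank(base_scores, X, tp_idx, fp_idx)
```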
This work proposes a new exchangeability test for a random sequence through a martingale-based approach. Its main contributions include 1) an additive martingale, which is more amenable to designing exchangeability tests by exploiting the Hoeffding-Azuma lemma, and 2) different betting functions for constructing the additive martingale. By choosing the underlying probability density function of p-values as a betting function, it can be shown that, when a change-point appears, a satisfying trade-off between the smoothness and the expected one-step increment of the martingale sequence can be obtained. An online algorithm based on beta distribution parametrization for constructing this betting function is discussed in detail as well. © 2021, IGI Global.
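As a rough illustration: under exchangeability the p-values are (approximately) uniform, so with a betting function b that integrates to 1 over [0, 1], the sum S_n = Σ_i (b(p_i) − 1) has zero-mean increments, and a bounded-increment inequality such as Hoeffding-Azuma can supply the rejection threshold. The sketch below fits a beta density to past p-values by the method of moments as one possible instantiation; the threshold is a fixed constant here for simplicity, and the details are assumptions rather than the paper's exact algorithm.

```python
import numpy as np
from scipy.stats import beta

def beta_betting(history, eps=1e-3):
    """Fit a beta density to previously seen p-values (method of moments)."""
    if len(history) < 10:
        return lambda p: 1.0                        # uninformative bet early on
    m, v = np.mean(history), np.var(history) + eps
    common = m * (1 - m) / v - 1
    a, b = max(m * common, eps), max((1 - m) * common, eps)
    return lambda p: float(beta.pdf(np.clip(p, eps, 1 - eps), a, b))

def exchangeability_test(p_values, threshold=20.0):
    """Additive martingale S_n = sum_i (b(p_i) - 1); flag a change when it exceeds threshold.

    In practice the threshold would follow from the Hoeffding-Azuma bound on the
    bounded increments; a constant is used here purely for illustration."""
    history, S = [], 0.0
    for p in p_values:
        b_fn = beta_betting(history)                # betting function built from the past only
        S += b_fn(p) - 1.0                          # zero-mean increment under uniform p-values
        history.append(p)
        if S > threshold:
            return True, S                          # exchangeability rejected (change detected)
    return False, S
```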
Extracting operation cycles from the historical readings of sensors is an essential step in IoT data analytics. For instance, we can exploit the obtained cycles for learning the normal states to feed into semi-supervised models or dictionaries for efficient real-time anomaly detection on the sensors. However, this is a difficult problem due to the fact that there may be different types of cycles, each with a varying length. Current approaches are highly dependent on manual efforts with the aid of visualization and the knowledge of domain experts, which is not feasible on a large scale. We propose a fully automated method called CycleFootprint that can: 1) identify the most relevant signal that has the most obvious recurring patterns among multiple signals; and 2) automatically find the cycles from the selected signal. The main idea behind CycleFootprint is mining footprints in the cycles. We assume that each cycle contains a unique pattern, a footprint, that shows up repeatedly. By mining those footprints, we can identify cycles. We evaluate our method with existing labeled ground truth data of a real separator in a marine application equipped with multiple health monitoring sensors. 86% of the cycles extracted by our method match fully, or with at least 99% overlap, with true cycles, which is promising given the method's unsupervised and fully automated nature. © Springer Nature Switzerland AG 2020
The ever increasing complexity of modern systems and equipment makes the task of monitoring their health quite challenging. Traditional methods such as expert-defined thresholds, physics-based models, and process-history-based techniques have certain drawbacks. Thresholds defined by experts require deep knowledge about the system and are often too conservative. Physics-driven approaches are costly to develop and maintain. Finally, process-history-based models require a large amount of data that may not be available at design time of a system. Moreover, the focus of these traditional approaches has been system-specific. Hence, when industrial systems are deployed on a large scale, their monitoring becomes a new challenge. Under these conditions, this paper demonstrates the use of a group-based self-monitoring approach that learns over time from similar systems subject to similar conditions. The approach is based on conformal anomaly detection coupled with an exchangeability test that uses martingales. This allows setting a threshold value based on sound theoretical justification. A hypothesis test based on this threshold is used to decide whether a system has deviated from its group. We demonstrate the feasibility of this approach through a real case study of monitoring a group of heat pumps, where it can detect a faulty hot-water switch-valve and a broken outdoor temperature sensor without previously observing these faults.
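A minimal sketch of the conformal anomaly detection step that produces the p-values consumed by the martingale-based exchangeability test: a k-NN distance to the peer group's data is used as the nonconformity measure (one common choice, assumed here for illustration), and the conformal p-value is the fraction of group samples that are at least as strange.

```python
import numpy as np

def knn_nonconformity(x, reference, k=5):
    """Nonconformity = mean distance to the k nearest reference (group) samples."""
    d = np.sort(np.linalg.norm(reference - x, axis=1))
    return d[:k].mean()

def conformal_p_value(x, reference, k=5):
    """Fraction of reference samples at least as nonconforming as x."""
    a_x = knn_nonconformity(x, reference, k)
    a_ref = np.array([knn_nonconformity(r, np.delete(reference, i, axis=0), k)
                      for i, r in enumerate(reference)])
    return (np.sum(a_ref >= a_x) + 1) / (len(reference) + 1)

# A unit drifting away from its group produces a stream of small p-values,
# which the martingale exchangeability test then turns into an alarm.
```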
The monitoring infrastructure of an industrial fleet can rely on the so-called unit-level and subfleet-level models to observe the behavior of a target unit. However, such infrastructure has to confront several challenges. First, from an anomaly detection perspective of monitoring a target unit, unit-level and subfleet-level models can give different information about the nature of an anomaly, and which approach or level model is appropriate is not always clear. Second, in the absence of well-understood prior models of unit and subfleet behavior, the choice of a base model at their respective levels, especially in an online/streaming setting, may not be clear. Third, managing false alarms is a major problem. To deal with these challenges, we proposed to rely on the conformal anomaly detection framework. In addition, an ensemble approach was deployed to mitigate the knowledge gap in understanding the underlying data-generating process at the unit and subfleet levels. Therefore, to monitor the behavior of a target unit, a unit-level ensemble model (ULEM) and a subfleet-level ensemble model (SLEM) were constructed, where each member of the respective ensemble is based on a conformal anomaly detector (CAD). However, since the information obtained by these two ensemble models through their p-values may not always agree, a combined ensemble model (CEM) was proposed. The results are based on real-world operational data obtained from district heating (DH) substations. Here, it was observed that CEM reduces the overall false alarms compared to ULEM or SLEM, albeit at the cost of some detection delay. The analysis demonstrated the advantages and limitations of ULEM, SLEM, and CEM. Furthermore, discords obtained from the state-of-the-art matrix-profile (MP) method and the combined calibration scores obtained from ULEM and SLEM were compared in an offline setting. Here, it was observed that SLEM achieved a better overall precision and detection delay. Finally, the different components related to ULEM, SLEM, and CEM were put together into what we refer to as TRANTOR: a conformal anomaly detection based industrial fleet monitoring framework. The proposed framework is expected to enable fleet operators in various domains to improve their monitoring infrastructure by efficiently detecting anomalous behavior and controlling false alarms at the target units. © 2022
We considered the case of monitoring a large fleet where heterogeneity in the operational behavior among its constituent units (i.e., systems or machines) is non-negligible, and no labeled data is available. Each unit in the fleet, referred to as a target, is tracked by its sub-fleet. A conformal sub-fleet (CSF) is a set of units that act as a proxy for the normal operational behavior of a target unit by relying on the Mondrian conformal anomaly detection framework. Two approaches, the k-nearest neighbors and conformal clustering, were investigated for constructing such a sub-fleet by formulating a stability criterion. Moreover, it is important to discover the sub-sequence of events that describes an anomalous behavior in a target unit. Hence, we proposed to extract such sub-sequences for further investigation without pre-specifying their length. We refer to it as a conformal anomaly sequence (CAS). Furthermore, different nonconformity measures were evaluated for their efficiency, i.e., their ability to detect anomalous behavior in a target unit, based on the length of the observed CAS and the S-criterion value. The CSF approach was evaluated in the context of monitoring district heating substations. Anomalous behavior sub-sequences were corroborated with the domain expert leading to the conclusion that the proposed approach has the potential to be useful for both diagnostic and knowledge extraction purposes, especially in domains where labeled data is not available or hard to obtain. © 2021
A typical district heating (DH) network consists of hundreds, sometimes thousands, of substations. In the absence of a well-understood prior model or data labels for each substation, the overall monitoring of such a large number of substations can be challenging. To overcome the challenge, an approach based on the collective operational monitoring of each substation by a local group (i.e., the reference-group) of other similar substations in the network was formulated. Herein, if a substation of interest (i.e., the target) starts to behave differently in comparison to those in its reference-group, then it is designated as an outlier. The approach was demonstrated on the monitoring of the return temperature variable for atypical and faulty operational behavior in 778 substations associated with multi-dwelling buildings. The choice of an appropriate similarity measure, along with the reference-group size k, were the two important factors that enable a reference-group to effectively detect an outlier target. Thus, different similarity measures and sizes k for the construction of the reference-groups were investigated, which led to the selection of the Euclidean distance with k = 80. This setup resulted in the detection of 77 target substations that were outliers, i.e., the behavior of their return temperature changed in comparison to the majority of those in their respective reference-groups. Of these, 44 were detected due to the local construction of the reference-groups. In addition, six frequent patterns of deviating behavior in the return temperature of the substations were identified using the reference-group based approach, which were then further corroborated by the feedback from a DH domain expert. © 2020 Elsevier Ltd
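A minimal sketch of the reference-group idea: for every target substation, take its k most similar peers under the Euclidean distance (the paper reports k = 80) and flag the target if its return-temperature profile lies far out in the group's distribution. The choice of profile features and the z-score flagging rule are illustrative assumptions.

```python
import numpy as np

def reference_group_outliers(profiles, k=80, z_thresh=3.0):
    """profiles: (n_substations, n_features) return-temperature profiles."""
    n = len(profiles)
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        d = np.linalg.norm(profiles - profiles[i], axis=1)
        group = np.argsort(d)[1:k + 1]               # k most similar peers (exclude self)
        mu = profiles[group].mean(axis=0)
        sigma = profiles[group].std(axis=0) + 1e-9
        z = np.abs((profiles[i] - mu) / sigma)
        flags[i] = z.max() > z_thresh                # deviates strongly from its local group
    return flags
```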
This paper demonstrates how to explore and visualize different types of structure in data, including clusters, anomalies, causal relations, and higher order relations. The methods are developed with the goal of being as automatic as possible and applicable to massive, streaming, and distributed data. Finally, a decentralized learning scheme is discussed, enabling finding structure in the data without collecting the data centrally.
We approach the problem of identifying and interpreting clusters over different time scales and granularities in multivariate time series data. We extract statistical features over a sliding window of each time series, and then use a Gaussian mixture model to identify clusters, which are then projected back onto the data streams. The human analyst can then further analyze this projection and adjust the size of the sliding window and the number of clusters in order to capture the different types of clusters over different time scales. We demonstrate the effectiveness of our approach in two different application scenarios: (1) fleet management and (2) district heating, where, in each scenario, several different types of meaningful clusters can be identified when varying these dimensions. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
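A minimal sketch of this pipeline for a single univariate series, assuming a small set of summary statistics per window; the window size and number of clusters are exactly the knobs the analyst is meant to vary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def window_features(x, win):
    """Simple statistics over non-overlapping sliding windows of a univariate series."""
    idx = range(0, len(x) - win + 1, win)
    return np.array([[x[i:i + win].mean(), x[i:i + win].std(),
                      x[i:i + win].min(), x[i:i + win].max()] for i in idx])

def cluster_timeline(x, win=50, n_clusters=3):
    x = np.asarray(x, dtype=float)
    feats = window_features(x, win)
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(feats)
    labels = gmm.predict(feats)
    # project window labels back onto the time axis (may be slightly shorter than x
    # if len(x) is not a multiple of win)
    return np.repeat(labels, win)[:len(x)]
```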
Technological advancements and the widespread adoption of new technology in industry have made industrial time series data more available than ever before. With this development grows the need for versatile methods for mining industrial time series data. This paper introduces a practical approach for joint human-machine exploration of industrial time series data using the Matrix Profile (MP), and presents some of the challenges involved. The approach is demonstrated on three real-life industrial data sets to show how it enables the user to quickly extract semantic information, detect cycles, find deviating patterns, and gain a deeper understanding of the time series. A benchmark test is also presented on ECG (electrocardiogram) data, showing that the approach works well in comparison to previously suggested methods for extracting relevant time series motifs. © 2022, The Author(s).
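As a starting point for this kind of exploration, the sketch below uses the open-source stumpy library to compute the MP and read off the top motif (most repeated pattern) and discord (most deviating pattern); the subsequence length m is the main parameter the user explores. This is a generic MP usage example, not the paper's specific workflow.

```python
import numpy as np
import stumpy

def motifs_and_discords(ts, m):
    """Compute the matrix profile and return the top motif pair and top discord index."""
    mp = stumpy.stump(ts, m)            # columns: profile value, nn index, left idx, right idx
    profile = mp[:, 0].astype(float)
    motif_idx = int(np.argmin(profile))              # most-similar (recurring) subsequence
    motif_match = int(mp[motif_idx, 1])              # its nearest-neighbor occurrence
    discord_idx = int(np.argmax(profile))            # least-similar (deviating) subsequence
    return (motif_idx, motif_match), discord_idx

# usage:
# ts = np.asarray(sensor_readings, dtype=float)
# (motif, match), discord = motifs_and_discords(ts, m=200)
```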
Commonly used similarity-based algorithms in memory-based collaborative filtering may provide unreliable and misleading results. In a cold-start situation, users may find their most similar neighbors based on an insufficient number of ratings, resulting in low-quality recommendations. Such poor recommendations can also result from the similarity metrics themselves, as they are incapable of capturing similarities among uncommon items. For example, when the items shared by two users are popular items and both users rated them with high scores, their differing preferences toward other items remain hidden from the similarity metrics. In this paper, we propose a method that estimates the final ratings based on a combination of multiple ratings supplied by various similarity measures. Our experiments show that this combination benefits from the diversity within the similarities and offers high-quality personalized suggestions to the target user. © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2021
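A minimal sketch of the combination idea: predict the target user's rating separately under several similarity measures (cosine and Pearson here) and combine the estimates. The simple average used below is an assumption; the paper's actual weighting scheme may differ.

```python
import numpy as np

def cosine(u, v):
    m = (u > 0) & (v > 0)                             # co-rated items only
    if not m.any():
        return 0.0
    return float(np.dot(u[m], v[m]) / (np.linalg.norm(u[m]) * np.linalg.norm(v[m]) + 1e-9))

def pearson(u, v):
    m = (u > 0) & (v > 0)
    if m.sum() < 2:
        return 0.0
    r = np.corrcoef(u[m], v[m])[0, 1]
    return 0.0 if np.isnan(r) else float(r)

def predict_rating(R, user, item, sim_fn, k=10):
    """Memory-based prediction: weighted average over the k most similar users."""
    sims = np.array([sim_fn(R[user], R[v]) if v != user else -np.inf
                     for v in range(R.shape[0])])
    neighbors = np.argsort(-sims)[:k]
    rated = R[neighbors, item] > 0                    # neighbors who actually rated the item
    if not rated.any():
        return R[user][R[user] > 0].mean()
    w = sims[neighbors][rated]
    return float(np.dot(w, R[neighbors, item][rated]) / (np.abs(w).sum() + 1e-9))

def combined_prediction(R, user, item):
    # final rating = combination of the estimates from the different similarity measures
    return float(np.mean([predict_rating(R, user, item, s) for s in (cosine, pearson)]))
```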
The data shared over the Internet tends to originate from ubiquitous and autonomous sources such as mobile phones, fitness trackers, and IoT devices. Centralized and federated machine learning solutions represent the predominant way of providing smart services for users. However, moving data to a central location for analysis causes not only many privacy concerns, but also communication overhead. Therefore, in certain situations machine learning models need to be trained in a collaborative and decentralized manner, similar to the way the data is originally generated, without requiring any central authority for data or model aggregation. This paper presents a decentralized and adaptive k-means algorithm that clusters data from multiple sources organized in peer-to-peer networks. Our algorithm allows peers to reach an approximation of the global model without sharing any raw data. Most importantly, we address the challenge of decentralized clustering with skewed non-IID data and asynchronous computations by integrating HyperLogLog counters with the k-means algorithm. Furthermore, our clustering algorithm allows nodes to individually determine the number of clusters that fits their local data. Results using synthetic and real-world datasets show that our algorithm outperforms state-of-the-art decentralized k-means algorithms, achieving an accuracy gain of up to 36%. © Springer Nature Switzerland AG 2020.
Domain Adaptation (DA) aims to transfer knowledge from a source to a target domain by aligning their respective data distributions. In the unsupervised setting, however, this may cause source and target samples of different classes to align to each other, consequently leading to negative transfer. Semi-Supervised Domain Adaptation (SSDA) tries to solve this class misalignment problem by exploiting a few sample labels in the target domain. This paper proposes a new SSDA method called Adversarial Contrastive Semi-Supervised Domain Adaptation (ACSSDA) which combines two objectives, optimized for the case where very few target sample labels are available, to learn a shared feature representation for both source and target domains. ACSSDA uses a domain classifier to ensure that the resulting feature space is domain-agnostic. Simultaneously, a contrastive loss pulls together samples of the same class and pushes apart samples of different classes. This is shown to reduce class misalignment and negative transfer even with as little as a single labeled sample per class. We demonstrate the effectiveness of ACSSDA with experiments on several benchmark data sets. The results show the superiority of our method over state-of-the-art approaches.
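A minimal PyTorch sketch of how the two objectives can be combined: a domain classifier trained through a gradient-reversal layer to make features domain-agnostic, plus a supervised contrastive term over the labeled samples of both domains. The architecture, loss weighting, and the assumption that the first len(yt_few) target samples are the labeled ones are all illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None                 # reverse gradients into the encoder

def contrastive_loss(z, y, tau=0.5):
    """Pull together samples with the same label, push apart different labels."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau
    eye = torch.eye(len(y), device=z.device)
    mask = (y[:, None] == y[None, :]).float() - eye   # positives, excluding self
    logits = sim - 1e9 * eye                          # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(mask * log_prob).sum(1).div(mask.sum(1).clamp(min=1)).mean()

def acssda_step(encoder, clf, dom_clf, xs, ys, xt, yt_few, lam=0.1):
    zs, zt = encoder(xs), encoder(xt)
    # 1) supervised losses on source and the few labeled target samples
    loss = F.cross_entropy(clf(zs), ys) + F.cross_entropy(clf(zt[:len(yt_few)]), yt_few)
    # 2) adversarial domain confusion via gradient reversal (domain-agnostic features)
    z_all = torch.cat([zs, zt])
    d = torch.cat([torch.zeros(len(zs)), torch.ones(len(zt))]).long()
    loss = loss + F.cross_entropy(dom_clf(GradReverse.apply(z_all, lam)), d)
    # 3) contrastive alignment of labeled samples across both domains
    loss = loss + contrastive_loss(torch.cat([zs, zt[:len(yt_few)]]),
                                   torch.cat([ys, yt_few]))
    return loss
```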
Domain adaptation (DA) methods facilitate cross-domain learning by minimizing the marginal or conditional distribution shift between domains. However, the conditional distribution shift is not well addressed by existing DA techniques for the cross-domain regression learning task. In this paper, we propose the Multi-Domain Adaptation for Regression under Conditional shift (DARC) method. DARC constructs a shared feature space such that linear regression on top of that space generalizes to all domains. In other words, DARC aligns different domains and makes explicit the task-related information encoded in the values of the dependent variable. This is achieved using a novel Pairwise Similarity Preserver (PSP) loss function. PSP encourages the difference between the outcomes of any two samples, regardless of their domain(s), to match the distance between these samples in the constructed space.
We perform experiments in both two-domain and multi-domain settings. The two-domain setting is helpful, especially when one domain contains few available labeled samples and can benefit from adaptation to a domain with many labeled samples. The multi-domain setting allows several domains, each with limited data, to be adapted collectively; thus, multiple domains compensate for each other’s lack of data. The results from all the experiments conducted both on synthetic and real-world datasets confirm the effectiveness of DARC. © 2023 The Authors
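A minimal PyTorch sketch of the PSP idea as described above: the pairwise distances in the constructed feature space are encouraged to match the pairwise differences between the regression targets, across all domains. The squared penalty and the way it is combined with the regression loss are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch

def psp_loss(z, y):
    """Pairwise Similarity Preserver (sketch): make distances in the learned space
    match the differences between the regression targets, regardless of domain."""
    dz = torch.cdist(z, z, p=2)                           # pairwise distances in feature space
    dy = torch.cdist(y.view(-1, 1), y.view(-1, 1), p=1)   # |y_i - y_j|
    return ((dz - dy) ** 2).mean()

# full objective (sketch): regression loss on labeled samples + weighted PSP term
# loss = F.mse_loss(regressor(z_labeled), y_labeled) + lam * psp_loss(z_labeled, y_labeled)
```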
In most industries, the working conditions of equipment vary significantly from one site to another, from one time of year to another, and so on. This variation poses a severe challenge for data-driven fault identification methods: it introduces a change in the data distribution. This contradicts the underlying assumption of most machine learning methods, namely that training and test samples follow the same distribution.
However, in the area of predictive maintenance, this idea is complicated by the fact that different classes – fault categories – also vary across domains. Most of the state-of-the-art DA methods assume that the data in the target domain is complete, i.e., that we have access to examples from all the possible classes or faulty categories during adaptation. In reality, this is often very difficult to guarantee.
Therefore, there is a need for a domain adaptation method that is able to align the source and target domains even when only an incomplete set of test data is available. This paper presents our work in progress, in which we propose an approach for such a setting based on preserving the geometry information of the source samples during adaptation. This way, the model can capture the relationships between different fault categories and preserve them in the constructed domain-invariant feature space, even in situations where some classes are entirely missing. This paper examines this idea using artificial data sets to demonstrate the effectiveness of the geometry-preserving transformation. We have also started investigations on real-world predictive maintenance datasets, such as CWRU.
In the field of Human Activity Recognition (HAR), the rapid evolution of wearable devices necessitates models that are generalizable and can adapt to entirely new subjects and activities with very limited labeled data. Conventional deep learning models, constrained by their reliance on large training datasets and limited adaptability to novel scenarios, face challenges in these settings. This paper introduces a novel few-shot HAR strategy employing meta-learning, which facilitates rapid adaptation to unseen subjects and activities using minimal annotated samples. Our approach augments time series data with a range of transformations, each assigned a learnable weight, enabling the model to prioritize the most effective augmentations and discard the irrelevant ones. Throughout the meta-training phase, the model learns to identify an optimal weighted combination of these transformations, significantly improving the model's adaptability and generalization to new situations with scarce labeled data. During meta-testing, this knowledge enables the model to efficiently learn from and adapt to a very limited set of labeled samples from completely new subjects undertaking entirely new activities. Extensive experiments on various HAR datasets demonstrate our method's enhanced adaptability and generalization to tasks never encountered during training, affirming its potential for real-world applications characterized by limited data availability.
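A minimal PyTorch sketch of the weighted-augmentation idea from the abstract above: a fixed bank of time-series transformations, each with a learnable weight normalized by a softmax, so that meta-training can emphasize the useful augmentations and suppress the irrelevant ones. The specific transformations, combination rule, and tensor layout (batch, channels, length) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeightedAugment(nn.Module):
    """Bank of time-series augmentations with learnable importance weights."""

    def __init__(self):
        super().__init__()
        self.augs = [
            lambda x: x + 0.05 * torch.randn_like(x),                  # jitter
            lambda x: x * (1 + 0.1 * torch.randn(x.size(0), 1, 1)),    # per-sample scaling
            lambda x: torch.flip(x, dims=[-1]),                        # time reversal
        ]
        self.logits = nn.Parameter(torch.zeros(len(self.augs)))        # learnable weights

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)
        # convex combination of augmented views; meta-training moves weight towards
        # the transformations that help few-shot adaptation the most
        return sum(wi * aug(x) for wi, aug in zip(w, self.augs))

# inside the meta-training loop, self.logits receives gradients through the adaptation
# loss, so irrelevant augmentations are effectively switched off.
```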
The standard machine learning assumption that training and test data are drawn from the same probability distribution does not hold in many real-world applications due to the inability to reproduce testing conditions at training time. Existing unsupervised domain adaptation (UDA) methods address this problem by learning a domain-invariant feature space that performs well on available source domain(s) (labeled training data) and the specific target domain (unlabeled test data). In contrast, instead of simply adapting to domains, this paper aims for an approach that learns to adapt effectively to new unlabeled domains. To do so, we leverage meta-learning to optimize a neural network such that an unlabeled adaptation of its parameters to any domain would yield a good generalization on the latter. The experimental evaluation shows that the proposed approach outperforms standard approaches even when a small amount of unlabeled test data is used for adaptation, demonstrating the benefit of meta-learning prior knowledge from various domains to solve UDA problems.
Meta-learning, or learning to learn, involves training a model on various learning tasks in a way that allows it to quickly learn new tasks from the same distribution using only a small amount of training data (i.e., few-shot learning). Current meta-learning methods implicitly assume that the distribution over tasks is unimodal and consists of tasks belonging to a common domain, which significantly reduces the variety of task distributions they can handle. However, in real-world applications, tasks are often very diverse and come from multiple different domains, making it challenging to meta-learn common knowledge shared across the entire task distribution. In this paper, we propose a method for meta-learning from a multimodal task distribution. The proposed method learns multiple sets of meta-parameters (acting as different initializations of a neural network model) and uses a task encoder to select the best initialization to fine-tune for a new task. More specifically, with a few training examples from a task sampled from an unknown mode, the proposed method predicts which set of meta-parameters (i.e., which model initialization) would lead to a fast adaptation and a good post-adaptation performance on that task. We evaluate the proposed method on a diverse set of few-shot regression and image classification tasks. The results demonstrate the superiority of the proposed method compared to other state-of-the-art meta-learning methods and the benefit of learning multiple model initializations when tasks are sampled from a multimodal task distribution. © 2023 IEEE.
Few-shot meta-learning involves training a model on multiple tasks to enable it to efficiently adapt to new, previously unseen tasks with only a limited number of samples. However, current meta-learning methods assume that all tasks are closely related and belong to a common domain, whereas in practice, tasks can be highly diverse and originate from multiple domains, resulting in a multimodal task distribution. This poses a challenge for existing methods as they struggle to learn a shared representation that can be easily adapted to all tasks within the distribution. To address this challenge, we propose a meta-learning framework that can handle multimodal task distributions by conditioning the model on the current task, resulting in a faster adaptation. Our proposed method learns to encode each task and generate task embeddings that modulate the model’s activations. The resulting modulated model becomes specialized for the current task and leads to more effective adaptation. Our framework is designed to work in a realistic setting where the mode from which a task is sampled is unknown. Nonetheless, we also explore the possibility of incorporating auxiliary information, such as the task-mode-label, to further enhance the performance of our method if such information is available. We evaluate our proposed framework on various few-shot regression and image classification tasks, demonstrating its superiority over other state-of-the-art meta-learning methods. The results highlight the benefits of learning to embed task-specific information in the model to guide the adaptation when tasks are sampled from a multimodal distribution. © The Author(s) 2024.
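A minimal PyTorch sketch of the conditioning mechanism described above: a task encoder pools the support set into an embedding, which generates per-layer scale and shift parameters (FiLM-style) that modulate the base network's activations. Layer sizes, the mean pooling, and the regression head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TaskConditionedNet(nn.Module):
    def __init__(self, in_dim, hidden=64, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim + 1, emb_dim), nn.ReLU())
        self.body = nn.Linear(in_dim, hidden)
        self.film = nn.Linear(emb_dim, 2 * hidden)     # produces (gamma, beta)
        self.head = nn.Linear(hidden, 1)

    def task_embedding(self, x_support, y_support):
        pairs = torch.cat([x_support, y_support.view(-1, 1)], dim=1)
        return self.encoder(pairs).mean(dim=0)         # permutation-invariant pooling

    def forward(self, x, task_emb):
        gamma, beta = self.film(task_emb).chunk(2)
        h = torch.relu(gamma * self.body(x) + beta)    # task-modulated activations
        return self.head(h)

# usage (few-shot regression sketch):
# emb = model.task_embedding(x_support, y_support)
# loss = F.mse_loss(model(x_query, emb), y_query)      # followed by the usual adaptation steps
```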
Federated learning has emerged as a promising approach for training machine learning models on decentralized data sources while preserving data privacy. However, challenges such as communication bottlenecks, heterogeneity of client devices, and non-i.i.d. data distributions pose significant obstacles to achieving optimal model performance. We propose a novel framework that combines federated learning with meta-learning techniques to enhance both efficiency and generalization capabilities. Our approach introduces a federated modulator that learns contextual information from data batches and uses this knowledge to generate modulation parameters. These parameters dynamically adjust the activations of a base model, which operates using a MAML-based approach for model personalization. Experimental results across diverse datasets demonstrate improvements in convergence speed and model performance compared to existing federated learning approaches. These findings highlight the potential of incorporating contextual information and meta-learning techniques into federated learning, paving the way for advancements in distributed machine learning paradigms. Copyright © 2024 by SIAM.
Meta-learning empowers learning systems with the ability to acquire knowledge from multiple tasks, enabling faster adaptation and generalization to new tasks. This review provides a comprehensive technical overview of meta-learning, emphasizing its importance in real-world applications where data may be scarce or expensive to obtain. The paper covers the state-of-the-art meta-learning approaches and explores the relationship between meta-learning and multi-task learning, transfer learning, domain adaptation and generalization, self-supervised learning, personalized federated learning, and continual learning. By highlighting the synergies between these topics and the field of meta-learning, the paper demonstrates how advancements in one area can benefit the field as a whole, while avoiding unnecessary duplication of efforts. Additionally, the paper delves into advanced meta-learning topics such as learning from complex multi-modal task distributions, unsupervised meta-learning, learning to efficiently adapt to data distribution shifts, and continual meta-learning. Lastly, the paper highlights open problems and challenges for future research in the field. By synthesizing the latest research developments, this paper provides a thorough understanding of meta-learning and its potential impact on various machine learning applications. We believe that this technical overview will contribute to the advancement of meta-learning and its practical implications in addressing real-world problems.
Continual learning (CL) refers to the ability to continually learn over time by accommodating new knowledge while retaining previously learned experience. While this concept is inherent in human learning, current machine learning methods are highly prone to overwrite previously learned patterns and thus forget past experience. Instead, model parameters should be updated selectively and carefully, avoiding unnecessary forgetting while optimally leveraging previously learned patterns to accelerate future learning. Since hand-crafting effective update mechanisms is difficult, we propose meta-learning a transformer-based optimizer to enhance CL. This meta-learned optimizer uses attention to learn the complex relationships between model parameters across a stream of tasks, and is designed to generate effective weight updates for the current task while preventing catastrophic forgetting on previously encountered tasks. Evaluations on benchmark datasets like SplitMNIST, RotatedMNIST, and SplitCIFAR-100 affirm the efficacy of the proposed approach in terms of both forward and backward transfer, even on small sets of labeled data, highlighting the advantages of integrating a meta-learned optimizer within the continual learning framework.