hh.se Publications
1 - 8 of 8
  • 1.
    Rezk, Nesma
    Halmstad University, School of Information Technology.
    Deep Learning on the Edge: A Flexible Multi-level Optimization Approach (2022). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Recent advances in Deep Learning (DL) research have been adopted in a wide variety of applications, including autonomous driving, AI in health care, and smart homes. In parallel, research in high-performance embedded computing has resulted in advanced hardware platforms that offer enhanced performance and energy efficiency for demanding computations. However, the high demands of DL models for computational and memory resources are still a challenge for embedded computing. Algorithmic optimizations can be used to reduce the computational and memory requirements of DL models. Hardware implementations and architectures can also be tuned to support DL applications’ requirements. This thesis identifies that insufficient coordination between hardware implementations and models’ optimizations limits the efficiency of the resulting implementations. In addition, the implementation methods themselves suffer from poor flexibility in adapting to changes in the model and application constraints. The overarching theme of this thesis is to study and propose methods for the efficient and flexible implementation of DL models on embedded platforms. The work in this thesis bridges the gap between DL models’ algorithmic optimizations and embedded platforms’ hardware-specific optimizations, and investigates the features that need support from DL domain-specific architectures. In addition, a method for multi-objective quantization of DL models is proposed to address both the model error and platform performance metrics. Post-training optimization techniques are employed to facilitate the multi-objective optimization of the models because they do not require retraining after model optimization. This thesis also reviews the optimization methods that are known to have been applied to improve the implementation efficiency of DL models. It highlights the most fruitful optimizations found in existing, highly efficient implementations, and applies them in the proposed methods. A method for mapping Convolution Neural Networks (CNN) on Epiphany, a manycore architecture, is proposed and evaluated. A method for quantization and approximation of RNN models in a post-training fashion is also proposed and evaluated on four RNN models. The proposed quantization method is used in a hardware-aware multi-objective optimization for RNN models to be deployed on the SiLago and Bitfusion architectures.
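
To illustrate the post-training quantization that the thesis builds on, below is a minimal sketch of uniform symmetric quantization of a weight tensor in Python/NumPy. The function name, the per-tensor scaling scheme, and the 8-bit setting are illustrative assumptions, not the thesis's exact procedure.

```python
import numpy as np

def quantize_symmetric(weights, num_bits=8):
    """Uniform symmetric post-training quantization of a weight tensor.

    Maps float weights to integers in [-(2**(b-1)-1), 2**(b-1)-1] using a
    single per-tensor scale, so the quantization error can be inspected
    without any retraining.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(weights).max() / qmax          # per-tensor scale (assumption)
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32)
    return q, scale

# Example: quantize a random weight matrix and measure the introduced error.
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_symmetric(w, num_bits=8)
w_hat = q.astype(np.float32) * scale
print("mean abs quantization error:", np.abs(w - w_hat).mean())
```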

  • 2.
    Rezk, Nesma
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Exploring Efficient Implementations of Deep Learning Applications on Embedded Platforms (2020). Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    The promising results of deep learning (deep neural network) models in many applications, such as speech recognition and computer vision, have created a need for their realization on embedded platforms. Augmenting embedded platforms with DL (Deep Learning) enables them to support intelligent tasks in smart homes, mobile phones, and healthcare applications. Deep learning models rely on intensive operations on high-precision values. In contrast, embedded platforms have restricted compute and energy budgets. Thus, it is challenging to realize deep learning models on embedded platforms.

    In this thesis, we define the objectives of implementing deep learning models on embedded platforms. The main objective is to achieve efficient implementations: the implementation should achieve high throughput, maintain low power consumption, and meet real-time requirements. The secondary objective is flexibility. It is not enough to propose an efficient hardware solution for one model; the proposed solution should be flexible enough to support changes in the model and the application constraints. Thus, the overarching goal of the thesis is to explore flexible methods for the efficient realization of deep learning models on embedded platforms.

    Optimizations are applied to both the DL model and the embedded platform to increase implementation efficiency. To understand the impact of different optimizations, we chose recurrent neural networks (as a class of DL models) and compared their implementations on embedded platforms. The comparison analyzes the optimizations applied and the corresponding performance to draw conclusions on the most fruitful and essential optimizations. We concluded that it is essential to apply an algorithmic optimization to the model to decrease its compute and memory requirements, and it is essential to apply a memory-specific optimization to hide the overhead of memory accesses and achieve high efficiency. Furthermore, it was revealed that many of the works under study focus on implementation efficiency, while flexibility is attempted less often.

    We have explored the design space of Convolutional Neural Networks (CNNs) on the Epiphany manycore architecture. We adopted a pipelined implementation of CNNs that relies solely on the on-chip memory to store the weights. The proposed mapping supports both the AlexNet and GoogleNet CNN models, varying precision for weights, and two memory sizes for Epiphany cores. We were able to achieve competitive performance with respect to emerging manycores.

    As part of the work in progress, we have studied a DL-architecture co-design approach to increase the flexibility of hardware solutions. A flexible platform should support variations in the model and variations in the optimizations. The optimization method should be automated so that it responds to changes in the model and the application constraints with minimal effort. In addition, the mapping of the models onto embedded platforms should be automated as well.
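
As a rough illustration of the pipelined CNN mapping described in the abstract above, the sketch below assigns layers to consecutive cores so that each layer's weights stay in on-chip memory and activations stream from core to core. The layer sizes and the per-core memory figure are made-up assumptions, not Epiphany or AlexNet/GoogleNet numbers.

```python
LOCAL_MEM_BYTES = 32 * 1024  # assumed per-core on-chip memory, illustrative only

# (layer name, number of weights, bytes per weight) -- illustrative values
layers = [
    ("conv1", 35_000, 1),    # e.g. 8-bit weights
    ("conv2", 300_000, 1),
    ("fc",    4_000_000, 1),
]

def map_pipeline(layers, mem_bytes):
    """Assign each layer to one or more cores so its weights stay on-chip."""
    mapping = []
    core = 0
    for name, n_weights, bytes_per_w in layers:
        size = n_weights * bytes_per_w
        n_cores = max(1, -(-size // mem_bytes))   # ceil division: split large layers
        mapping.append((name, core, core + n_cores - 1))
        core += n_cores
    return mapping, core

mapping, cores_used = map_pipeline(layers, LOCAL_MEM_BYTES)
for name, first, last in mapping:
    print(f"{name}: cores {first}..{last}")
print("total cores used:", cores_used)
```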

  • 3.
    Rezk, Nesma
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    ModelFlex: Parameter Tuning for Flexible Design of Deep Learning Accelerators. Manuscript (preprint) (Other academic)
    Abstract [en]

    Algorithmic optimizations are applied to neural network models to decrease their compute and memory requirements for efficient realization on embedded platforms. Feedback from the target platform during the optimization process can increase the benefit of these optimizations. In this paper, we propose a method for hardware-guided optimization of recurrent neural networks. The method is automated so that it responds to changes in the model or the application constraints with minimal effort. In addition, a hybrid of three optimizations is applied to the base RNN model to enlarge the search space for a feasible solution and increase the chance of skipping retraining.
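
A minimal sketch of the hardware-guided idea, under the assumption that the platform exposes a latency estimate: a loop lowers the weight bit width until an assumed latency budget is met while a placeholder error model stays within bounds. The latency and error functions below are hypothetical stand-ins, not ModelFlex's actual models.

```python
def estimate_latency_ms(bit_width):
    # placeholder: latency assumed proportional to operand width
    return 2.0 * bit_width

def estimate_error(bit_width):
    # placeholder: error grows as precision drops
    return 5.0 + (16 - bit_width) * 0.4

def tune(latency_budget_ms, max_error, bit_widths=(16, 12, 8, 6, 4)):
    """Return the first bit width meeting both the latency and error constraints."""
    for bits in bit_widths:                      # try progressively smaller widths
        lat = estimate_latency_ms(bits)
        err = estimate_error(bits)
        if lat <= latency_budget_ms and err <= max_error:
            return bits, lat, err
    return None                                   # no feasible configuration

# Re-running tune() with new constraints is all that is needed when the
# application requirements change.
print(tune(latency_budget_ms=18.0, max_error=9.0))
```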

  • 4.
    Rezk, Nesma M.
    et al.
    Halmstad University, School of Information Technology.
    Nordström, Tomas
    Umeå University, Umeå, Sweden.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology.
    Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models (2022). In: Information, E-ISSN 2078-2489, Vol. 13, no. 4, article id 176. Article in journal (Refereed)
    Abstract [en]

    Recurrent neural networks (RNNs) are neural networks (NN) designed for time-series applications. There is a growing interest in running RNNs to support these applications on edge devices. However, RNNs have large memory and computational demands that make them challenging to implement on edge devices. Quantization is used to shrink the size and the computational needs of such models by decreasing the precision of weights and activations. Further, the delta networks method increases the sparsity in activation vectors by relying on the temporal relationship between successive input sequences to eliminate repeated computations and memory accesses. In this paper, we study the effect of quantization on LSTM-, GRU-, LiGRU-, and SRU-based RNN models for speech recognition on the TIMIT dataset. We show how to apply post-training quantization to these models with a minimal increase in the error by skipping quantization of selected paths. In addition, we show that the quantization of activation vectors in RNNs to integer precision leads to considerable sparsity if the delta networks method is applied. Then, we propose a method for increasing the sparsity in the activation vectors while minimizing the error and maximizing the percentage of eliminated computations. The proposed quantization method compressed the four models by more than 85%, with an error increase of 0.6, 0, 2.1, and 0.2 percentage points, respectively. By applying the delta networks method to the quantized models, more than 50% of the operations can be eliminated, in most cases with only a minor increase in the error. Comparing the four models under quantization and the delta networks method, we found that compressed LSTM-based models are the best solutions under low-error-rate constraints. The compressed SRU-based models are the smallest in size, suitable when higher error rates are acceptable, and the compressed LiGRU-based models have the highest number of eliminated operations. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
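
The delta networks step described above can be sketched as follows: compute the change in the input vector since the previous time step, drop changes below a threshold, and update the previous output using only the remaining columns of the weight matrix. This is an illustrative NumPy sketch, not the paper's implementation; the threshold and sizes are assumptions.

```python
import numpy as np

def delta_matvec(W, x_t, x_prev, y_prev, threshold=0.5):
    """One delta-networks-style matrix-vector product.

    Only columns of W whose inputs changed by more than the threshold since
    the previous time step contribute; the rest of the computation and the
    corresponding weight fetches are skipped by reusing the previous output.
    """
    delta = x_t - x_prev
    active = np.abs(delta) > threshold             # which inputs changed enough
    y_t = y_prev + W[:, active] @ delta[active]    # update only with large deltas
    # Carry forward the inputs actually applied, so small changes accumulate
    # instead of being lost.
    x_carry = np.where(active, x_t, x_prev)
    return y_t, x_carry, active.mean()

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
x_prev = rng.standard_normal(128)
x_t = x_prev + rng.standard_normal(128) * 0.3      # mostly small temporal changes
y_prev = W @ x_prev
y_t, x_carry, frac_active = delta_matvec(W, x_t, x_prev, y_prev)
print(f"fraction of inputs recomputed: {frac_active:.2f}")
```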

  • 5.
    Rezk, Nesma
    et al.
    Halmstad University, School of Information Technology.
    Nordström, Tomas
    Umeå University, Umeå, Sweden.
    Stathis, Dimitrios
    KTH University, Stockholm, Sweden.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology.
    Aksoy, Eren
    Halmstad University, School of Information Technology.
    Hemani, Ahmed
    KTH University, Stockholm, Sweden.
    MOHAQ: Multi-Objective Hardware-Aware Quantization of Recurrent Neural Networks (2022). In: Journal of Systems Architecture, ISSN 1383-7621, E-ISSN 1873-6165, Vol. 133, article id 102778. Article in journal (Refereed)
    Abstract [en]

    The compression of deep learning models is of fundamental importance in deploying such models to edge devices. The selection of compression parameters can be automated to meet changes in the hardware platform and application. This article introduces a Multi-Objective Hardware-Aware Quantization (MOHAQ) method, which considers hardware performance and inference error as objectives for mixed-precision quantization. The proposed method feasibly evaluates candidate solutions in a large search space by relying on two steps. First, post-training quantization is applied for fast solution evaluation (inference-only search). Second, we propose the "beacon-based search", which retrains selected solutions only and uses them as beacons to estimate the effect of retraining on other solutions. We use speech recognition models on the TIMIT dataset. Experimental evaluations show that Simple Recurrent Unit (SRU)-based models can be compressed by up to 8x by post-training quantization without any significant increase in error. On SiLago, we found solutions that achieve 97% and 86% of the maximum possible speedup and energy saving, respectively, with a minor increase in error on an SRU-based model. On Bitfusion, the beacon-based search reduced the error gain of the inference-only search on SRU-based models and a Light Gated Recurrent Unit (LiGRU)-based model by up to 4.9 and 3.9 percentage points, respectively.
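
As an illustration of the inference-only search step, the sketch below enumerates per-layer bit widths, scores each candidate with placeholder error and speedup models, and keeps the Pareto-optimal set. The scoring functions and bit-width choices are assumptions, not the MOHAQ objectives or the SiLago/Bitfusion performance models.

```python
from itertools import product

LAYERS = 3
BIT_CHOICES = (8, 4, 2)

def error_pct(cfg):                 # placeholder error model (assumption)
    return 20.0 + sum((8 - b) * 0.7 for b in cfg)

def speedup(cfg):                   # placeholder performance model (assumption)
    return sum(8.0 / b for b in cfg) / len(cfg)

def pareto_front(candidates):
    """Keep configurations not dominated in (lower error, higher speedup)."""
    front = []
    for cfg, err, sp in candidates:
        dominated = any(e <= err and s >= sp and (e, s) != (err, sp)
                        for _, e, s in candidates)
        if not dominated:
            front.append((cfg, err, sp))
    return front

# Score every per-layer bit-width assignment (post-training, so scoring each
# candidate is cheap), then report the Pareto-optimal trade-offs.
candidates = [(cfg, error_pct(cfg), speedup(cfg))
              for cfg in product(BIT_CHOICES, repeat=LAYERS)]
for cfg, err, sp in sorted(pareto_front(candidates), key=lambda c: c[1]):
    print(cfg, f"error={err:.1f}%", f"speedup={sp:.2f}x")
```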

  • 6.
    Rezk, Nesma
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Purnaprajna, Madhura
    Amrita Vishwa Vidyapeetham, Bengaluru, India.
    Nordström, Tomas
    Department of Applied Physics and Electronics (TFE), Umeå University, Umeå, Sweden.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Recurrent Neural Networks: An Embedded Computing Perspective (2020). In: IEEE Access, E-ISSN 2169-3536, Vol. 8, pp. 57967-57996. Article in journal (Refereed)
    Abstract [en]

    Recurrent Neural Networks (RNNs) are a class of machine learning algorithms used for applications with time-series and sequential data. Recently, there has been a strong interest in executing RNNs on embedded devices. However, difficulties have arisen because RNNs require high computational capability and a large memory space. In this paper, we review existing implementations of RNN models on embedded platforms and discuss the methods adopted to overcome the limitations of embedded systems. We define the objectives of mapping RNN algorithms onto embedded platforms and the challenges facing their realization. Then, we explain the components of RNN models from an implementation perspective. We also discuss the optimizations applied to RNNs to run efficiently on embedded platforms. Finally, we compare the defined objectives with the implementations and highlight some open research questions and aspects currently not addressed for embedded RNNs. Overall, applying algorithmic optimizations to RNN models and decreasing the memory-access overhead are vital to obtaining high efficiency. To further increase implementation efficiency, we point out the more promising optimizations that could be applied in future research. Additionally, this article observes that high performance has been targeted by many implementations, while flexibility has, as yet, been attempted less often. Thus, the article provides some guidelines to help RNN hardware designers better support flexibility. © 2020 IEEE.
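
To make the implementation perspective concrete, here is a minimal sketch (not taken from the article) of one LSTM time step with the four gates fused into single weight matrices. The two matrix-vector products dominate both the operation count and the memory traffic, which is what the algorithmic and memory-access optimizations discussed in the article target. Sizes and initialization are illustrative.

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W and U hold all four gates stacked row-wise."""
    n = h_prev.size
    z = W @ x_t + U @ h_prev + b                 # fused pre-activations, shape (4n,)
    i = 1 / (1 + np.exp(-z[0:n]))                # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))              # forget gate
    g = np.tanh(z[2*n:3*n])                      # candidate cell state
    o = 1 / (1 + np.exp(-z[3*n:4*n]))            # output gate
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Example sizes chosen for illustration only.
n_in, n_hid = 39, 128
rng = np.random.default_rng(1)
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
print("hidden state shape:", h.shape)
```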

  • 7.
    Rezk, Nesma
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Purnaprajna, Madhura
    Amrita University, Bengaluru, India.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Efficient Implementation of Convolution Neural Networks Inference on Manycore Architectures (2017). Conference paper (Refereed)
    Abstract [en]

    The convolution module of convolution neural networks is highly computationally demanding. In order to execute a neural network inference on embedded platforms, an efficient implementation of the convolution is required. Low-precision parameters allow an implementation that requires less memory, less computation time, and less power consumption. Furthermore, streaming the convolution computation over parallelized processing units saves a lot of memory, which is a key concern in memory-constrained embedded platforms. In this paper, we show how the convolution module can be implemented on the Epiphany manycore architecture. Low-precision parameters are used, with ternary weights taking the values +1, 0, and -1. The computation is done through a pipeline by streaming data through the processing units. The proposed approach decreases the memory requirements for CNN implementation and can reach up to 282 GOPS and up to 5.6 GOPS/Watt.
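
A small sketch of why ternary weights help: with weights restricted to +1, 0, and -1, every multiply-accumulate in the dot product reduces to an addition, a subtraction, or a skip, so the inner loop needs no multiplier. This is an illustrative Python version, not the Epiphany kernel.

```python
import numpy as np

def ternary_dot(weights, x):
    """Dot product with ternary weights in {-1, 0, +1}: no multiplications."""
    acc = 0.0
    for w, v in zip(weights, x):
        if w == 1:
            acc += v          # +1: add the input value
        elif w == -1:
            acc -= v          # -1: subtract the input value
        # w == 0: skip the term entirely
    return acc

rng = np.random.default_rng(2)
w = rng.choice([-1, 0, 1], size=16)
x = rng.standard_normal(16)
# Both give the same result (up to floating-point rounding).
print(ternary_dot(w, x), float(w @ x))
```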

  • 8.
    Rezk, Nesma
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Purnaprajna, Madhura
    Amrita University, Bengaluru, India.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Streaming Tiles: Flexible Implementation of Convolution Neural Networks Inference on Manycore Architectures (2018). In: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Los Alamitos: IEEE Computer Society, 2018, pp. 867-876. Conference paper (Refereed)
    Abstract [en]

    Convolution neural networks (CNN) are extensively used for deep learning applications such as image recognition and computer vision. The convolution module of these networks is highly compute-intensive. Having an efficient implementation of the convolution module enables realizing the inference part of the neural network on embedded platforms. Low-precision parameters require less memory, less computation time, and less power consumption while achieving high classification accuracy. Furthermore, streaming the data over parallelized processing units saves a considerable amount of memory, which is a key concern in memory-constrained embedded platforms. In this paper, we explore the design space for streamed CNNs on the Epiphany manycore architecture using varying precisions for the weights (ranging from binary to 32-bit). Both AlexNet and GoogleNet are explored for two different memory sizes of Epiphany cores. We are able to achieve competitive performance for both AlexNet and GoogleNet with respect to emerging manycores. Furthermore, the effects of different design choices in terms of precision, memory size, and the number of cores are evaluated by applying the proposed method.
