Recurrent Neural Networks: An Embedded Computing Perspective
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). ORCID iD: 0000-0002-4674-3809
Amrita Vishwa Vidyapeetham, Bengaluru, India. ORCID iD: 0000-0003-4995-6233
Department of Applied Physics and Electronics (TFE), Umeå University, Umeå, Sweden. ORCID iD: 0000-0002-0562-2082
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). ORCID iD: 0000-0002-4932-4036
2020 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 8, p. 57967-57996. Article in journal (Refereed). Published
Abstract [en]

Recurrent Neural Networks (RNNs) are a class of machine learning algorithms used for applications with time-series and sequential data. Recently, there has been a strong interest in executing RNNs on embedded devices. However, RNNs demand high computational capability and a large memory space, which makes them difficult to deploy on such devices. In this paper, we review existing implementations of RNN models on embedded platforms and discuss the methods adopted to overcome the limitations of embedded systems. We define the objectives of mapping RNN algorithms onto embedded platforms and the challenges facing their realization. We then explain the components of RNN models from an implementation perspective and discuss the optimizations applied to RNNs to run efficiently on embedded platforms. Finally, we compare the defined objectives with the implementations and highlight open research questions and aspects currently not addressed for embedded RNNs. Overall, applying algorithmic optimizations to RNN models and decreasing the memory access overhead are vital to obtaining high efficiency. To further increase implementation efficiency, we point out the most promising optimizations for future research. Additionally, this article observes that many implementations target high performance, whereas flexibility has so far been attempted less often. The article therefore provides guidelines for RNN hardware designers to better support flexibility. © 2020 IEEE.
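The abstract's point that algorithmic optimizations such as quantization shrink an RNN's memory footprint can be illustrated with a minimal, generic post-training quantization sketch. This is not code from the paper; all names and sizes below are illustrative:

```python
import numpy as np

def quantize_weights(w, n_bits=8):
    """Symmetric linear quantization of a weight matrix to n_bits integers."""
    scale = np.max(np.abs(w)) / (2 ** (n_bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the integer weights back to floating point for error measurement."""
    return q.astype(np.float32) * scale

# Example: quantize an LSTM-shaped weight matrix (4 gates x 128 hidden units)
# and measure the mean absolute quantization error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4 * 128, 128)).astype(np.float32)
q, scale = quantize_weights(w)
err = np.mean(np.abs(w - dequantize(q, scale)))
```

Storing int8 instead of float32 cuts the weight memory by 4x, which also reduces the memory access overhead the survey identifies as a key bottleneck.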

Place, publisher, year, edition, pages
Piscataway: IEEE, 2020. Vol. 8, p. 57967-57996
Keywords [en]
Compression, flexibility, efficiency, embedded computing, long short term memory (LSTM), quantization, recurrent neural networks (RNNs)
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:hh:diva-41981
DOI: 10.1109/ACCESS.2020.2982416
ISI: 000527411700168
Scopus ID: 2-s2.0-85082939909
OAI: oai:DiVA.org:hh-41981
DiVA, id: diva2:1427602
Projects
NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Funder
Vinnova, INT/SWD/VINN/p-10/2015
Note

As manuscript in thesis.

Other funding: Government of India

Available from: 2020-04-30. Created: 2020-04-30. Last updated: 2022-11-23. Bibliographically approved.
In thesis
1. Exploring Efficient Implementations of Deep Learning Applications on Embedded Platforms
2020 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The promising results of deep learning (deep neural network) models in many applications, such as speech recognition and computer vision, have created a need for their realization on embedded platforms. Augmenting embedded platforms with DL (Deep Learning) enables them to support intelligent tasks in smart homes, mobile phones, and healthcare applications. However, deep learning models rely on intensive operations on high-precision values, while embedded platforms have restricted compute and energy budgets. Thus, it is challenging to realize deep learning models on embedded platforms.

In this thesis, we define the objectives of implementing deep learning models on embedded platforms. The main objective is to achieve efficient implementations: an implementation should achieve high throughput, preserve low power consumption, and meet real-time requirements. The secondary objective is flexibility. It is not enough to propose an efficient hardware solution for one model; the proposed solution should be flexible enough to support changes in the model and in the application constraints. Thus, the overarching goal of the thesis is to explore flexible methods for the efficient realization of deep learning models on embedded platforms.

Optimizations are applied to both the DL model and the embedded platform to increase implementation efficiency. To understand the impact of different optimizations, we chose recurrent neural networks (as a class of DL models) and compared their implementations on embedded platforms. The comparison analyzes the optimizations applied and the corresponding performance to draw conclusions about the most fruitful and essential optimizations. We concluded that, to achieve high efficiency, it is essential to apply an algorithmic optimization to the model to decrease its compute and memory requirements, and to apply a memory-specific optimization to hide the overhead of memory accesses. Furthermore, it was revealed that many of the works under study focus on implementation efficiency, while flexibility is attempted less often.

We have explored the design space of Convolutional Neural Networks (CNNs) on the Epiphany manycore architecture. We adopted a pipelined implementation of CNNs that relies solely on on-chip memory to store the weights. The proposed mapping supports both the AlexNet and GoogLeNet CNN models, varying precision for the weights, and two memory sizes for the Epiphany cores. We were able to achieve competitive performance with respect to emerging manycores.
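The pipelined, on-chip-only mapping described above can be sketched generically: consecutive layers are packed onto cores so that each core's share of the weights fits in its local memory. The function `pipeline_map` and all sizes below are hypothetical illustrations, not the thesis's actual algorithm:

```python
def pipeline_map(layer_weight_bytes, core_mem_bytes, n_cores):
    """Greedily assign consecutive layers to cores so that each core's
    weights fit in its on-chip memory. Returns one layer list per core."""
    mapping = [[] for _ in range(n_cores)]
    core, used = 0, 0
    for i, size in enumerate(layer_weight_bytes):
        if size > core_mem_bytes:
            raise ValueError(f"layer {i} alone exceeds on-chip memory")
        if used + size > core_mem_bytes:  # current core is full: next core
            core, used = core + 1, 0
            if core >= n_cores:
                raise ValueError("model does not fit on-chip")
        mapping[core].append(i)
        used += size
    return mapping

# Example: four layers, 32 KB of weight storage per core, four cores.
plan = pipeline_map([20_000, 12_000, 30_000, 8_000], 32_000, 4)
```

In a real pipeline each core would then process its layers and stream activations to the next core, so weights never leave on-chip memory.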

As part of work in progress, we have studied a DL-architecture co-design approach to increase the flexibility of hardware solutions. A flexible platform should support variations in both the model and the optimizations. The optimization method should be automated so that it can respond to changes in the model and in the application constraints with minor effort. The mapping of the models onto embedded platforms should likewise be automated.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2020. p. 81
Series
Halmstad University Dissertations ; 71
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-41969 (URN)
978-91-88749-51-2 (ISBN)
978-91-88749-50-5 (ISBN)
Presentation
2020-06-04, Wigforss, Visionen, Halmstad University, Kristian IV:s väg 3, Halmstad, 10:00 (English)
Opponent
Supervisors
Available from: 2020-05-14. Created: 2020-04-27. Last updated: 2020-05-14. Bibliographically approved.
2. Deep Learning on the Edge: A Flexible Multi-level Optimization Approach
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Recent advances in Deep Learning (DL) research have been adopted in a wide variety of applications, including autonomous driving, AI in health care, and smart homes. In parallel, research in high-performance embedded computing has resulted in advanced hardware platforms that offer enhanced performance and energy efficiency for demanding computations. However, the high demands of DL models for computational and memory resources are still a challenge for embedded computing. Algorithmic optimizations can be used to reduce the computational and memory requirements of DL models. Hardware implementations and architectures can also be tuned to support DL applications' requirements. This thesis identifies that insufficient coordination between hardware implementations and model optimizations limits the efficiency of the resulting implementations. In addition, the implementation methods themselves suffer from poor flexibility in adapting to changes in the model and the application constraints. The overarching theme of this thesis is to study and propose methods for the efficient and flexible implementation of DL models on embedded platforms. The work in this thesis bridges the gap between DL models' algorithmic optimizations and embedded platforms' hardware-specific optimizations, and investigates the features that need support from DL domain-specific architectures. In addition, a method for multi-objective quantization of DL models is proposed to address both the model error and platform performance metrics. Post-training optimization techniques are employed to facilitate the multi-objective optimization of the models because they do not require retraining after model optimization. This thesis also reviews the optimization methods that are known to have been applied to improve the implementation efficiency of DL models. It highlights the most fruitful optimizations found in existing, highly efficient implementations, and applies them in the proposed methods.
A method for mapping Convolutional Neural Networks (CNNs) onto Epiphany, a manycore architecture, is proposed and evaluated. A method for post-training quantization and approximation of RNN models is also proposed and evaluated on four RNN models. The proposed quantization method is used in a hardware-aware multi-objective optimization of RNN models for deployment on the SiLago and Bit Fusion architectures.
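The multi-objective optimization mentioned above trades model error against a platform performance metric. A generic way to frame this is to keep only the Pareto-optimal quantization configurations. This is an illustrative sketch, not the thesis's method; the candidate names and numbers are invented:

```python
def dominates(a, b):
    """a dominates b if a is no worse in both objectives and better in one.
    Each entry is a (name, error, cost) tuple; lower is better for both."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

def pareto_front(candidates):
    """Keep the configurations not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]

# Hypothetical quantization configurations: (name, model error, platform cost).
cands = [("fp32", 0.00, 10.0), ("int8", 0.02, 3.0),
         ("int4", 0.10, 2.0), ("mixed", 0.12, 5.0)]
front = pareto_front(cands)
```

Here "mixed" is dominated by "int8" (worse error and worse cost), so only the other three remain as genuine trade-offs between accuracy and cost.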

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2022. p. 65
Series
Halmstad University Dissertations ; 95
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-48680 (URN)
978-91-89587-03-8 (ISBN)
978-91-89587-02-1 (ISBN)
Public defence
2022-12-15, Halda, Visionen, Kristian IV:s väg 3, Halmstad, 11:58 (English)
Opponent
Supervisors
Note

To be supplemented with a Libris ID when available.

Available from: 2022-11-24. Created: 2022-11-23. Last updated: 2022-11-24. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Rezk, Nesma; Ul-Abdin, Zain

Search in DiVA

By author/editor
Rezk, Nesma; Purnaprajna, Madhura; Nordström, Tomas; Ul-Abdin, Zain
By organisation
Centre for Research on Embedded Systems (CERES)
In the same journal
IEEE Access
Computer Systems
