ModelFlex: Parameter Tuning for Flexible Design of Deep Learning Accelerators
Rezk, Nesma (Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES)). ORCID iD: 0000-0002-4674-3809
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Algorithmic optimizations are applied to neural network models to decrease their compute and memory requirements for efficient realization on embedded platforms. Feedback from the target platform during the optimization process can increase the benefit of these optimizations. In this paper, we propose a method for hardware-guided optimization of recurrent neural networks. The method is automated so that it responds to changes in the model or the application constraints with minimal effort. In addition, a hybrid of three optimizations is applied to the base RNN model to enlarge the search space of feasible solutions and increase the chance of skipping retraining.
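The abstract does not name the three optimizations or the form the platform feedback takes, so the following is only a minimal sketch of the control flow it describes. The knobs (quantization bit-width, pruning ratio, delta-skipping threshold) and the callables apply_opts, eval_accuracy, and hw_latency_ms are illustrative assumptions, not details from the paper.

# Hypothetical sketch of a hardware-guided optimization loop for an RNN.
# The three optimization knobs below are assumed placeholders; the paper's
# actual hybrid of optimizations is not named in this abstract.
from itertools import product

def optimize(model, apply_opts, eval_accuracy, hw_latency_ms,
             min_accuracy, max_latency_ms):
    """Return the first configuration meeting both constraints, i.e. a
    feasible point reached without retraining; None if none exists."""
    bit_widths = [16, 8, 4]          # quantization candidates (assumed)
    prune_ratios = [0.0, 0.5, 0.75]  # pruning candidates (assumed)
    deltas = [0.0, 0.1, 0.2]         # delta-skip thresholds (assumed)

    for bits, prune, delta in product(bit_widths, prune_ratios, deltas):
        candidate = apply_opts(model, bits, prune, delta)
        # Feedback from the target platform: a latency estimate from a
        # hardware cost model stands in for on-device profiling here.
        if (eval_accuracy(candidate) >= min_accuracy
                and hw_latency_ms(candidate) <= max_latency_ms):
            return candidate         # feasible without retraining
    return None                      # fallback: retrain the best near-miss

Because the loop only observes accuracy and estimated latency through callables, re-running it after a model or constraint change requires no manual rework, which is the automation property the abstract emphasizes.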

National Category
Embedded Systems
Identifiers
URN: urn:nbn:se:hh:diva-41998
OAI: oai:DiVA.org:hh-41998
DiVA, id: diva2:1428175
Note

As manuscript in thesis

Available from: 2020-05-05 Created: 2020-05-05 Last updated: 2020-05-05
In thesis
1. Exploring Efficient Implementations of Deep Learning Applications on Embedded Platforms
2020 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The promising results of deep learning (deep neural network) models in many applications, such as speech recognition and computer vision, have created a need for their realization on embedded platforms. Augmenting embedded platforms with DL (Deep Learning) enables them to support intelligent tasks in smart homes, mobile phones, and healthcare applications. Deep learning models rely on intensive operations between high-precision values. In contrast, embedded platforms have restricted compute and energy budgets. Thus, it is challenging to realize deep learning models on embedded platforms.

In this thesis, we define the objectives of implementing deep learning models on embedded platforms. The main objective is to achieve efficient implementations: the implementation should achieve high throughput, preserve low power consumption, and meet real-time requirements. The secondary objective is flexibility. It is not enough to propose an efficient hardware solution for one model; the proposed solution should be flexible enough to support changes in the model and in the application constraints. Thus, the overarching goal of the thesis is to explore flexible methods for the efficient realization of deep learning models on embedded platforms.

Optimizations are applied to both the DL model and the embedded platform to increase implementation efficiency. To understand the impact of different optimizations, we chose recurrent neural networks (as a class of DL models) and compared their implementations on embedded platforms. The comparison analyzes the optimizations applied and the corresponding performance to draw conclusions about the most fruitful and essential optimizations. We concluded that, to achieve high efficiency, it is essential to apply an algorithmic optimization to the model to decrease its compute and memory requirements, and to apply a memory-specific optimization to hide the overhead of memory access. Furthermore, the comparison revealed that many of the works under study focus on implementation efficiency, while flexibility is attempted less often.
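The summary names memory-specific optimization as essential without fixing a particular technique; double buffering is one common instance, sketched here purely as an illustration. The fetch and compute callables and the tile granularity are assumptions, not details from the thesis.

# Minimal double-buffering sketch: overlap fetching the next weight tile
# with computation on the current one, hiding memory-access latency.
# fetch() and compute() are hypothetical stand-ins for platform-specific
# DMA transfers and kernel execution.
import threading

def stream_tiles(tiles, fetch, compute):
    """Compute on each tile while the next tile loads in parallel."""
    current = fetch(tiles[0])            # fill the first buffer up front
    for i in range(len(tiles)):
        prefetched = {}
        worker = None
        if i + 1 < len(tiles):
            worker = threading.Thread(
                target=lambda j=i + 1: prefetched.update(buf=fetch(tiles[j])))
            worker.start()               # memory access overlaps compute
        compute(current)                 # work on the already-loaded buffer
        if worker is not None:
            worker.join()
            current = prefetched["buf"]  # swap buffers for the next step

With this overlap, the time per step approaches max(fetch, compute) instead of their sum, which is why such optimizations matter for high efficiency.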

We have explored the design space of convolutional neural networks (CNNs) on the Epiphany manycore architecture. We adopted a pipelined implementation of CNNs that relies solely on on-chip memory to store the weights. The proposed mapping supports both the AlexNet and GoogLeNet CNN models, varying precision for the weights, and two memory sizes for the Epiphany cores. We were able to achieve competitive performance with respect to emerging manycores.
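Since the pipeline keeps all weights on chip, each layer must be spread over enough cores for its slice of the weights to fit in a core's local memory. The sketch below shows only that sizing arithmetic; the 32 KB budget is the Epiphany-III local-memory size, assumed here for illustration (the summary says two core memory sizes were supported but does not state them), and the layer footprints are invented.

# Hypothetical sizing check for a weights-on-chip pipeline mapping:
# how many cores does each layer need so its weight slice fits locally?
def cores_needed(layer_weight_bytes, core_mem_bytes=32 * 1024):
    """Cores required so that each core's share of the weights fits."""
    return -(-layer_weight_bytes // core_mem_bytes)  # ceiling division

# Illustrative weight footprints (bytes) at some fixed precision.
layers = {"conv1": 35_000, "conv2": 310_000, "fc": 4_100_000}
mapping = {name: cores_needed(size) for name, size in layers.items()}
print(mapping)  # {'conv1': 2, 'conv2': 10, 'fc': 126}

Lowering the weight precision shrinks layer_weight_bytes proportionally, which is one way the mapping interacts with varying precision.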

As part of work in progress, we have studied a DL-architecture co-design approach to increase the flexibility of hardware solutions. A flexible platform should support variations in the model and variations in the optimizations. The optimization method should be automated so that it responds to changes in the model and the application constraints with minimal effort. In addition, the mapping of the models onto embedded platforms should be automated as well.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2020. p. 81
Series
Halmstad University Dissertations ; 71
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-41969 (URN)
978-91-88749-51-2 (ISBN)
978-91-88749-50-5 (ISBN)
Presentation
2020-06-04, Wigforss, Visionen, Halmstad University, Kristian IV:s väg 3, Halmstad, 10:00 (English)
Available from: 2020-05-14 Created: 2020-04-27 Last updated: 2020-05-14
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Rezk, Nesma
