Streaming Tiles: Flexible Implementation of Convolution Neural Networks Inference on Manycore Architectures
Rezk, Nesma
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
Amrita University, Bengaluru, India.
Ul-Abdin, Zain
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). ORCID iD: 0000-0002-4932-4036
2018 (English). In: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Los Alamitos: IEEE Computer Society, 2018, pp. 867-876. Conference paper, Published paper (Refereed)
Abstract [en]

Convolution neural networks (CNN) are extensively used for deep learning applications such as image recognition and computer vision. The convolution module of these networks is highly compute-intensive. Having an efficient implementation of the convolution module enables realizing the inference part of the neural network on embedded platforms. Low-precision parameters require less memory, less computation time, and less power consumption while achieving high classification accuracy. Furthermore, streaming the data over parallelized processing units saves a considerable amount of memory, which is a key concern in memory-constrained embedded platforms. In this paper, we explore the design space for streamed CNNs on the Epiphany manycore architecture using varying precisions for weights (ranging from binary to 32-bit). Both AlexNet and GoogleNet are explored for two different memory sizes of Epiphany cores. We are able to achieve competitive performance for both AlexNet and GoogleNet with respect to emerging manycores. Furthermore, the effects of different design choices in terms of precision, memory size, and the number of cores are evaluated by applying the proposed method.
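
The record does not reproduce the paper's kernels, but the two ingredients the abstract names (low-precision weights and tile-wise processing) can be sketched compactly. The NumPy sketch below is purely illustrative: the names quantize_weights and conv2d_tile are hypothetical, and the symmetric uniform quantization scheme is a generic assumption, not necessarily the one used in the paper.

    import numpy as np

    def quantize_weights(w, bits):
        """Reduce weight precision to `bits`; generic scheme, assumed here.

        bits == 1 binarizes the weights (sign times mean magnitude);
        bits >= 32 leaves them at full precision.
        """
        if bits >= 32:
            return w.astype(np.float32)
        if bits == 1:
            return (np.sign(w) * np.abs(w).mean()).astype(np.float32)
        levels = 2 ** (bits - 1) - 1              # symmetric range
        scale = np.abs(w).max() / levels
        return (np.round(w / scale) * scale).astype(np.float32)

    def conv2d_tile(tile, kernels):
        """Direct (valid) convolution of one input tile.

        tile:    (C, H, W) activations streamed to a core
        kernels: (K, C, R, S) weights resident in the core's local memory
        """
        C, H, W = tile.shape
        K, _, R, S = kernels.shape
        out = np.zeros((K, H - R + 1, W - S + 1), dtype=np.float32)
        for k in range(K):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    out[k, i, j] = np.sum(tile[:, i:i + R, j:j + S] * kernels[k])
        return out

    # Sweep weight precision and measure output deviation from float32.
    rng = np.random.default_rng(0)
    tile = rng.standard_normal((3, 8, 8)).astype(np.float32)
    weights = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
    reference = conv2d_tile(tile, weights)
    for bits in (1, 4, 8, 16, 32):
        q = quantize_weights(weights, bits)
        err = np.abs(conv2d_tile(tile, q) - reference).mean()
        print(f"{bits:2d}-bit weights: mean abs deviation {err:.4f}")
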

Place, publisher, year, edition, pages
Los Alamitos: IEEE Computer Society, 2018. pp. 867-876
Keywords [en]
manycores, CNN, stream processing, embedded systems
National subject category
Embedded Systems
Identifiers
URN: urn:nbn:se:hh:diva-36887
DOI: 10.1109/IPDPSW.2018.00138
Scopus ID: 2-s2.0-85052195969
ISBN: 978-1-5386-5555-9 (digital)
ISBN: 978-1-5386-5556-6 (print)
OAI: oai:DiVA.org:hh-36887
DiVA, id: diva2:1212121
Conference
The 7th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics, Vancouver, British Columbia, Canada, May 21, 2018
Project
NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Research funder
Vinnova
Note

Included as a manuscript in a thesis.

Other funding: Department of Science and Technology, Government of India.

Available from: 2018-06-01. Created: 2018-06-01. Last updated: 2020-04-30. Bibliographically reviewed.
Part of thesis
1. Exploring Efficient Implementations of Deep Learning Applications on Embedded Platforms
2020 (English). Licentiate thesis, compilation (Other academic)
Abstract [en]

The promising results of deep learning (deep neural network) models in many applications, such as speech recognition and computer vision, have created a need for their realization on embedded platforms. Augmenting embedded platforms with DL (Deep Learning) enables them to support intelligent tasks in smart homes, mobile phones, and healthcare applications. Deep learning models rely on compute-intensive operations over high-precision values, whereas embedded platforms have restricted compute and energy budgets. Thus, it is challenging to realize deep learning models on embedded platforms.

In this thesis, we define the objectives of implementing deep learning models on embedded platforms. The main objective is to achieve efficient implementations: the implementation should achieve high throughput, preserve low power consumption, and meet real-time requirements. The secondary objective is flexibility. It is not enough to propose an efficient hardware solution for one model; the proposed solution should be flexible enough to support changes in the model and in the application constraints. Thus, the overarching goal of the thesis is to explore flexible methods for the efficient realization of deep learning models on embedded platforms.

Optimizations are applied to both the DL model and the embedded platform to increase implementation efficiency. To understand the impact of different optimizations, we chose recurrent neural networks (as a class of DL models) and compared their implementations on embedded platforms. The comparison analyzes the optimizations applied and the corresponding performance to draw conclusions about the most fruitful and essential optimizations. We concluded that, to achieve high efficiency, it is essential to apply an algorithmic optimization to the model to decrease its compute and memory requirements, and to apply a memory-specific optimization to hide the overhead of memory accesses. Furthermore, it was revealed that many of the works under study focus on implementation efficiency, while flexibility is attempted less often.

We have explored the design space of convolutional neural networks (CNNs) on the Epiphany manycore architecture. We adopted a pipelined implementation of CNNs that relies solely on the on-chip memory to store the weights. The proposed mapping supports both the AlexNet and GoogleNet CNN models, varying precision for weights, and two memory sizes for Epiphany cores. We were able to achieve competitive performance with respect to emerging manycores.
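
To make the mapping concrete, here is a minimal Python sketch of the pipelined scheme described above, under stated assumptions: the Core class, the stand-in stage computation, and the 32 KB local-memory budget (one plausible per-core size; the thesis compares two sizes) are illustrative, not the thesis implementation. The point it demonstrates is the division of labor: weights stay resident in each core's on-chip memory, while activations stream through the chain of cores.

    import numpy as np

    class Core:
        """One pipeline stage: weights live in local memory, data streams through.
        Illustrative toy model only, not the thesis implementation."""

        def __init__(self, weights, local_mem_bytes=32 * 1024):
            # Refuse mappings whose weights exceed the core's local memory,
            # mirroring the on-chip-only weight storage constraint.
            if weights.nbytes > local_mem_bytes:
                raise ValueError("layer weights do not fit in on-chip memory")
            self.weights = weights

        def process(self, x):
            # Stand-in for a convolution stage: matrix product plus ReLU.
            return np.maximum(x @ self.weights, 0.0)

    def run_pipeline(cores, stream):
        """Push each streamed input through every stage in order."""
        for x in stream:
            for core in cores:
                x = core.process(x)
            yield x

    rng = np.random.default_rng(1)
    # Three toy layers; 16x16 float32 weights occupy 1 KB per core, so an
    # 8-bit quantized copy would need a quarter of that local storage.
    layers = [rng.standard_normal((16, 16)).astype(np.float32) for _ in range(3)]
    cores = [Core(w) for w in layers]
    inputs = (rng.standard_normal(16).astype(np.float32) for _ in range(4))
    for y in run_pipeline(cores, inputs):
        print(y.shape, round(float(y.sum()), 3))
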

As part of work in progress, we have studied a DL-architecture co-design approach to increase the flexibility of hardware solutions. A flexible platform should support variations in the model and variations in the optimizations. The optimization method should be automated so that it responds to changes in the model and in the application constraints with minor effort. In addition, the mapping of the models onto embedded platforms should be automated as well.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2020. 81 p.
Series
Halmstad University Dissertations ; 71
National subject category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-41969 (URN)
978-91-88749-51-2 (ISBN)
978-91-88749-50-5 (ISBN)
Presentation
2020-06-04, Wigforss, Visionen, Halmstad University, Kristian IV:s väg 3, Halmstad, 10:00 (English)
Available from: 2020-05-14. Created: 2020-04-27. Last updated: 2020-05-14. Bibliographically reviewed.

Open Access in DiVA

Full text is not available in DiVA
