Utilizing Heterogeneity in Manycore Architectures for Streaming Applications
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). (Computer Architectures and Languages) ORCID iD: 0000-0001-8652-0098
2017 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In the last decade, we have seen a transition from single-core to manycore in computer architectures due to performance requirements and limitations in power consumption and heat dissipation. The first manycores had homogeneous architectures consisting of a few identical cores. However, the applications executed on these architectures usually consist of several tasks that require different hardware resources to execute efficiently. Therefore, we believe that utilizing heterogeneity in manycores will increase the efficiency of the architectures in terms of performance and power consumption. However, developing heterogeneous architectures is more challenging, and the transition from homogeneous to heterogeneous architectures will make efficient software development harder because of the increased architectural complexity. In order to increase the efficiency of hardware and software development, new hardware design methods and software development tools are required. Additionally, there is a lack of knowledge on the performance of applications when executed on manycore architectures.

The transition began with a shift from single-core architectures to homogeneous multicore architectures consisting of a few identical cores. It now continues with a shift from homogeneous architectures with identical cores to heterogeneous architectures with different types of cores specialized for different purposes. However, this transition has increased the complexity of architectures and hence the complexity of software development and execution. In order to decrease the complexity of software development, new software tools are required. Additionally, there is a lack of knowledge on which kind of heterogeneous manycore design is most efficient for different applications and on how these applications perform when executed on current commercial manycores.

This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and to facilitate the choice of architecture for software and hardware developers. It defines a taxonomy for manycore architectures that is based on the levels of heterogeneity they contain and discusses the benefits and drawbacks of these levels. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture. Finally, it presents a methodology for developing heterogeneous manycore architectures that target specific application domains.

Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications. If we add specialized hardware blocks to a core, the performance easily increases by 15x for the target application while the core size increases by 40-50%, a figure that can be optimized further. Other results show that dataflow languages, together with software development tools, decrease software development effort significantly (25-50%) while having a small impact (2-17%) on the performance.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2017, p. 78
Series
Halmstad University Dissertations ; 29
Keywords [en]
Manycores, parallel architectures, parallelism, streaming applications, dataflow, manycore design, heterogeneous manycores
National subject category
Computer Systems
Identifiers
URN: urn:nbn:se:hh:diva-33792
ISBN: 978-91-87045-60-8 (print)
ISBN: 978-91-87045-61-5 (digital)
OAI: oai:DiVA.org:hh-33792
DiVA, id: diva2:1093334
Presentation
2017-06-02, Wigforss, Kristian IV:s väg 3, Halmstad, 13:15 (English)
Opponent
Supervisors
Project
HiPEC (High Performance Embedded Computing); NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Research funder
VINNOVA; Stiftelsen för strategisk forskning (SSF)
Available from: 2017-05-09 Created: 2017-05-05 Last updated: 2017-05-09 Bibliographically approved
List of papers
1. An Evaluation of Code Generation of Dataflow Languages on Manycore Architectures
2014 (English) In: RTCSA 2014: 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, Piscataway, NJ: IEEE Press, 2014, article id 6910501. Conference paper, Published paper (Refereed)
Abstract [en]

Today, computer architectures are shifting from single core to manycores for several reasons, such as performance demands and power and heat limitations. However, shifting to manycores results in additional complexities, especially with regard to efficient development of applications. Hence there is a need to raise the abstraction level of development techniques for manycores while exposing the inherent parallelism in the applications. One promising class of programming languages is dataflow languages, and in this paper we evaluate and optimize the code generation for one such language, CAL. We have also developed a communication library to support inter-core communication. The code generation can target multiple architectures, but the results presented in this paper focus on Adapteva's manycore architecture Epiphany. We use the two-dimensional inverse discrete cosine transform (2D-IDCT) as our benchmark and compare our code generation from CAL with a hand-written implementation developed in C. Several optimizations in the code generation as well as in the communication library are described, and we have observed that the most critical optimization is reducing the number of external memory accesses. Combining all optimizations, we have been able to reduce the difference in execution time between auto-generated and hand-written implementations from a factor of 4.3x down to a factor of only 1.3x. ©2014 IEEE.
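
For readers unfamiliar with the benchmark, the following is a minimal sketch of a 2D-IDCT in its textbook separable form (a 1D inverse transform over the rows, then over the columns). It is not the CAL or C implementation evaluated in the paper. The function names `idct_1d` and `idct_2d`, the orthonormal scaling, and the 8x8 block size in the usage note are assumptions made for illustration.

```python
import math


def idct_1d(coeffs):
    """Orthonormal 1D inverse DCT (the inverse of the DCT-II)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            # The DC term (k == 0) uses a different scale factor.
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def idct_2d(block):
    """Separable 2D IDCT: transform every row, then every column."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    # cols holds transformed columns; transpose back to row-major order.
    return [list(r) for r in zip(*cols)]
```

As a usage example, an assumed 8x8 coefficient block whose only non-zero entry is a DC value of 8 decodes to a flat block in which every sample is 1.0, up to rounding. The row pass and the column pass are independent stages, which is one reason the transform is a common streaming benchmark; how the paper distributes the work across Epiphany cores is not stated in this record.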

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2014
Keywords
Manycore, Dataflow Languages, code generation, Actor Machine, 2D-IDCT, Epiphany, evaluation
National subject category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-25649 (URN)
10.1109/RTCSA.2014.6910501 (DOI)
000352610400005 ()
2-s2.0-84908637354 (Scopus ID)
Conference
RTCSA 2014, 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, Chongqing, China, August 20-22, 2014
Project
HiPEC project
Research funder
KK-stiftelsen; Stiftelsen för strategisk forskning (SSF)
Note

The authors would like to thank Adapteva Inc. for giving access to their software development suite and hardware board. This research is part of the CERES research program funded by the Knowledge Foundation and the HiPEC project funded by the Swedish Foundation for Strategic Research (SSF).

Available from: 2014-06-16 Created: 2014-06-16 Last updated: 2019-05-07 Bibliographically approved
2. Dataflow Implementation of QR Decomposition on a Manycore
2016 (English) In: MES '16: Proceedings of the Third ACM International Workshop on Many-core Embedded Systems, New York, NY: ACM Press, 2016, p. 26-30. Conference paper, Published paper (Refereed)
Abstract [en]

While parallel computer architectures have become mainstream, application development on them is still challenging. There is a need for new tools, languages and programming models. Additionally, there is a lack of knowledge about the performance of parallel approaches of basic but important operations, such as the QR decomposition of a matrix, on current commercial manycore architectures.

This paper evaluates a high-level dataflow language (CAL), a source-to-source compiler (Cal2Many) and three QR decomposition algorithms (Givens Rotations, Householder and Gram-Schmidt). The algorithms are implemented both in CAL and in hand-optimized C, executed on Adapteva's Epiphany manycore architecture and evaluated with respect to performance, scalability and development effort.

In the best case, the CAL (generated C) implementations are only 2% slower than the hand-written versions. They require an average of 25% fewer lines of source code without significantly increasing the binary size. Development effort is reduced and debugging is significantly simplified. The implementations executed on Epiphany cores outperform the GNU scientific library on the host ARM processor of the Parallella board by up to 30x. © 2016 Copyright held by the owner/author(s).
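
To make the operation concrete, here is a minimal sequential sketch of one of the three algorithms named above, Gram-Schmidt, in its modified (numerically more stable) variant. It is an illustration only, not the paper's CAL or C code, and it assumes a real matrix with full column rank. The function name `qr_gram_schmidt` and the list-of-rows matrix layout are hypothetical choices.

```python
import math


def qr_gram_schmidt(a):
    """QR decomposition by modified Gram-Schmidt.

    `a` is an m x n matrix (list of rows) with full column rank.
    Returns (q, r) with q m x n orthonormal columns and r n x n upper triangular.
    """
    m, n = len(a), len(a[0])
    columns = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = columns[j][:]
        # Remove the components along the already-computed orthonormal columns.
        for i, q in enumerate(q_cols):
            r[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - r[i][j] * qk for qk, vk in zip(q, v)]
        # The remaining length becomes the diagonal entry of R.
        r[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / r[j][j] for vk in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

Calling `qr_gram_schmidt([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])` returns a `q` with orthonormal columns and an upper-triangular `r` whose product reproduces the input. Each column depends on all earlier ones, which is the data dependency a dataflow or manycore mapping has to schedule around.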

Place, publisher, year, edition, pages
New York, NY: ACM Press, 2016
National subject category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-32371 (URN)
10.1145/2934495.2934499 (DOI)
2-s2.0-84991106778 (Scopus ID)
978-1-4503-4262-9 (ISBN)
Conference
MES '16, International Workshop on Many-core Embedded Systems, Seoul, Republic of Korea, June 19, 2016
Project
ESCHER; HiPEC
Research funder
KK-stiftelsen; Stiftelsen för strategisk forskning (SSF); ELLIIT - The Linköping-Lund Initiative on IT and Mobile Communications
Available from: 2016-11-04 Created: 2016-11-04 Last updated: 2019-05-07 Bibliographically approved
3. Efficient Single-Precision Floating-Point Division Using Harmonized Parabolic Synthesis
2017 (English) In: 2017 IEEE Computer Society Annual Symposium on VLSI: ISVLSI 2017 / [ed] Michael Hübner, Ricardo Reis, Mircea Stan & Nikolaos Voros, Los Alamitos: IEEE, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes a novel method for performing division on floating-point numbers represented in IEEE-754 single-precision (binary32) format. The method is based on an inverter, implemented as a combination of Parabolic Synthesis and second-degree interpolation, followed by a multiplier. It is implemented with and without pipeline stages individually and synthesized while targeting a Xilinx Ultrascale FPGA.

The implementations show better resource usage and latency than other implementations based on different methods. In terms of throughput, the proposed method outperforms most of the other works; however, some Altera FPGAs achieve a higher clock rate due to differences in the DSP slice multiplier design.

Due to the small size, low latency and high throughput, the presented floating-point division unit is suitable for high performance embedded systems and can be integrated into accelerators or be used as a stand-alone accelerator.
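
The structure described above, an inverter followed by a multiplier, computes a/b as a × (1/b). The sketch below illustrates only that structure in software. It does not implement Parabolic Synthesis or second-degree interpolation: the reciprocal is approximated with Newton-Raphson iteration as a plainly different stand-in, it runs in double precision rather than binary32, and it ignores infinities and NaNs. The names `reciprocal_of_mantissa` and `divide` are hypothetical.

```python
import math


def reciprocal_of_mantissa(m, iterations=4):
    """Approximate 1/m for 0.5 <= m < 1 with Newton-Raphson (stand-in inverter)."""
    # Standard linear initial guess for this interval.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Each step roughly squares the relative error.
        x = x * (2.0 - m * x)
    return x


def divide(a, b):
    """Compute a / b as a * (1 / b): inverter first, multiplier second."""
    if b == 0.0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if b < 0.0 else 1.0
    # abs(b) == m * 2**e with 0.5 <= m < 1, so 1/abs(b) == (1/m) * 2**(-e).
    m, e = math.frexp(abs(b))
    inverse = reciprocal_of_mantissa(m)
    return sign * math.ldexp(a * inverse, -e)
```

Splitting the divisor into mantissa and exponent means the inverter only has to cover one narrow interval, which is what makes a compact hardware approximation feasible; the exponent is handled by a simple subtraction.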

Place, publisher, year, edition, pages
Los Alamitos: IEEE, 2017
Series
IEEE Computer Society Annual Symposium on VLSI, ISSN 2159-3477
Keywords
Floating-point, single precision, division, FPGA, Harmonized Parabolic Synthesis
National subject category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-33793 (URN)
10.1109/ISVLSI.2017.28 (DOI)
2-s2.0-85027258772 (Scopus ID)
978-1-5090-6762-6 (ISBN)
978-1-5090-6763-3 (ISBN)
Conference
IEEE Computer Society Annual Symposium on VLSI, July 3-5, 2017, Bochum, Germany
Project
NGES
Research funder
VINNOVA
Available from: 2017-05-05 Created: 2017-05-05 Last updated: 2019-05-07 Bibliographically approved
4. Designing Domain Specific Heterogeneous Manycore Architectures Based on Building Blocks
2018 (English) Manuscript (preprint) (Other academic)
Abstract [en]

Performance and power requirements have pushed computer architectures from single core to manycores. These requirements now continue to push manycores with identical cores (homogeneous) towards manycores with specialized cores (heterogeneous). However, designing heterogeneous manycores is a challenging task due to the complexity of the architectures. We propose an approach for designing domain-specific heterogeneous manycore architectures based on building blocks. These blocks are defined as the common computations of the applications within a domain. The objective is to generate heterogeneous architectures by integrating many of these blocks into many simple cores and connecting the cores with a network-on-chip. The proposed approach aims to ease the design of heterogeneous manycore architectures and to facilitate use of the dark silicon concept. As a case study, we develop an accelerator based on several building blocks, integrate it into a RISC core and synthesize it on a Xilinx Ultrascale FPGA. The results show that executing a hot-spot of an application on an accelerator based on building blocks increases the performance by 15x, with room for further improvement. The area usage increases as well; however, there are potential optimizations to reduce it. © 2018 by the authors

Keywords
heterogeneous architecture design, risc-v, dataflow, QR decomposition, domain-specific processor, accelerator, Autofocus, hardware software co-design
National subject category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-33818 (URN)
Project
HiPEC (High Performance Embedded Computing); NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Research funder
Stiftelsen för strategisk forskning (SSF); VINNOVA
Available from: 2017-05-09 Created: 2017-05-09 Last updated: 2018-12-05 Bibliographically approved

Open Access in DiVA

Full text (2047 kB)
File information
File name: FULLTEXT02.pdf
File size: 2047 kB
Checksum (SHA-512): fe0d054339b387b4e7421981a10f1d7b411818ce42d18c3f6fbe58f558015f22dc6e6e22e50db7bc641852a25744f720de1498be21517b617f96a1f775bf62af
Type: fulltext
MIME type: application/pdf

Author

Savas, Süleyman
