Utilizing Heterogeneity in Manycore Architectures for Streaming Applications
Högskolan i Halmstad, Akademin för informationsteknologi, Halmstad Embedded and Intelligent Systems Research (EIS), Centrum för forskning om inbyggda system (CERES). (Computer Architectures and Languages). ORCID iD: 0000-0001-8652-0098
2017 (English). Licentiate thesis, with articles (Other academic)
Abstract [en]

In the last decade, we have seen a transition from single-core to manycore in computer architectures due to performance requirements and limitations in power consumption and heat dissipation. The first manycores had homogeneous architectures consisting of a few identical cores. However, the applications, which are executed on these architectures, usually consist of several tasks requiring different hardware resources to be executed efficiently. Therefore, we believe that utilizing heterogeneity in manycores will increase the efficiency of the architectures in terms of performance and power consumption. However, development of heterogeneous architectures is more challenging and the transition from homogeneous to heterogeneous architectures will increase the difficulty of efficient software development due to the increased complexity of the architecture. In order to increase the efficiency of hardware and software development, new hardware design methods and software development tools are required. Additionally, there is a lack of knowledge on the performance of applications when executed on manycore architectures.

The transition began with a shift from single-core architectures to homogeneous multicore architectures consisting of a few identical cores. It now continues with a shift from homogeneous architectures with identical cores to heterogeneous architectures with different types of cores specialized for different purposes. However, this transition has increased the complexity of architectures and hence the complexity of software development and execution. In order to decrease the complexity of software development, new software tools are required. Additionally, there is a lack of knowledge on which kinds of heterogeneous manycore design are most efficient for different applications and on how these applications perform when executed on current commercial manycores.

This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and facilitate choice of architecture for software and hardware developers. It defines a taxonomy for manycore architectures that is based on the levels of heterogeneity they contain and discusses benefits and drawbacks of these levels. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture. It finally presents a methodology for developing heterogeneous manycore architectures which target specific application domains.

Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications. If we add specialized hardware blocks to a core, the performance easily increases by 15x for the target application while the core size increases by 40-50%, which can be optimized further. Other results show that dataflow languages, together with software development tools, decrease software development efforts significantly (25-50%) while having a small impact (2-17%) on the performance.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2017. p. 78
Series
Halmstad University Dissertations ; 29
Keywords [en]
Manycores, parallel architectures, parallelism, streaming applications, dataflow, manycore design, heterogeneous manycores
HSV category
Identifiers
URN: urn:nbn:se:hh:diva-33792
ISBN: 978-91-87045-60-8 (print)
ISBN: 978-91-87045-61-5 (digital)
OAI: oai:DiVA.org:hh-33792
DiVA, id: diva2:1093334
Presentation
2017-06-02, Wigforss, Kristian IV:s väg 3, Halmstad, 13:15 (English)
Opponent
Supervisor
Projects
HiPEC (High Performance Embedded Computing)
NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Research funder
VINNOVA
Swedish Foundation for Strategic Research
Available from: 2017-05-09 Created: 2017-05-05 Last updated: 2017-05-09. Bibliographically approved
List of papers
1. An Evaluation of Code Generation of Dataflow Languages on Manycore Architectures
2014 (English). In: RTCSA 2014: 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, Piscataway, NJ: IEEE Press, 2014, article id 6910501. Conference paper, Published paper (Refereed)
Abstract [en]

Today computer architectures are shifting from single core to manycores due to several reasons such as performance demands, power and heat limitations. However, shifting to manycores results in additional complexities, especially with regard to efficient development of applications. Hence there is a need to raise the abstraction level of development techniques for the manycores while exposing the inherent parallelism in the applications. One promising class of programming languages is dataflow languages and in this paper we evaluate and optimize the code generation for one such language, CAL. We have also developed a communication library to support the inter-core communication. The code generation can target multiple architectures, but the results presented in this paper are focused on Adapteva's manycore architecture Epiphany. We use the two-dimensional inverse discrete cosine transform (2D-IDCT) as our benchmark and compare our code generation from CAL with a hand-written implementation developed in C. Several optimizations in the code generation as well as in the communication library are described, and we have observed that the most critical optimization is reducing the number of external memory accesses. Combining all optimizations we have been able to reduce the difference in execution time between auto-generated and hand-written implementations from a factor of 4.3x down to a factor of only 1.3x. ©2014 IEEE.
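
For readers unfamiliar with the benchmark, the 2D-IDCT named in the abstract can be sketched as a row pass followed by a column pass of a 1D inverse DCT. The sketch below is only an illustration in plain Python, assuming the orthonormal DCT-III scaling; the function names `idct_1d` and `idct_2d` are hypothetical, and this is neither the CAL actor network nor the hand-written C version evaluated in the paper.

```python
import math


def idct_1d(coeffs):
    """Orthonormal 1D inverse DCT (DCT-III) of a list of coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            # The DC term uses sqrt(1/n); all other terms use sqrt(2/n).
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def idct_2d(block):
    """Separable 2D inverse DCT: transform every row, then every column."""
    rows = [idct_1d(row) for row in block]
    # zip(*rows) walks the columns of the row-transformed block.
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [row][column] again.
    return [list(row) for row in zip(*cols)]
```

With this scaling, an 8x8 block whose only non-zero coefficient is a DC value of 8.0 comes out as a flat block of 1.0. The row and column passes are independent of each other within a pass, which is the kind of parallelism a dataflow description can expose.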

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2014
Keywords
Manycore, Dataflow Languages, code generation, Actor Machine, 2D-IDCT, Epiphany, evaluation
HSV category
Identifiers
urn:nbn:se:hh:diva-25649 (URN)
10.1109/RTCSA.2014.6910501 (DOI)
000352610400005 ()
2-s2.0-84908637354 (Scopus ID)
Conference
RTCSA 2014, 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, Chongqing, China, August 20-22, 2014
Projects
HiPEC project
Research funder
Knowledge Foundation
Swedish Foundation for Strategic Research
Note

The authors would like to thank Adapteva Inc. for giving access to their software development suite and hardware board. This research is part of the CERES research program funded by the Knowledge Foundation and HiPEC project funded by Swedish Foundation for Strategic Research (SSF).

Available from: 2014-06-16 Created: 2014-06-16 Last updated: 2019-05-07. Bibliographically approved
2. Dataflow Implementation of QR Decomposition on a Manycore
2016 (English). In: MES '16: Proceedings of the Third ACM International Workshop on Many-core Embedded Systems, New York, NY: ACM Press, 2016, p. 26-30. Conference paper, Published paper (Refereed)
Abstract [en]

While parallel computer architectures have become mainstream, application development on them is still challenging. There is a need for new tools, languages and programming models. Additionally, there is a lack of knowledge about the performance of parallel approaches of basic but important operations, such as the QR decomposition of a matrix, on current commercial manycore architectures.

This paper evaluates a high-level dataflow language (CAL), a source-to-source compiler (Cal2Many) and three QR decomposition algorithms (Givens Rotations, Householder and Gram-Schmidt). The algorithms are implemented both in CAL and in hand-optimized C, executed on Adapteva's Epiphany manycore architecture, and evaluated with respect to performance, scalability and development effort.

At best, the CAL (generated C) implementations are only 2% slower than the hand-written versions. They require on average 25% fewer lines of source code without significantly increasing the binary size. Development effort is reduced and debugging is significantly simplified. The implementations executed on Epiphany cores outperform the GNU scientific library on the host ARM processor of the Parallella board by up to 30x. © 2016 Copyright held by the owner/author(s).
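
As an illustration of one of the three algorithms named above, the following is a minimal sequential sketch of QR decomposition by Givens rotations in plain Python. The function name `givens_qr` is hypothetical, and the code makes no attempt to reproduce the parallel CAL or C implementations evaluated on Epiphany; it only shows how each rotation zeroes one sub-diagonal element.

```python
import math


def givens_qr(a):
    """Return (q, r) with a = q * r, q orthogonal and r upper triangular."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Work upwards from the bottom row, zeroing r[i][j] against r[i-1][j].
        for i in range(m - 1, j, -1):
            if r[i][j] == 0.0:
                continue
            h = math.hypot(r[i - 1][j], r[i][j])
            c, s = r[i - 1][j] / h, r[i][j] / h
            # Apply the rotation G to rows i-1 and i of r.
            for k in range(n):
                upper, lower = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * upper + s * lower
                r[i][k] = -s * upper + c * lower
            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r
```

Each rotation touches only two rows, so rotations on disjoint row pairs can run concurrently, which is one reason the method is often considered for parallel hardware.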

Place, publisher, year, edition, pages
New York, NY: ACM Press, 2016
HSV category
Identifiers
urn:nbn:se:hh:diva-32371 (URN)
10.1145/2934495.2934499 (DOI)
2-s2.0-84991106778 (Scopus ID)
978-1-4503-4262-9 (ISBN)
Conference
MES '16, International Workshop on Many-core Embedded Systems, Seoul, Republic of Korea, June 19, 2016
Projects
ESCHER
HiPEC
Research funder
Knowledge Foundation
Swedish Foundation for Strategic Research
ELLIIT - The Linköping-Lund Initiative on IT and Mobile Communications
Available from: 2016-11-04 Created: 2016-11-04 Last updated: 2019-05-07. Bibliographically approved
3. Efficient Single-Precision Floating-Point Division Using Harmonized Parabolic Synthesis
2017 (English). In: 2017 IEEE Computer Society Annual Symposium on VLSI: ISVLSI 2017 / [ed] Michael Hübner, Ricardo Reis, Mircea Stan & Nikolaos Voros, Los Alamitos: IEEE, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes a novel method for performing division on floating-point numbers represented in IEEE-754 single-precision (binary32) format. The method is based on an inverter, implemented as a combination of Parabolic Synthesis and second-degree interpolation, followed by a multiplier. It is implemented with and without pipeline stages individually and synthesized while targeting a Xilinx Ultrascale FPGA.

The implementations show better resource usage and latency results when compared to other implementations based on different methods. In terms of throughput, the proposed method outperforms most of the other works; however, some Altera FPGAs achieve a higher clock rate due to differences in the DSP slice multiplier design.

Due to the small size, low latency and high throughput, the presented floating-point division unit is suitable for high performance embedded systems and can be integrated into accelerators or be used as a stand-alone accelerator.
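
The overall structure described in the abstract, an inverter followed by a multiplier, can be sketched in software as below. This is only an illustration: the reciprocal here uses a generic segmented second-degree Taylor approximation, which is a swapped-in technique and not Harmonized Parabolic Synthesis. The table size `SEGMENTS` and the names `reciprocal` and `divide` are assumptions made for the example, and only finite, non-zero, normal divisors are handled.

```python
import math

SEGMENTS = 256  # hypothetical table size, not taken from the paper

# Per-segment coefficients for 1/m on [0.5, 1): second-degree Taylor terms
# around each segment midpoint. This stands in for a hardware lookup table.
_TABLE = []
for _idx in range(SEGMENTS):
    _m0 = 0.5 + (_idx + 0.5) / (2 * SEGMENTS)
    _r0 = 1.0 / _m0
    _TABLE.append((_m0, _r0, -_r0 ** 2, _r0 ** 3))


def reciprocal(y):
    """Approximate 1/y for a finite, non-zero, normal float y."""
    if y == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(y))            # abs(y) == m * 2**e with m in [0.5, 1)
    idx = int((m - 0.5) * 2 * SEGMENTS)  # segment that contains the mantissa
    m0, c0, c1, c2 = _TABLE[idx]
    d = m - m0
    approx = c0 + d * (c1 + d * c2)      # second-degree polynomial, Horner form
    # Undo the exponent split and restore the sign of y.
    return math.copysign(math.ldexp(approx, -e), y)


def divide(x, y):
    """Division as an inverter (reciprocal) followed by one multiplication."""
    return x * reciprocal(y)
```

Splitting the divisor into mantissa and exponent keeps the approximated function on a narrow interval, so a small table and a low-degree polynomial are enough for roughly single-precision accuracy in this sketch.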

Place, publisher, year, edition, pages
Los Alamitos: IEEE, 2017
Series
IEEE Computer Society Annual Symposium on VLSI, ISSN 2159-3477
Keywords
Floating-point, single precision, division, FPGA, Harmonized Parabolic Synthesis
HSV category
Identifiers
urn:nbn:se:hh:diva-33793 (URN)
10.1109/ISVLSI.2017.28 (DOI)
2-s2.0-85027258772 (Scopus ID)
978-1-5090-6762-6 (ISBN)
978-1-5090-6763-3 (ISBN)
Conference
IEEE Computer Society Annual Symposium on VLSI, July 3-5, 2017, Bochum, Germany
Projects
NGES
Research funder
VINNOVA
Available from: 2017-05-05 Created: 2017-05-05 Last updated: 2019-05-07. Bibliographically approved
4. Designing Domain Specific Heterogeneous Manycore Architectures Based on Building Blocks
2018 (English). Manuscript (preprint) (Other academic)
Abstract [en]

Performance and power requirements have pushed computer architectures from single core to manycores. These requirements now continue pushing the manycores with identical cores (homogeneous) to manycores with specialized cores (heterogeneous). However, designing heterogeneous manycores is a challenging task due to the complexity of the architectures. We propose an approach for designing domain specific heterogeneous manycore architectures based on building blocks. These blocks are defined as the common computations of the applications within a domain. The objective is to generate heterogeneous architectures by integrating many of these blocks into many simple cores and connecting the cores with a network-on-chip. The proposed approach aims to ease the design of heterogeneous manycore architectures and facilitate use of the dark silicon concept. As a case study, we develop an accelerator based on several building blocks, integrate it into a RISC core and synthesize it on a Xilinx Ultrascale FPGA. The results show that executing a hot-spot of an application on an accelerator based on building blocks increases the performance by 15x, with room for further improvement. The area usage increases as well; however, there are potential optimizations to reduce the area usage. © 2018 by the authors

Keywords
heterogeneous architecture design, risc-v, dataflow, QR decomposition, domain-specific processor, accelerator, Autofocus, hardware software co-design
HSV category
Identifiers
urn:nbn:se:hh:diva-33818 (URN)
Projects
HiPEC (High Performance Embedded Computing)
NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Research funder
Swedish Foundation for Strategic Research
VINNOVA
Available from: 2017-05-09 Created: 2017-05-09 Last updated: 2018-12-05. Bibliographically approved

Open Access in DiVA

fulltext (2047 kB), 1111 downloads
File information
File: FULLTEXT02.pdf. File size: 2047 kB. Checksum (SHA-512):
fe0d054339b387b4e7421981a10f1d7b411818ce42d18c3f6fbe58f558015f22dc6e6e22e50db7bc641852a25744f720de1498be21517b617f96a1f775bf62af
Type: fulltext. MIME type: application/pdf

Person records (BETA)

Savas, Süleyman
