Dataflow Implementation of QR Decomposition on a Manycore
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). (EPC Group) ORCID iD: 0000-0001-8652-0098
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). ORCID iD: 0000-0002-4932-4036
2016 (English). In: MES '16: Proceedings of the Third ACM International Workshop on Many-core Embedded Systems, New York, NY: ACM Press, 2016, p. 26-30. Conference paper, Published paper (Refereed)
Abstract [en]

While parallel computer architectures have become mainstream, application development on them is still challenging. There is a need for new tools, languages and programming models. Additionally, there is a lack of knowledge about the performance of parallel approaches of basic but important operations, such as the QR decomposition of a matrix, on current commercial manycore architectures.

This paper evaluates a high-level dataflow language (CAL), a source-to-source compiler (Cal2Many) and three QR decomposition algorithms (Givens Rotations, Householder and Gram-Schmidt). The algorithms are implemented both in CAL and in hand-optimized C, executed on Adapteva's Epiphany manycore architecture, and evaluated with respect to performance, scalability and development effort.
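To make the first of the three algorithms concrete, the following is a minimal Python sketch of QR decomposition by Givens rotations. It is not the paper's CAL or C implementation, which is not reproduced in this record. The function names `givens_qr` and `matmul` and the list-of-lists matrix representation are assumptions made for illustration, and the sketch runs sequentially rather than across cores.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows (illustrative helper)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def givens_qr(a):
    """Return (q, r) such that q times r reproduces a, with r upper triangular.

    Each Givens rotation mixes two adjacent rows so that one entry
    below the diagonal becomes zero.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Work upwards from the last row to the row just below the diagonal.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # entry is already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Apply the rotation to rows i-1 and i of r.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r


# Usage: decompose a small 3x2 matrix and rebuild it from the factors.
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = givens_qr(a)
rebuilt = matmul(q, r)
```

Rotations that touch disjoint pairs of rows are independent of each other, which is one reason this algorithm is a common candidate for parallel implementations.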

At best, the CAL (generated C) implementations are only 2% slower than the hand-written versions. They require an average of 25% fewer lines of source code without significantly increasing the binary size. Development effort is reduced and debugging is significantly simplified. The implementations executed on Epiphany cores outperform the GNU Scientific Library on the host ARM processor of the Parallella board by up to 30x. © 2016 Copyright held by the owner/author(s).

Place, publisher, year, edition, pages
New York, NY: ACM Press, 2016. p. 26-30
National Category
Embedded Systems
Identifiers
URN: urn:nbn:se:hh:diva-32371
DOI: 10.1145/2934495.2934499
Scopus ID: 2-s2.0-84991106778
ISBN: 978-1-4503-4262-9 (print)
OAI: oai:DiVA.org:hh-32371
DiVA, id: diva2:1044642
Conference
MES '16, International Workshop on Many-core Embedded Systems, Seoul, Republic of Korea, June 19, 2016
Projects
ESCHER, HiPEC
Funder
KK-stiftelsen, Stiftelsen för strategisk forskning (SSF), ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2016-11-04 Created: 2016-11-04 Last updated: 2019-05-07 Bibliographically approved
In thesis
1. Utilizing Heterogeneity in Manycore Architectures for Streaming Applications
2017 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In the last decade, we have seen a transition from single-core to manycore in computer architectures due to performance requirements and limitations in power consumption and heat dissipation. The first manycores had homogeneous architectures consisting of a few identical cores. However, the applications, which are executed on these architectures, usually consist of several tasks requiring different hardware resources to be executed efficiently. Therefore, we believe that utilizing heterogeneity in manycores will increase the efficiency of the architectures in terms of performance and power consumption. However, development of heterogeneous architectures is more challenging and the transition from homogeneous to heterogeneous architectures will increase the difficulty of efficient software development due to the increased complexity of the architecture. In order to increase the efficiency of hardware and software development, new hardware design methods and software development tools are required. Additionally, there is a lack of knowledge on the performance of applications when executed on manycore architectures.

The transition began with a shift from single-core architectures to homogeneous multicore architectures consisting of a few identical cores. It now continues with a shift from homogeneous architectures with identical cores to heterogeneous architectures with different types of cores specialized for different purposes. However, this transition has increased the complexity of architectures and hence the complexity of software development and execution. In order to decrease the complexity of software development, new software tools are required. Additionally, there is a lack of knowledge on what kind of heterogeneous manycore design is most efficient for different applications and how these applications perform when executed on current commercial manycores.

This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and facilitate choice of architecture for software and hardware developers. It defines a taxonomy for manycore architectures that is based on the levels of heterogeneity they contain and discusses benefits and drawbacks of these levels. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture. It finally presents a methodology for developing heterogeneous manycore architectures which target specific application domains.

Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications. If we add specialized hardware blocks to a core, the performance easily increases by 15x for the target application while the core size increases by 40-50%, which can be optimized further. Other results show that dataflow languages, together with software development tools, decrease software development efforts significantly (25-50%) while having a small impact (2-17%) on the performance.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2017. p. 78
Series
Halmstad University Dissertations ; 29
Keywords
Manycores, parallel architectures, parallelism, streaming applications, dataflow, manycore design, heterogeneous manycores
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-33792 (URN), 978-91-87045-60-8 (ISBN), 978-91-87045-61-5 (ISBN)
Presentation
2017-06-02, Wigforss, Kristian IV:s väg 3, Halmstad, 13:15 (English)
Opponent
Supervisors
Projects
HiPEC (High Performance Embedded Computing), NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Funder
VINNOVA, Stiftelsen för strategisk forskning (SSF)
Available from: 2017-05-09 Created: 2017-05-05 Last updated: 2017-05-09 Bibliographically approved
2. Tools to Compile Dataflow Programs for Manycores
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The arrival of manycore systems enforces new approaches for developing applications in order to exploit the available hardware resources. Developing applications for manycores requires programmers to partition the application into subtasks, consider the dependence between the subtasks, understand the underlying hardware and select an appropriate programming model. This is complex, time-consuming and prone to error. In this thesis, we identify and implement abstraction layers in compilation tools to decrease the burden of the programmer, increase program portability and scalability, and increase retargetability of the compilation framework. We present compilation frameworks for two concurrent programming languages, occam-pi and CAL Actor Language, and demonstrate the applicability of the approach with application case-studies targeting these different manycore architectures: STHorm, Epiphany, Ambric, EIT, and ePUMA. For occam-pi, we have extended the Tock compiler and added a backend for STHorm. We evaluate the approach using a fault tolerance model for a four-stage 1D-DCT algorithm implemented by using occam-pi's constructs for dynamic reconfiguration, and the FAST corner detection algorithm, which demonstrates the suitability of occam-pi and the compilation framework for data-intensive applications. For CAL, we have developed a new compilation framework, namely Cal2Many. The Cal2Many framework has a front end, two intermediate representations and four backends: for a uniprocessor, Epiphany, Ambric, and a backend for SIMD-based architectures. Also, we have identified and implemented CAL actor fusion and fission methodologies for efficient mapping of CAL applications. We have used QRD, FAST corner detection, 2D-IDCT, and MPEG applications to evaluate our compilation process and to analyze the limitations of the hardware.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2017. p. 35
Series
Halmstad University Dissertations ; 33
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-34883 (URN), 978-91-87045-69-1 (ISBN), 978-91-87045-68-4 (ISBN)
Public defence
2017-09-27, Wigforssalen, Hus J (Visionen), Kristian IV:s väg 3, Halmstad, 13:15 (English)
Opponent
Supervisors
Available from: 2017-09-06 Created: 2017-09-05 Last updated: 2017-09-06 Bibliographically approved
3. Hardware/Software Co-Design of Heterogeneous Manycore Architectures
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In the era of big data, advanced sensing, and artificial intelligence, the required computation power is provided mostly by multicore and manycore architectures. However, the performance demand keeps growing. Thus the computer architectures need to continue evolving and provide higher performance. The applications, which are executed on the manycore architectures, are divided into several tasks to be mapped on separate cores and executed in parallel. Usually these tasks are not identical and may be executed more efficiently on different types of cores within a heterogeneous architecture. Therefore, we believe that the heterogeneous manycores are the next step for the computer architectures. However, there is a lack of knowledge on what form of heterogeneity is the best match for a given application or application domain. This knowledge can be acquired through designing these architectures and testing different design configurations. However, designing these architectures is a great challenge. Therefore, there is a need for an automated design method to facilitate the architecture design and design space exploration to gather knowledge on architectures with different configurations. Additionally, it is already difficult to program manycore architectures efficiently and this difficulty will only increase further with the introduction of heterogeneity due to the increase in the complexity of the architectures, unless this complexity is somehow hidden. There is a need for software development tools to facilitate the software development for these architectures and enable portability of the same software across different manycore platforms.

In this thesis, we first address the challenges of software development for manycore architectures. We evaluate a dataflow language (CAL) and a source-to-source compilation framework (Cal2Many) with several case studies in order to reveal their impact on the productivity and performance of the software. The language supports task-level parallelism by adopting the actor model, and the framework takes CAL code and generates implementations in the native language of several different architectures.
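As a rough illustration of the actor model mentioned above, the following Python sketch connects three actors through FIFO queues and fires each one whenever a token is waiting on its input. It is neither CAL syntax nor Cal2Many output; the `Actor` class, the `run` scheduler and the example pipeline are all hypothetical names chosen for this sketch, and the round-robin loop stands in for what would be parallel execution on separate cores.

```python
from collections import deque


class Actor:
    """Hypothetical dataflow actor with one input FIFO and any number of targets."""

    def __init__(self, fn):
        self.fn = fn            # computation applied to each input token
        self.inbox = deque()    # FIFO of tokens waiting to be consumed
        self.targets = []       # downstream actors that receive the result

    def connect(self, other):
        """Send this actor's output to `other`; returns `other` for chaining."""
        self.targets.append(other)
        return other

    def can_fire(self):
        return bool(self.inbox)

    def fire(self):
        """Consume one token, compute, and forward the result downstream."""
        result = self.fn(self.inbox.popleft())
        for target in self.targets:
            target.inbox.append(result)


def run(actors):
    """Sequential stand-in for a scheduler: fire ready actors until none remain."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                progress = True


# Usage: a three-stage pipeline (scale, offset, collect).
collected = []
scale = Actor(lambda x: 2 * x)
offset = Actor(lambda x: x + 1)
sink = Actor(collected.append)
scale.connect(offset).connect(sink)
scale.inbox.extend([1, 2, 3])
run([scale, offset, sink])
# collected is now [3, 5, 7]
```

Because actors share no state and interact only through their queues, each one could in principle be placed on its own core, which is the property that makes this model attractive for manycore targets.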

In order to address the challenge of custom hardware development, we first evaluate a commercial manycore architecture, namely Epiphany, and identify its demerits. Then we study manycore architectures in order to reveal possible uses of heterogeneity in manycores and facilitate choice of architecture for software and hardware development. We define a taxonomy for manycore architectures that is based on the levels of heterogeneity they contain and discuss the benefits and drawbacks of these levels. We finally develop and evaluate a design method to design heterogeneous manycore architectures customized based on application requirements. The architectures designed with this method consist of cores with application-specific accelerators. The majority of the design method is automated with software tools, which support different design configurations in order to increase the productivity of the hardware developer and enable design space exploration.

Our results show that the dataflow language, together with the software development tool, decreases software development efforts significantly (25-50%), while having a small impact (2-17%) on the performance. The evaluation of the design method reveals that the performance of automatically generated accelerators is between 96% and 100% of the performance of their manually developed counterparts. Additionally, it is possible to increase the performance of the architectures by increasing the number of cores and using application-specific accelerators, usually at a cost in area usage. However, under certain circumstances, using an accelerator may make it possible to avoid large general-purpose components such as the floating-point unit and therefore improve the area utilization. Eventually, the final impact on the performance and area usage depends on the configurations. When compared to the Epiphany architecture, which is a commercial homogeneous manycore, the generated manycores show competitive results. We can conclude that the automated design method simplifies heterogeneous manycore architecture design and facilitates design space exploration with the use of configurable parameters.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2019. p. 205
Series
Halmstad University Dissertations ; 57
Keywords
hardware/software co-design, manycore architectures, heterogeneous manycores, processor design, parallel computing, high performance computing
National Category
Computer Systems; Embedded Systems
Identifiers
urn:nbn:se:hh:diva-39325 (URN), 978-91-88749-22-2 (ISBN), 978-91-88749-23-9 (ISBN)
Public defence
2019-05-28, Wigforssalen, Visionen, Kristian IV:s väg 3, Halmstad, 13:15 (English)
Opponent
Supervisors
Projects
HIPEC - High Performance Embedded Computing, ESCHER
Funder
Vinnova, KK-stiftelsen, Stiftelsen för strategisk forskning (SSF)
Available from: 2019-05-08 Created: 2019-05-07 Last updated: 2019-05-08 Bibliographically approved

Open Access in DiVA

fulltext (368 kB), 777 downloads
File information
File name: FULLTEXT01.pdf; File size: 368 kB; Checksum: SHA-512
2bfd6168167d3e92bbefa2a3bf9665ff1f3a7ab8135497d40ef7ac1f1cd7b021b10d44125f4061b06c4c06600a4809a54bac9103ae03bcf247e4f3326c8a2524
Type: fulltext; MIME type: application/pdf

Authors

Savas, Süleyman; Raase, Sebastian; Gebrewahid, Essayas; Ul-Abdin, Zain; Nordström, Tomas
