An Evaluation of Code Generation of Dataflow Languages on Manycore Architectures
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). ORCID iD: 0000-0001-8652-0098
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). ORCID iD: 0000-0002-4932-4036
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). ORCID iD: 0000-0002-0562-2082
2014 (English). In: RTCSA 2014: 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, Piscataway, NJ: IEEE Press, 2014, 6910501. Conference paper, Published paper (Refereed)
Abstract [en]

Today, computer architectures are shifting from single-core to manycore designs for several reasons, such as performance demands and power and heat limitations. However, the shift to manycores introduces additional complexity, especially with regard to the efficient development of applications. Hence, there is a need to raise the abstraction level of development techniques for manycores while exposing the inherent parallelism in the applications. One promising class of programming languages is dataflow languages, and in this paper we evaluate and optimize the code generation for one such language, CAL. We have also developed a communication library to support inter-core communication. The code generation can target multiple architectures, but the results presented in this paper focus on Adapteva's manycore architecture Epiphany. We use the two-dimensional inverse discrete cosine transform (2D-IDCT) as our benchmark and compare the code generated from CAL with a hand-written implementation developed in C. We describe several optimizations in the code generation and in the communication library, and we have observed that the most critical optimization is reducing the number of external memory accesses. Combining all optimizations, we have been able to reduce the difference in execution time between auto-generated and hand-written implementations from a factor of 4.3x down to a factor of only 1.3x. ©2014 IEEE.
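The abstract names the 2D-IDCT as the benchmark but does not give its formulation. The sketch below is a reference for what that benchmark computes. It is a minimal Python version of a separable, orthonormally scaled 8×8 2D-IDCT: a 1D-IDCT over every row, then over every column. The block size, the scaling convention, and the function names `idct_1d` and `idct_2d` are assumptions made for illustration. This is neither the authors' CAL actors nor their hand-written C implementation.

```python
import math

BLOCK = 8  # assumed block size; 8x8 is the common choice in image/video coding


def idct_1d(coeffs):
    """Orthonormal 1D inverse DCT (DCT-III) of one sequence of coefficients."""
    n_len = len(coeffs)
    out = []
    for n in range(n_len):
        acc = 0.0
        for k, c in enumerate(coeffs):
            # Orthonormal scaling: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
            scale = math.sqrt(1.0 / n_len) if k == 0 else math.sqrt(2.0 / n_len)
            acc += scale * c * math.cos((2 * n + 1) * k * math.pi / (2 * n_len))
        out.append(acc)
    return out


def idct_2d(block):
    """Separable 2D-IDCT: a 1D-IDCT pass over rows, then one over columns."""
    rows = [idct_1d(row) for row in block]
    # zip(*rows) yields the columns of the row-pass result.
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    # cols[j][i] holds the output at row i, column j, so transpose back.
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    dc_only = [[0.0] * BLOCK for _ in range(BLOCK)]
    dc_only[0][0] = 8.0
    # With orthonormal scaling, a DC coefficient of 8 gives a flat block of 1.0.
    print([round(v, 6) for v in idct_2d(dc_only)[0]])
```

The row pass followed by a column pass is a natural point at which to split the computation into separate pipeline stages. The actual actor partitioning used in the paper is not described in this record.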

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2014. 6910501
Keyword [en]
Manycore, Dataflow Languages, code generation, Actor Machine, 2D-IDCT, Epiphany, evaluation
National Category
Embedded Systems
Identifiers
URN: urn:nbn:se:hh:diva-25649
DOI: 10.1109/RTCSA.2014.6910501
ISI: 000352610400005
Scopus ID: 2-s2.0-84908637354
OAI: oai:DiVA.org:hh-25649
DiVA: diva2:725348
Conference
RTCSA 2014, 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, Chongqing, China, August 20-22, 2014
Projects
HiPEC project
Funder
Knowledge Foundation
Swedish Foundation for Strategic Research
Note

The authors would like to thank Adapteva Inc. for giving access to their software development suite and hardware board. This research is part of the CERES research program, funded by the Knowledge Foundation, and the HiPEC project, funded by the Swedish Foundation for Strategic Research (SSF).

Available from: 2014-06-16. Created: 2014-06-16. Last updated: 2017-09-05. Bibliographically approved.
In thesis
1. Compiling Concurrent Programs for Manycores
2015 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The arrival of manycore systems demands new approaches to developing applications in order to exploit the available hardware resources. Developing applications for manycores requires programmers to partition the application into subtasks, consider the dependencies between the subtasks, understand the underlying hardware, and select an appropriate programming model. This is complex, time-consuming, and error-prone.

In this thesis, we identify and implement abstraction layers in compilation tools to decrease the burden on the programmer, to increase programming productivity and program portability for manycores, and to analyze the impact of these layers on performance and efficiency. We present compilation frameworks for two concurrent programming languages, occam-pi and CAL Actor Language. We demonstrate the applicability of the approach with application case studies targeting three different manycore architectures: STHorm, Epiphany, and Ambric.

For occam-pi, we have extended the Tock compiler and added a backend for STHorm. We evaluate the approach with two applications. The first is a fault tolerance model for a four-stage 1D-DCT algorithm, implemented using occam-pi's constructs for dynamic reconfiguration. The second is the FAST corner detection algorithm, which demonstrates the suitability of occam-pi and the compilation framework for data-intensive applications. We also present a new CAL compilation framework that has a front end, two intermediate representations, and three backends: for a uniprocessor, Epiphany, and Ambric. We show the feasibility of our approach by compiling a CAL implementation of the 2D-IDCT for the three backends. We also present an evaluation and optimization of code generation for Epiphany by comparing the code generated from CAL with a hand-written C implementation of the 2D-IDCT.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2015. 35 p.
Series
Halmstad University Dissertations, 11
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-27789 (URN)
978-91-87045-25-7 (ISBN)
978-91-87045-24-0 (ISBN)
Presentation
2015-03-20, Haldasalen, House Visionen, Halmstad University, Halmstad, 10:15 (English)
Available from: 2015-02-16. Created: 2015-02-13. Last updated: 2015-08-21. Bibliographically approved.
2. Utilizing Heterogeneity in Manycore Architectures for Streaming Applications
2017 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In the last decade, we have seen a transition from single-core to manycore computer architectures due to performance requirements and limitations in power consumption and heat dissipation. The first manycores had homogeneous architectures consisting of a few identical cores. However, the applications executed on these architectures usually consist of several tasks that require different hardware resources to be executed efficiently. Therefore, we believe that utilizing heterogeneity in manycores will increase the efficiency of the architectures in terms of performance and power consumption. However, developing heterogeneous architectures is more challenging. The transition from homogeneous to heterogeneous architectures will also make efficient software development more difficult, because the architecture becomes more complex. In order to increase the efficiency of hardware and software development, new hardware design methods and software development tools are required. Additionally, there is a lack of knowledge about the performance of applications when executed on manycore architectures.

The transition began with a shift from single-core architectures to homogeneous multicore architectures consisting of a few identical cores. It now continues with a shift from homogeneous architectures with identical cores to heterogeneous architectures with different types of cores specialized for different purposes. However, this transition has increased the complexity of architectures and hence the complexity of software development and execution. In order to decrease the complexity of software development, new software tools are required. Additionally, there is a lack of knowledge about what kind of heterogeneous manycore design is most efficient for different applications and how these applications perform when executed on current commercial manycores.

This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and to facilitate the choice of architecture for software and hardware developers. It defines a taxonomy for manycore architectures based on the levels of heterogeneity they contain and discusses the benefits and drawbacks of these levels. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture. Finally, it presents a methodology for developing heterogeneous manycore architectures that target specific application domains.

Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications. Adding specialized hardware blocks to a core increases performance by 15x for the target application. The core size grows by 40-50%, which can be optimized further. Other results show that dataflow languages, together with software development tools, reduce software development effort significantly (by 25-50%) while having a small impact (2-17%) on performance.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2017. 78 p.
Series
Halmstad University Dissertations, 29
Keyword
Manycores, parallel architectures, parallelism, streaming applications, dataflow, manycore design, heterogeneous manycores
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-33792 (URN)
978-91-87045-60-8 (ISBN)
978-91-87045-61-5 (ISBN)
Presentation
2017-06-02, Wigforss, Kristian IV:s väg 3, Halmstad, 13:15 (English)
Projects
HiPEC (High Performance Embedded Computing)
NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Funder
VINNOVA
Swedish Foundation for Strategic Research
Available from: 2017-05-09. Created: 2017-05-05. Last updated: 2017-05-09. Bibliographically approved.
3. Tools to Compile Dataflow Programs for Manycores
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The arrival of manycore systems demands new approaches to developing applications in order to exploit the available hardware resources. Developing applications for manycores requires programmers to partition the application into subtasks, consider the dependencies between the subtasks, understand the underlying hardware, and select an appropriate programming model. This is complex, time-consuming, and error-prone. In this thesis, we identify and implement abstraction layers in compilation tools. These layers aim to decrease the burden on the programmer, to increase program portability and scalability, and to increase the retargetability of the compilation framework. We present compilation frameworks for two concurrent programming languages, occam-pi and CAL Actor Language. We demonstrate the applicability of the approach with application case studies targeting five different manycore architectures: STHorm, Epiphany, Ambric, EIT, and ePUMA.

For occam-pi, we have extended the Tock compiler and added a backend for STHorm. We evaluate the approach with two applications. The first is a fault tolerance model for a four-stage 1D-DCT algorithm, implemented using occam-pi's constructs for dynamic reconfiguration. The second is the FAST corner detection algorithm, which demonstrates the suitability of occam-pi and the compilation framework for data-intensive applications.

For CAL, we have developed a new compilation framework, namely Cal2Many. The Cal2Many framework has a front end, two intermediate representations, and four backends: for a uniprocessor, Epiphany, Ambric, and SIMD-based architectures. We have also identified and implemented CAL actor fusion and fission methodologies for efficiently mapping CAL applications. We have used QRD, FAST corner detection, 2D-IDCT, and MPEG applications to evaluate our compilation process and to analyze the limitations of the hardware.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2017. 35 p.
Series
Halmstad University Dissertations, 33
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-34883 (URN)
978-91-87045-69-1 (ISBN)
978-91-87045-68-4 (ISBN)
Public defence
2017-09-27, Wigforssalen, Hus J (Visionen), Kristian IV:s väg 3, Halmstad, 13:15 (English)
Available from: 2017-09-06. Created: 2017-09-05. Last updated: 2017-09-06. Bibliographically approved.

Open Access in DiVA

No full text


Authors
Savas, Süleyman; Gebrewahid, Essayas; Ul-Abdin, Zain; Nordström, Tomas; Yang, Mingkun
Organisation
Centre for Research on Embedded Systems (CERES); School of Information Science, Computer and Electrical Engineering (IDE)
