hh.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Manycore performance analysis using timed configuration graphs
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).ORCID iD: 0000-0001-6625-6533
2009 (English)In: International Symposium on Systems, Architectures, Modeling, and Simulation, 2009. SAMOS '09 / [ed] Michael Joseph Schulte and Walid Najjar, Piscataway, N.J.: IEEE Press, 2009, 108-117 p.Conference paper, (Refereed)
Abstract [en]

The programming complexity of increasingly parallel processors calls for new tools to assist programmers in utilising the parallel hardware resources. In this paper we present a set of models that we have developed to form part of a tool which is intended for iteratively tuning the mapping of dataflow graphs onto manycores. One of the models is used for capturing the essentials of manycores that are identified as suitable for signal processing and which we use as target architectures. Another model is the intermediate representation in the form of a timed configuration graph, describing the mapping of a dataflow graph onto a machine model. Moreover, this IR can be used for performance evaluation using abstract interpretation. We demonstrate how the models can be configured and applied in order to map applications on the Raw processor. Furthermore, we report promising results on the accuracy of performance predictions produced by our tool. It is also demonstrated that the tool can be used to rank different mappings with respect to optimisation on throughput and end-to-end latency.

Place, publisher, year, edition, pages
Piscataway, N.J.: IEEE Press, 2009. 108-117 p.
Keyword [en]
graphs, microcomputers, parallel architectures, parallel programming, program compilers, software performance evaluation, task analysis
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:hh:diva-5987DOI: 10.1109/ICSAMOS.2009.5289221ISI: 000276377000014Scopus ID: 2-s2.0-71949094275ISBN: 978-1-4244-4502-8 OAI: oai:DiVA.org:hh-5987DiVA: diva2:353074
Conference
2009 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2009, Samos, 20 - 23 July, 2009
Note

©2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Available from: 2010-09-23 Created: 2010-09-23 Last updated: 2014-08-21Bibliographically approved
In thesis
1. Models and Methods for Development of DSP Applications on Manycore Processors
Open this publication in new window or tab >>Models and Methods for Development of DSP Applications on Manycore Processors
2009 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Advanced digital signal processing systems require specialized high-performance embedded computer architectures. The term high-performance translates to large amounts of data and computations per time unit. The term embedded further implies requirements on physical size and power efficiency. Thus the requirements are of both functional and non-functional nature. This thesis addresses the development of high-performance digital signal processing systems relying on manycore technology. We propose building two-level hierarchical computer architectures for this domain of applications. Further, we outline a tool flow based on methods and analysis techniques for automated, multi-objective mapping of such applications on distributed memory manycore processors. In particular, the focus is put on how to provide a means for tunable strategies for mapping of task graphs on array structured distributed memory manycores, with respect to given application constraints. We argue for code mapping strategies based on predicted execution performance, which can be used in an auto-tuning feedback loop or to guide manual tuning directed by the programmer. Automated parallelization, optimisation and mapping to a manycore processor benefits from the use of a concurrent programming model as the starting point. Such a model allows the programmer to express different types and granularities of parallelism as well as computation characteristics of importance in the addressed class of applications. The programming model should also abstract away machine dependent hardware details. The analytical study of WCDMA baseband processing in radio base stations, presented in this thesis, suggests dataflow models as a good match to the characteristics of the application and as execution model abstracting computations on a manycore. Construction of portable tools further requires a manycore machine model and an intermediate representation. The models are needed in order to decouple algorithms, used to transform and map application software, from hardware. We propose a manycore machine model that captures common hardware resources, as well as resource dependent performance metrics for parallel computation and communication. Further, we have developed a multifunctional intermediate representation, which can be used as source for code generation and for dynamic execution analysis. Finally, we demonstrate how we can dynamically analyse execution using abstract interpretation on the intermediate representation. It is shown that the performance predictions can be used to accurately rank different mappings by best throughput or shortest end-to-end computation latency.

Place, publisher, year, edition, pages
Göteborg: Chalmers University of Technology, 2009. 173 p.
Series
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie, ISSN 0346-718X ; 2969
Keyword
parallel processing, manycore processors, high-performance digital signal processing, dataflow, concurrent models of computation, parallel code mapping, parallel machine model, dynamic performance analysis
National Category
Computer Engineering
Identifiers
urn:nbn:se:hh:diva-14706 (URN)978-91-7385-288-3 (ISBN)
Public defence
2009-06-10, Wigforssalen, house Visionen, Halmstad University, Kristian IV:s väg 3, Halmstad, 13:15 (English)
Opponent
Supervisors
Available from: 2011-04-20 Created: 2011-04-04 Last updated: 2011-04-20Bibliographically approved

Open Access in DiVA

fulltext(726 kB)280 downloads
File information
File name FULLTEXT01.pdfFile size 726 kBChecksum SHA-512
ab0ffbb60bbcf61529f7ce8c0a0813010de7ee7ad03c04e09d66646d3bcd89e54b7d0ae7aceefdd4b44f15270e048ee294626a014a6cff2da6b8f5f29ac36733
Type fulltextMimetype application/pdf

Other links

Publisher's full textScopus

Search in DiVA

By author/editor
Bengtsson, JerkerSvensson, Bertil
By organisation
Centre for Research on Embedded Systems (CERES)
Computer Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 280 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 260 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf