Publications (10 of 40)
Savas, S., Ul-Abdin, Z. & Nordström, T. (2018). Designing Domain Specific Heterogeneous Manycore Architectures Based on Building Blocks.
2018 (English) Manuscript (preprint) (Other academic)
Abstract [en]

Performance and power requirements have pushed computer architectures from single core to manycores. These requirements now continue pushing the manycores with identical cores (homogeneous) to manycores with specialized cores (heterogeneous). However, designing heterogeneous manycores is a challenging task due to the complexity of the architectures. We propose an approach for designing domain specific heterogeneous manycore architectures based on building blocks. These blocks are defined as the common computations of the applications within a domain. The objective is to generate heterogeneous architectures by integrating many of these blocks into many simple cores and connecting the cores with a network-on-chip. The proposed approach aims to ease the design of heterogeneous manycore architectures and facilitate use of the dark silicon concept. As a case study, we develop an accelerator based on several building blocks, integrate it into a RISC core and synthesize it on a Xilinx Ultrascale FPGA. The results show that executing a hot-spot of an application on an accelerator based on building blocks increases the performance by 15x, with room for further improvement. The area usage increases as well; however, there are potential optimizations to reduce the area usage. © 2018 by the authors

Keywords
heterogeneous architecture design, risc-v, dataflow, QR decomposition, domain-specific processor, accelerator, Autofocus, hardware software co-design
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-33818 (URN)
Projects
HiPEC (High Performance Embedded Computing); NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Funder
Swedish Foundation for Strategic Research; VINNOVA
Available from: 2017-05-09 Created: 2017-05-09 Last updated: 2018-12-05 Bibliographically approved
Savas, S., Ul-Abdin, Z. & Nordström, T. (2018). Designing Domain-Specific Heterogeneous Architectures from Dataflow Programs. Computers, 7(2), Article ID 27.
2018 (English) In: Computers, ISSN 2073-431X, Vol. 7, no 2, article id 27. Article in journal (Refereed) Published
Abstract [en]

The last ten years have seen performance and power requirements pushing computer architectures using only a single core towards so-called manycore systems with hundreds of cores on a single chip. To further increase performance and energy efficiency, we are now seeing the development of heterogeneous architectures with specialized and accelerated cores. However, designing these heterogeneous systems is a challenging task due to their inherent complexity. We proposed an approach for designing domain-specific heterogeneous architectures based on instruction augmentation through the integration of hardware accelerators into simple cores. These hardware accelerators were determined based on their common use among applications within a certain domain. The objective was to generate heterogeneous architectures by integrating many of these accelerated cores and connecting them with a network-on-chip. The proposed approach aimed to ease the design of heterogeneous manycore architectures (and, consequently, exploration of the design space) by automating the design steps. To evaluate our approach, we enhanced our software tool chain with a tool that can generate accelerated cores from dataflow programs. This new tool chain was evaluated with the aid of two use cases: radar signal processing and mobile baseband processing. We could achieve an approximately 4x improvement in performance, while executing complete applications on the augmented cores with a small impact (2.5–13%) on area usage. The generated accelerators are competitive, achieving more than 90% of the performance of hand-written implementations.

Place, publisher, year, edition, pages
Basel: MDPI AG, 2018
Keywords
heterogeneous architecture design, risc-v, dataflow, QR decomposition, domain-specific processor, accelerator, Autofocus, hardware software co-design
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-36669 (URN); 10.3390/computers7020027 (DOI)
Projects
Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability (NGES)
Funder
Swedish Foundation for Strategic Research; VINNOVA
Available from: 2018-04-24 Created: 2018-04-24 Last updated: 2018-04-26 Bibliographically approved
Savas, S., Hertz, E., Nordström, T. & Ul-Abdin, Z. (2017). Efficient Single-Precision Floating-Point Division Using Harmonized Parabolic Synthesis. In: Michael Hübner, Ricardo Reis, Mircea Stan & Nikolaos Voros (Ed.), 2017 IEEE Computer Society Annual Symposium on VLSI: ISVLSI 2017. Paper presented at IEEE Computer Society Annual Symposium on VLSI, July 3-5, 2017, Bochum, Germany. Los Alamitos: IEEE
2017 (English) In: 2017 IEEE Computer Society Annual Symposium on VLSI: ISVLSI 2017 / [ed] Michael Hübner, Ricardo Reis, Mircea Stan & Nikolaos Voros, Los Alamitos: IEEE, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes a novel method for performing division on floating-point numbers represented in IEEE-754 single-precision (binary32) format. The method is based on an inverter, implemented as a combination of Parabolic Synthesis and second-degree interpolation, followed by a multiplier. It is implemented with and without pipeline stages individually and synthesized while targeting a Xilinx Ultrascale FPGA.

The implementations show better resource usage and latency results when compared to other implementations based on different methods. In terms of throughput, the proposed method outperforms most of the other works; however, some Altera FPGAs achieve a higher clock rate due to differences in the DSP slice multiplier design.

Due to the small size, low latency and high throughput, the presented floating-point division unit is suitable for high performance embedded systems and can be integrated into accelerators or be used as a stand-alone accelerator.

Place, publisher, year, edition, pages
Los Alamitos: IEEE, 2017
Series
IEEE Computer Society Annual Symposium on VLSI, ISSN 2159-3477
Keywords
Floating-point, single precision, division, FPGA, Harmonized Parabolic Synthesis
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-33793 (URN); 10.1109/ISVLSI.2017.28 (DOI); 2-s2.0-85027258772 (Scopus ID); 978-1-5090-6762-6 (ISBN); 978-1-5090-6763-3 (ISBN)
Conference
IEEE Computer Society Annual Symposium on VLSI, July 3-5, 2017, Bochum, Germany
Projects
NGES
Funder
VINNOVA
Available from: 2017-05-05 Created: 2017-05-05 Last updated: 2017-12-14 Bibliographically approved
Kunert, K., Jonsson, M., Böhm, A. & Nordström, T. (2017). Providing Efficient Support for Real-Time Guarantees in a Fibre-Optic AWG-Based Network for Embedded Systems. Optical Switching and Networking Journal, 24, 47-56
2017 (English) In: Optical Switching and Networking Journal, ISSN 1573-4277, E-ISSN 1872-9770, Vol. 24, p. 47-56. Article in journal (Refereed) Published
Abstract [en]

High-performance embedded systems running real-time applications demand communication solutions providing high data rates and low error probabilities, properties inherent to optical solutions. However, providing timing guarantees for deadline-bound applications in this context is far from trivial due to the parallelism inherent in multiwavelength networks, and the analysis is often bound to include a large amount of pessimism. Assuming deterministic medium access, an admission control algorithm using a schedulability analysis can ensure deadline guarantees for real-time communication. The traffic dependency analysis presented in this paper specifically targets a multichannel context, taking into consideration the possibility of concurrent transmissions in these types of networks. Combining our analysis with a feasibility analysis in admission control, the amount of guaranteed hard real-time traffic could be shown to increase by a factor of 7 in a network designed for a radar signal processing case. Using this combination of analysis methods makes possible an increased amount of hard real-time traffic over a given multichannel network, leading to more efficient bandwidth utilization by deadline-dependent applications without having to redesign the network or the medium access method.

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2017
Keywords
Real-time communication, Embedded system, Worst-case analysis, Multiwavelength network, Arrayed waveguide grating
National Category
Communication Systems
Identifiers
urn:nbn:se:hh:diva-32469 (URN); 10.1016/j.osn.2016.11.004 (DOI); 000392776500006 (); 2-s2.0-85006944209 (Scopus ID)
Available from: 2016-11-22 Created: 2016-11-22 Last updated: 2018-03-23 Bibliographically approved
Savas, S., Raase, S., Gebrewahid, E., Ul-Abdin, Z. & Nordström, T. (2016). Dataflow Implementation of QR Decomposition on a Manycore. In: MES '16: Proceedings of the Third ACM International Workshop on Many-core Embedded Systems. Paper presented at MES '16, International Workshop on Many-core Embedded Systems, Seoul, Republic of Korea, June 19, 2016 (pp. 26-30). New York, NY: ACM Press
2016 (English) In: MES '16: Proceedings of the Third ACM International Workshop on Many-core Embedded Systems, New York, NY: ACM Press, 2016, p. 26-30. Conference paper, Published paper (Refereed)
Abstract [en]

While parallel computer architectures have become mainstream, application development on them is still challenging. There is a need for new tools, languages and programming models. Additionally, there is a lack of knowledge about the performance of parallel approaches of basic but important operations, such as the QR decomposition of a matrix, on current commercial manycore architectures.

This paper evaluates a high-level dataflow language (CAL), a source-to-source compiler (Cal2Many) and three QR decomposition algorithms (Givens Rotations, Householder and Gram-Schmidt). The algorithms are implemented both in CAL and in hand-optimized C, executed on Adapteva's Epiphany manycore architecture and evaluated with respect to performance, scalability and development effort.

The CAL (generated C) implementations are, at best, only 2% slower than the hand-written versions. They require an average of 25% fewer lines of source code without significantly increasing the binary size. Development effort is reduced and debugging is significantly simplified. The implementations executed on Epiphany cores outperform the GNU scientific library on the host ARM processor of the Parallella board by up to 30x. © 2016 Copyright held by the owner/author(s).

Place, publisher, year, edition, pages
New York, NY: ACM Press, 2016
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-32371 (URN); 10.1145/2934495.2934499 (DOI); 2-s2.0-84991106778 (Scopus ID); 978-1-4503-4262-9 (ISBN)
Conference
MES '16, International Workshop on Many-core Embedded Systems, Seoul, Republic of Korea, June 19, 2016
Projects
ESCHER; HiPEC
Funder
Knowledge Foundation; Swedish Foundation for Strategic Research; ELLIIT - The Linköping-Lund Initiative on IT and Mobile Communications
Available from: 2016-11-04 Created: 2016-11-04 Last updated: 2018-03-23 Bibliographically approved
Sámano-Robles, R., Nordström, T., Santonja, S., Rom, W. & Tovar, E. (2016). The DEWI High-Level Architecture: Guidelines for Structuring Wireless Sensor Networks in Industrial Applications. In: 2016 Eleventh International Conference on Digital Information Management (ICDIM). Paper presented at Eleventh International Conference on Digital Information Management (ICDIM 2016), Porto, Portugal, September 19-21, 2016 (pp. 274-280). New York: IEEE
2016 (English) In: 2016 Eleventh International Conference on Digital Information Management (ICDIM), New York: IEEE, 2016, p. 274-280. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents the high-level architecture (HLA) of the research project DEWI (dependable embedded wireless infrastructure). The objective of this HLA is to serve as a reference for the development of industrial wireless sensor and actuator networks (WSANs) based on the concept of the DEWI Bubble. The DEWI Bubble is defined here as a high-level abstraction of an industrial WSAN with enhanced interoperability (via standardized interfaces), technology reusability, and cross-domain development. This paper details the design criteria used to define the HLA and the organization of the infrastructure internal and external to the DEWI Bubble. The description includes the different perspectives, models or views of the architecture: the entity model, the layered model, and the functional view model (including an overview of interfaces). The HLA constitutes an extension of the ISO/IEC SNRA (sensor network reference architecture) towards the support of industrial applications. To improve interoperability with existing approaches the DEWI HLA also reuses some features from other standardized technologies and architectures. The HLA will allow networks with different industrial sensor technologies to exchange information between them or with external clients via standard interfaces, thus providing a consolidated access to sensor information of different domains. This is an important aspect for smart city applications, Big Data and internet-of-things (IoT). © Copyright 2016 IEEE

Place, publisher, year, edition, pages
New York: IEEE, 2016
Keywords
Wireless sensor networks, Logic gates, Wireless communication, Interoperability, Computer architecture, Actuators, Ad hoc networks
National Category
Communication Systems; Embedded Systems; Computer Engineering; Computer Systems
Identifiers
urn:nbn:se:hh:diva-33217 (URN); 10.1109/ICDIM.2016.7829797 (DOI); 000398535200045 (); 2-s2.0-85014395764 (Scopus ID); 978-1-5090-2641-8 (ISBN); 978-1-5090-2642-5 (ISBN)
Conference
Eleventh International Conference on Digital Information Management (ICDIM 2016), Porto, Portugal, September 19-21, 2016
Projects
DEWI
Note

Funded by FCT/MEC (Fundação para a Ciência e a Tecnologia), ERDF (European Regional Development Fund) under PT2020, CISTER Research Unit (CEC/04234), and by ARTEMIS/0004/2013-JU grant nr. 621353 (DEWI, www.dewi-project.eu)

Available from: 2017-02-06 Created: 2017-02-06 Last updated: 2018-01-13 Bibliographically approved
Xypolitidis, B., Shabani, R., Khanderparkar, S. V., Ul-Abdin, Z., Savas, S. & Nordström, T. (2016). Towards Architectural Design Space Exploration for Heterogeneous Manycores. In: Yiannis Cotronis, Masoud Daneshtalab & George Angelos Papadopoulos (Ed.), Proceedings: 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: PDP 2016. Paper presented at 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2016), Heraklion, Crete, Greece, 17th-19th February, 2016 (pp. 805-810). Piscataway, NJ: IEEE Computer Society
2016 (English) In: Proceedings: 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: PDP 2016 / [ed] Yiannis Cotronis, Masoud Daneshtalab & George Angelos Papadopoulos, Piscataway, NJ: IEEE Computer Society, 2016, p. 805-810. Conference paper, Published paper (Refereed)
Abstract [en]

Today many of the high performance embedded processors already contain multiple processor cores and we see heterogeneous manycore architectures being proposed. Therefore, it is very desirable to have a fast way to explore various heterogeneous architectures through the use of an architectural design space exploration tool, giving the designer the option to explore design alternatives before the physical implementation. In this paper, we have extended Heracles, a design space exploration tool for (homogeneous) manycore architectures, to incorporate different types of processing cores, and thus allow us to model heterogeneity. Our tool, called the Heterogeneous Heracles System (HHS), can include OpenRISC cores in addition to the already supported MIPS core. The new tool retains the possibility available in Heracles to perform register transfer level (RTL) simulations of each explored architecture in Verilog as well as synthesizing it to field-programmable gate arrays (FPGAs). To facilitate the exploration of heterogeneous architectures, we have also extended the graphical user interface (GUI) to support heterogeneity. This GUI provides options to configure the types of core, core settings, memory system and network topology. Some initial results on FPGA utilization are presented from synthesizing both homogeneous and heterogeneous manycore architectures, as well as some benchmark results from both simulated and synthesized architectures.

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Computer Society, 2016
Keywords
Heterogeneous manycores, Heterogeneous Heracles, OpenRISC, Manycore architectures, Design Space Exploration
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-30394 (URN); 10.1109/PDP.2016.79 (DOI); 000381810900121 (); 2-s2.0-84968820855 (Scopus ID); 978-1-4673-8775-0 (ISBN)
Conference
24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2016), Heraklion, Crete, Greece, 17th-19th February, 2016
Projects
ESCHER; HiPEC
Funder
Knowledge Foundation; Swedish Foundation for Strategic Research
Available from: 2016-02-23 Created: 2016-02-23 Last updated: 2017-11-30 Bibliographically approved
Raase, S. & Nordström, T. (2015). On the Use of a Many-core Processor for Computational Fluid Dynamics Simulations. Paper presented at International Conference On Computational Science, ICCS 2015 – Computational Science at the Gates of Nature, Reykjavík, Iceland, 1-3 June, 2015. Procedia Computer Science, 51, 1403-1412
2015 (English) In: Procedia Computer Science, ISSN 1877-0509, E-ISSN 1877-0509, Vol. 51, p. 1403-1412. Article in journal (Refereed) Published
Abstract [en]

The increased availability of modern embedded many-core architectures supporting floating-point operations in hardware makes them interesting targets in traditional high performance computing areas as well. In this paper, the Lattice Boltzmann Method (LBM) from the domain of Computational Fluid Dynamics (CFD) is evaluated on Adapteva’s Epiphany many-core architecture. Although the LBM implementation shows very good scalability and high floating-point efficiency in the lattice computations, current Epiphany hardware does not provide adequate amounts of either local memory or external memory bandwidth to provide a good foundation for simulation of the large problems commonly encountered in real CFD applications.

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2015
Keywords
Many-core, Epiphany, Computational Fluid Dynamics, Lattice Boltzmann
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-29136 (URN); 10.1016/j.procs.2015.05.348 (DOI); 000373939100143 (); 2-s2.0-84939142878 (Scopus ID)
Conference
International Conference On Computational Science, ICCS 2015 – Computational Science at the Gates of Nature, Reykjavík, Iceland, 1-3 June, 2015
Projects
ESCHER
Funder
Knowledge Foundation
Note

This work was supported by the ESCHER project funded by the Swedish Knowledge foundation, and Volvo Penta AB, Gothenburg, Sweden

Available from: 2015-08-10 Created: 2015-08-10 Last updated: 2018-03-22 Bibliographically approved
Svensson, B., Ul-Abdin, Z., Ericsson, P. M., Åhlander, A., Hoang Bengtsson, H., Bengtsson, J., . . . Nordström, T. (2014). A Running Leap for Embedded Signal Processing to Future Parallel Platforms. In: WISE'14: Proceedings of the 2014 ACM International Workshop on Long-Term Industrial Collaboration on Software Engineering. Paper presented at ASE '14 – ACM/IEEE International Conference on Automated Software Engineering, Västerås, Sweden, September 15-19, 2014 (pp. 35-42). New York, NY: Association for Computing Machinery (ACM)
2014 (English) In: WISE'14: Proceedings of the 2014 ACM International Workshop on Long-Term Industrial Collaboration on Software Engineering, New York, NY: Association for Computing Machinery (ACM), 2014, p. 35-42. Conference paper, Published paper (Refereed)
Abstract [en]

This paper highlights the collaboration between industry and academia in research. It describes more than two decades of intensive development and research of new hardware and software platforms to support innovative, high-performance sensor systems with extremely high demands on embedded signal processing capability. The joint research can be seen as the run before a necessary jump to a new kind of computational platform based on parallelism. The collaboration has had several phases, starting with a focus on hardware, then on efficiency, later on software development, and finally on taking the jump and understanding the expected future. In the first part of the paper, these phases and their respective challenges and results are described. Then, in the second part, we reflect upon the motivation for collaboration between company and university, the roles of the partners, the experiences gained and the long-term effects on both sides. Copyright © 2014 ACM.

Place, publisher, year, edition, pages
New York, NY: Association for Computing Machinery (ACM), 2014
Keywords
Industry-academia collaboration, Embedded signal processing, Parallel computing platforms, Software development
National Category
Software Engineering
Identifiers
urn:nbn:se:hh:diva-27296 (URN); 10.1145/2647648.2647653 (DOI); 2-s2.0-84908651240 (Scopus ID); 978-1-4503-3045-9 (ISBN)
Conference
ASE '14 – ACM/IEEE International Conference on Automated Software Engineering, Västerås, Sweden, September 15-19, 2014
Funder
VINNOVA; Knowledge Foundation; Swedish Foundation for Strategic Research
Available from: 2014-12-16 Created: 2014-12-16 Last updated: 2018-03-22 Bibliographically approved
Savas, S., Gebrewahid, E., Ul-Abdin, Z., Nordström, T. & Yang, M. (2014). An Evaluation of Code Generation of Dataflow Languages on Manycore Architectures. In: RTCSA 2014: 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications. Paper presented at RTCSA 2014, 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, Chongqing, China, August 20-22, 2014. Piscataway, NJ: IEEE Press, Article ID 6910501.
2014 (English) In: RTCSA 2014: 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, Piscataway, NJ: IEEE Press, 2014, article id 6910501. Conference paper, Published paper (Refereed)
Abstract [en]

Today computer architectures are shifting from single core to manycores due to several reasons such as performance demands, power and heat limitations. However, shifting to manycores results in additional complexities, especially with regard to efficient development of applications. Hence there is a need to raise the abstraction level of development techniques for the manycores while exposing the inherent parallelism in the applications. One promising class of programming languages is dataflow languages, and in this paper we evaluate and optimize the code generation for one such language, CAL. We have also developed a communication library to support the inter-core communication. The code generation can target multiple architectures, but the results presented in this paper are focused on Adapteva's manycore architecture Epiphany. We use the two-dimensional inverse discrete cosine transform (2D-IDCT) as our benchmark and compare our code generation from CAL with a hand-written implementation developed in C. Several optimizations in the code generation as well as in the communication library are described, and we have observed that the most critical optimization is reducing the number of external memory accesses. Combining all optimizations we have been able to reduce the difference in execution time between auto-generated and hand-written implementations from a factor of 4.3x down to a factor of only 1.3x. ©2014 IEEE.

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2014
Keywords
Manycore, Dataflow Languages, code generation, Actor Machine, 2D-IDCT, Epiphany, evaluation
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-25649 (URN); 10.1109/RTCSA.2014.6910501 (DOI); 000352610400005 (); 2-s2.0-84908637354 (Scopus ID)
Conference
RTCSA 2014, 20th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, Chongqing, China, August 20-22, 2014
Projects
HiPEC project
Funder
Knowledge Foundation; Swedish Foundation for Strategic Research
Note

The authors would like to thank Adapteva Inc. for giving access to their software development suite and hardware board. This research is part of the CERES research program funded by the Knowledge Foundation and HiPEC project funded by Swedish Foundation for Strategic Research (SSF).

Available from: 2014-06-16 Created: 2014-06-16 Last updated: 2018-03-22 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-0562-2082
