hh.sePublications
Change search
Refine search result
1 - 35 of 35
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the 'Create feeds' function.
  • 1.
    Alam, Ashraful
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Parallelization of the Estimation Algorithm of the 3D Structure Tensor2012In: 2012 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2012 / [ed] Peter Athanas, René Cumplido & Eduardo de la Torre, Piscataway, N.J.: IEEE Press, 2012, 6416771Conference paper (Refereed)
    Abstract [en]

    The three dimensional structure tensor algorithm (3D-STA) is often used in image processing applications to compute the optical flow or to detect local 3D structures and their directions. This algorithm is computationally expensive due to many computations that are required to calculate the gradient, the tensor, and to smooth every pixel of the image frames. Therefore, it is important to parallelize the implementation to achieve high performance. In this paper we present two parallel implementations of 3D-STA; namely moderately parallelized and highly parallelized implementation, on a massively parallel reconfigurable array. Finally, we evaluate the performance of the generated code and results are compared with another optical flow implementation. The throughput achieved by the moderately parallelized implementation is approximately half of the throughput of the Optical flow implementation, whereas the highly parallelized implementation results in a 2x gain in throughput as compared to the optical flow implementation. © 2012 IEEE.

  • 2.
    Essayas, Gebrewahid
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Yang, Mingkun
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE).
    Cedersjö, Gustav
    Department of Computer Science, Lund University, Lund, Sweden.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Gaspes, Veronica
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Janneck, Jörn W.
    Department of Computer Science, Lund University, Lund, Sweden.
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Realizing Efficient Execution of Dataflow Actors on Manycores2014In: Proceedings: 2014 International Conference on Embedded and Ubiquitous Computing: EUC 2014: August 2014, Milano, Italy / [ed] Randall Bilof, Los Alamitos, CA: IEEE Computer Society, 2014, 321-328 p., 6962305Conference paper (Refereed)
    Abstract [en]

    Embedded DSP computing is currently shifting towards manycore architectures in order to cope with the ever growing computational demands. Actor based dataflow languages are being considered as a programming model. In this paper we present a code generator for CAL, one such dataflow language. We propose to use a compilation tool with two intermediate representations. We start from a machine model of the actors that provides an ordering for testing of conditions and firing of actions. We then generate an Action Execution Intermediate Representation that is closer to a sequential imperative language like C and Java. We describe our two intermediate representations and show the feasibility and portability of our approach by compiling a CAL implementation of the Two-Dimensional Inverse Discrete Cosine Transform on a general purpose processor, on the Epiphany manycore architecture and on the Ambric massively parallel processor array. © 2014 IEEE.

  • 3.
    Gebrewahid, Essayas
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ali Arslan, Mehmet
    Lund University, Lund, Sweden.
    Karlsson, Andreas
    Linköping University, Linköping, Sweden.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Support for Data Parallelism in the CAL Actor Language2016In: WPMVP '16: Proceedings of the 3rd Workshop on Programming Models for SIMD/Vector Processing / [ed] Jan Eitzinger, Joel Falcou, Illie Gabriel Tanase & James Brodman, New York, NY: ACM Press, 2016Conference paper (Refereed)
    Abstract [en]

    With the arrival of heterogeneous manycores comprising various features to support task, data and instruction-level parallelism, developing applications that take full advantage of the hardware parallel features has become a major challenge. In this paper, we present an extension to our CAL compilation framework (CAL2Many) that supports data parallelism in the CAL Actor Language. Our compilation framework makes it possible to program architectures with SIMD support using high-level language and provides efficient code generation. We support general SIMD instructions but the code generation backend is currently implemented for two custom architectures, namely ePUMA and EIT. Our experiments were carried out for two custom SIMD processor architectures using two applications.

    The experiment shows the possibility of achieving performance comparable to hand-written machine code with much less programming effort.

  • 4.
    Gebrewahid, Essayas
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Actor Fission Transformations for Executing Dataflow Programs on Manycores2017Conference paper (Other academic)
    Abstract [en]

    Manycore architectures are dominating the development of advanced embedded computing due to the computational and power demand of high performance applications. This has introduced an additional complexity with regard to the efficient exploitation of the underlying hardware and the development of efficient parallel implementations. To tackle this we model applications using a dataflow programming language, perform high-level transformations of dataflow actors, and generate native code by using our compilation framework.This paper presents the actor fission transformations of our Cal2Many compilation framework. The transformations have facilitated the mapping of big dataflow actors on memory restricted embedded manycores, increased the utilization of the hardware, and enabled support for task and data-level parallelism. We have applied the actor transformations to two blocks of MPEG-4 decoder and executed it on the Epiphany manycore architecture. The result shows the practicality and feasibility of our approach.

  • 5.
    Gebrewahid, Essayas
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Gaspes, Veronica
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Cal2Many: A Framework to Compile Dataflow Programs for Manycores2017In: Article in journal (Other academic)
    Abstract [en]

    The arrival of manycore platforms has imposed programming challenges for mainstream embedded system developers. In this paper, we discuss the significance of actor-oriented dataflow languages and present our compilation framework for CAL Actor Language that leads to increased portability and retargetability. We demonstrate the applicability of our approach with streaming applications targeting the Epiphany many-core architecture. We have performed an in-depth analysis of MPEG-4 SP implemented on Epiphany using our framework and studied the effects of actor composition. We have identified hardware aspects such as increased off-chip memory bandwidth and larger local memories that could result in further performance improvements.

  • 6.
    Gebrewahid, Essayas
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Mapping Occam-pi programs to a Manycore Architecture2011Conference paper (Refereed)
    Abstract [en]

    Efficient utilization of available resources is a key concept in embedded systems. This paper is focused on providing the support for managing dynamic reconfiguration of computing resources in the programming model. We present an approach to map occam-pi programs to a manycore architecture, Platform 2012 (P2012). We describe the techniques used to translate the salient features of the occam-pi language to the native programing model of the P2012 architecture. We present the initial results from a case study of matrix multiplication. Our results show the simplicity of occam-pi program by 6 times reduction in lines-of-code.

  • 7.
    Gebrewahid, Essayas
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Gaspes, Veronica
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS).
    Jego, Bruno
    ST-Microlectronics, Grenoble, France.
    Lavigueur, Bruno
    ST-Microlectronics, Grenoble, France.
    Robart, Mathieu
    ST-Microlectronics, Bristol, United Kingdom.
    Programming Real-time Image Processing for Manycores in a High-level Language2013In: Advanced Parallel Processing Technology / [ed] Wu, Chenggang and Cohen, Albert, Berlin Heidelberg: Springer Berlin/Heidelberg, 2013, 381-395 p.Conference paper (Refereed)
    Abstract [en]

    Manycore architectures are gaining attention as a means to meet the performance and power demands of high-performance embedded systems. However, their widespread adoption is sometimes constrained by the need formastering proprietary programming languages that are low-level and hinder portability. We propose the use of the concurrent programming language occam-pi as a high-level language for programming an emerging class of manycore architectures. We show how to map occam-pi programs to the manycore architecture Platform 2012 (P2012). We describe the techniques used to translate the salient features of the language to the native programming model of the P2012. We present the results from a case study on a representative algorithm in the domain of real-time image processing: a complex algorithm for corner detectioncalled Features from Accelerated Segment Test (FAST). Our results show that the occam-pi program is much shorter, is easier to adapt and has a competitive performance when compared to versions programmed in the native programming model of P2012 and in OpenCL.

  • 8.
    Islam Cheema, Fahad
    et al.
    LUMS University, DHA Phase-2, Lahore, Pakistan.
    Ul-Abdin, Zain
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    A Design Methodology for Resource to Performance Tradeoff Adjustment in FPGAs2010In: FPGAworld '10 Proceedings of the 7th FPGAworld Conference, New York: ACM Press, 2010, 14-19 p.Conference paper (Refereed)
    Abstract [en]

    When implementing computation-intensive algorithms on finegrained parallel architectures, adjustment of resource to performance tradeoff is a big challenge. This paper proposes a methodology for dealing with some of these performance tradeoffs by adjusting parallelism at different levels. In a case study, interpolation kernels are implemented on a fine-grained architecture (FPGA) using a high level language (Mitrion-C). For both cubic and bi-cubic interpolation, one single-kernel, one cross-kernel and two multi-kernel parallel implementations are designed and evaluated. Our results demonstrate that no single level of parallelism can be used for trade-off adjustment. Instead, the appropriate degree of parallelism on each level, according to available resources and the performance requirements of the application, needs to be found. Basing the design on high-level programming simplifies the trade-off process. This research is a step towards automation of the choice of parallelization based on a combination of parallelism levels.

  • 9.
    Olofsson, Andreas
    et al.
    Adapteva Inc., Lexington, MA, USA.
    Nordström, Tomas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany2014In: Conference record: Forty-Eighth Asilomar Conference on Signals, Systems & Computers, November 2-5, 2014, Pacific Grove, California / [ed] Michael B. Matthews, Piscataway, NJ: IEEE Press, 2014, 1719-1726 p.Conference paper (Refereed)
    Abstract [en]

    In this paper we introduce Epiphany as a high-performance energy-efficient manycore architecture suitable for real-time embedded systems. This scalable architecture supports floating point operations in hardware and achieves 50 GFLOPS/W in 28 nm technology, making it suitable for high performance streaming applications like radio base stations and radar signal processing. Through an efficient 2D mesh Network-on-Chip and a distributed shared memory model, the architecture is scalable to thousands of cores on a single chip. An Epiphany-based open source computer named Parallella was launched in 2012 through Kickstarter crowd funding and has now shipped to thousands of customers around the world. ©2014 IEEE.

  • 10.
    Savas, Süleyman
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Gebrewahid, Essayas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Nordström, Tomas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Yang, Mingkun
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE).
    An Evaluation of Code Generation of Dataflow Languages on Manycore Architectures2014In: RTCSA 2014: 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, Piscataway, NJ: IEEE Press, 2014, 6910501Conference paper (Refereed)
    Abstract [en]

    Today computer architectures are shifting from single core to manycores due to several reasons such as performance demands, power and heat limitations. However, shifting to manycores results in additional complexities, especially with regard to efficient development of applications. Hence there is a need to raise the abstraction level of development techniques for the manycores while exposing the inherent parallelism in the applications. One promising class of programming languages is dataflow languages and in this paper we evaluate and optimize the code generation for one such language, CAL. We have also developed a communication library to support the inter-core communication.The code generation can target multiple architectures, but the results presented in this paper is focused on Adapteva's many core architecture Epiphany.We use the two-dimensional inverse discrete cosine transform (2D-IDCT) as our benchmark and compare our code generation from CAL with a hand-written implementation developed in C. Several optimizations in the code generation as well as in the communication library are described, and we have observed that the most critical optimization is reducing the number of external memory accesses. Combining all optimizations we have been able to reduce the difference in execution time between auto-generated and hand-written implementations from a factor of 4.3x down to a factor of only 1.3x. ©2014 IEEE.

  • 11.
    Savas, Süleyman
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Hertz, Erik
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Nordström, Tomas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Efficient Single-Precision Floating-Point Division Using Harmonized Parabolic Synthesis2017In: 2017 IEEE Computer Society Annual Symposium on VLSI: ISVLSI 2017 / [ed] Michael Hübner, Ricardo Reis, Mircea Stan & Nikolaos Voros, Los Alamitos: IEEE, 2017Conference paper (Refereed)
    Abstract [en]

    This paper proposes a novel method for performing division on floating-point numbers represented in IEEE-754 single-precision (binary32) format. The method is based on an inverter, implemented as a combination of Parabolic Synthesis and second-degree interpolation, followed by a multiplier. It is implemented with and without pipeline stages individually and synthesized while targeting a Xilinx Ultrascale FPGA.

    The implementations show better resource usage and latency results when compared to other implementations based on different methods. In case of throughput, the proposed method outperforms most of the other works, however, some Altera FPGAs achieve higher clock rate due to the differences in the DSP slice multiplier design.

    Due to the small size, low latency and high throughput, the presented floating-point division unit is suitable for high performance embedded systems and can be integrated into accelerators or be used as a stand-alone accelerator.

  • 12.
    Savas, Süleyman
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Raase, Sebastian
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Gebrewahid, Essayas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Nordström, Tomas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Dataflow Implementation of QR Decomposition on a Manycore2016In: MES '16: Proceedings of the Third ACM International Workshop on Many-core Embedded Systems, New York, NY: ACM Press, 2016, 26-30 p.Conference paper (Refereed)
    Abstract [en]

    While parallel computer architectures have become mainstream, application development on them is still challenging. There is a need for new tools, languages and programming models. Additionally, there is a lack of knowledge about the performance of parallel approaches of basic but important operations, such as the QR decomposition of a matrix, on current commercial manycore architectures.

    This paper evaluates a high level dataflow language (CAL), a source-to-source compiler (Cal2Many) and three QR decomposition algorithms (Givens Rotations, Householder and Gram-Schmidt). The algorithms are implemented both in CAL and hand-optimized C languages, executed on Adapteva's Epiphany manycore architecture and evaluated with respect to performance, scalability and development effort.

    The performance of the CAL (generated C) implementations gets as good as 2\% slower than the hand-written versions. They require an average of 25\% fewer lines of source code without significantly increasing the binary size. Development effort is reduced and debugging is significantly simplified. The implementations executed on Epiphany cores outperform the GNU scientific library on the host ARM processor of the Parallella board by up to 30x. © 2016 Copyright held by the owner/author(s).

  • 13.
    Savas, Süleyman
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Nordström, Tomas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Designing Domain Specific Heterogeneous Manycore Architectures Based on Building Blocks2017Conference paper (Refereed)
    Abstract [en]

    Performance and power requirements has pushed computer architectures from single core to manycores. These requirements now continue pushing the manycores with identical cores (homogeneous) to manycores with specialized cores (heterogeneous). However designing heterogeneous manycores is a challenging task due to the complexity of the architectures. We propose an approach for designing domain specific heterogeneous manycore architectures based on building blocks. These blocks are defined as the common computations of the applications within a domain. The objective is to generate heterogeneous architectures by integrating many of these blocks to many simple cores and connect the cores with a network-on-chip. The proposed approach aims to ease the design of heterogeneous manycore architectures and facilitate usage of dark silicon concept. As a case study, we develop an accelerator based on several building blocks, integrate it to a RISC core and synthesize on a Xilinx Ultrascale FPGA. The results show that executing a hot-spot of an application on an accelerator based on building blocks increases the performance by 15x, with room for further improvement. The area usage increases as well, however there are potential optimizations to reduce the area usage.

  • 14.
    Svensson, Bertil
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Using a CSP based programming model for reconfigurable processor arrays2008In: International Conference on Reconfigurable Computing and FPGAs, 2008. ReConFig '08, Los Alamitos, California: IEEE Computer Society, 2008, 343-348 p.Conference paper (Refereed)
    Abstract [en]

    The growing trend towards adoption of flexible and heterogeneous, parallel computing architectures has increased the challenges faced by the programming community. We propose a method to program an emerging class of reconfigurable processor arrays by using the CSP based programming model of occam-pi. The paper describes the extension of an existing compiler platform to target such architectures. To evaluate the performance of the generated code, we present three implementations of the DCT algorithm. It is concluded that CSP appears to be a suitable computation model for programming a wide variety of reconfigurable architectures.

  • 15.
    Svensson, Bertil
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ericsson, Per M.
    Saab AB (EDS), Gothenburg, Sweden.
    Åhlander, Anders
    Saab AB (EDS), Gothenburg, Sweden.
    Hoang Bengtsson, Hoai
    Viktoria Swedish ICT, Gothenburg, Sweden.
    Bengtsson, Jerker
    Saab AB (EDS), Gothenburg, Sweden.
    Gaspes, Veronica
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Nordström, Tomas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    A Running Leap for Embedded Signal Processing to Future Parallel Platforms2014In: WISE'14: Proceedings of the 2014 ACM International Workshop on Long-Term Industrial Collaboration on Software Engineering, New York, NY: Association for Computing Machinery (ACM), 2014, 35-42 p.Conference paper (Refereed)
    Abstract [en]

    This paper highlights the collaboration between industry and academia in research. It describes more than two decades of intensive development and research of new hardware and software platforms to support innovative, high-performance sensor systems with extremely high demands on embedded signal processing capability. The joint research can be seen as the run before a necessary jump to a new kind of computational platform based on parallelism. The collaboration has had several phases, starting with a focus on hardware, then on efficiency, later on software development, and finally on taking the jump and understanding the expected future. In the first part of the paper, these phases and their respective challenges and results are described. Then, in the second part, we reflect upon the motivation for collaboration between company and university, the roles of the partners, the experiences gained and the long-term effects on both sides. Copyright © 2014 ACM.

  • 16.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    High-level programming of coarse-grained reconfigurable architectures2009In: 19th International Conference on Field Programmable Logic and Applications: (FPL), proceedings, Prague, Czech Republic, August 31-September 2, 2009 / [ed] Martin Daněk, Jiři Kadlec and Brent Nelson, Piscataway, N.J.: IEEE , 2009, 713-714 p.Conference paper (Refereed)
    Abstract [en]

    We propose that, in order to meet high computational demands, the application development has to be based on suitable models of computations that will lead to scalable and reusable implementations. The models should enhance the understanding of the application and at the same time enable the developer to organize the computations so that they can be efficiently mapped to the target reconfigurable architecture. The goal of the thesis is to propose methods to program future coarse-grained reconfigurable architecttures in a productive manner in such a way as to achieve energy efficient mapping with improved performance.

  • 17.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Programming of Coarse-Grained Reconfigurable Architectures2011Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Coarse-grained reconfigurable architectures, which offer massive parallelism coupled with the capability of undergoing run-time reconfiguration, are gaining attention in order to meet not only the increased computational demands of high-performance embedded systems, but also to fulfill the need of adaptability to functional requirements of the application. This thesis focuses on the programming aspects of such coarse-grained reconfigurable computing devices, including the relevant computation models that are capable of exposing different kinds of parallelism inherent in the application and the ability of these models to capture the adaptability requirements of the application. The thesis suggests the occam-pi language for programming of a broad class of coarse-grained reconfigurable architectures as an intermediate language; we call it intermediate, since we believe that the applicationprogramming is best done in a high-level domain-specific language. The salient properties of the occam-pi language are explicit concurrency with built-in mechanisms for interprocessorcommunication, provision for expressing dynamic parallelism, support for the expression of dynamic reconfigurations, and placement attributes. To evaluate the programming approach, a compiler framework was extended to support the language extensions in the occam-pi language, and backends were developed to target two different coarse-grained reconfigurable architectures. XPP and Ambric. The results on XPP reveal that the occam-pi based implementations produce comparable throughput to those of NML programs, while programming at a much higher level of abstraction than that of NML. Similarly the two occam-pi implementations of autofocus criterion calculation targeted to the Ambric platform outperform the CPU implementation by factors of 11-23. Thus, the results of the implemented case-studies suggest that the occam-pi language based approach simplifies the development of applications employing run-time reconfigurable devices without compromising the performance benefits.

  • 18.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Gebrewahid, Essayas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Managing Dynamic Reconfiguration for Fault-tolerance on a Manycore Architecture2012In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012, New York, USA: IEEE Computer Society, 2012, 312-319 p., 6270657Conference paper (Refereed)
    Abstract [en]

    With the advent of manycore architectures comprising hundreds of processing elements, fault management has become a major challenge. We present an approach that uses the occam-pi language to manage the fault recovery mechanism on a new manycore architecture, the Platform 2012 (P2012). The approach is made possible by extending our previously developed compiler framework to compile occam-pi implementations to the P2012 architecture. We describe the techniques used to translate the salient features of the occam-pi language to the native programming model of the P2012 architecture. We demonstrate the applicability of the approach by an experimental case study, in which the DCT algorithm is implemented on a set of four processing elements. During run-time, some of the tasks are then relocated from assumed faulty processing elements to the faultless ones by means of dynamic reconfiguration of the hardware. The working of the demonstrator and the simulation results illustrate not only the feasibility of the approach but also how the use of higher-level abstractions simplifies the fault handling. © 2012 IEEE.

  • 19.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Mingkun, Yang
    Uppsala University, Uppsala, Sweden.
    A Radar Signal Processing Case Study for Dataflow Programming of Manycores2017In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 87, no 1, 49-62 p.Article in journal (Refereed)
    Abstract [en]

    The successful realization of next generation radar systems have high performance demands on the signal processing chain. Among these are advanced Active Electronically Scanned Array (AESA) radars in which complex calculations are to be performed on huge sets of data in real-time. Manycore architectures are designed to provide flexibility and high performance essential for such streaming applications. This paper deals with the implementation of compute-intensive parts of AESA radar signal processing chain in a high-level dataflow language; CAL. We evaluate the approach by targeting a commercial manycore architecture, Epiphany, and present our findings in terms of performance and productivity gains achieved in this case study. The comparison of the performance results with the reference sequential implementations executing on a state-of-the-art embedded processor show that we are able to achieve a speedup of 1.6x to 4.4x by using only 10 cores of Epiphany. © Springer Science+Business Media New York 2015

  • 20.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    A Retargetable Compilation Framework for Heterogeneous Reconfigurable Computing2016In: ACM Transactions on Reconfigurable Technology and Systems, ISSN 1936-7406, E-ISSN 1936-7414, Vol. 9, no 4, 24Article in journal (Refereed)
    Abstract [en]

    The future trend in microprocessors for the more advanced embedded systems is focusing on massively parallel reconfigurable architectures, consisting of heterogeneous ensembles of hundreds of processing elements communicating over a reconfigurable interconnection network. However, the mastering of low-level micro-architectural details involved in programming of such massively parallel platforms becomes too cumbersome, which limits their adoption in many applications. Thus there is a dire need of an approach to produce high-performance scalable implementations that harness the computational resources of the emerging reconfigurable platforms.This paper addresses the grand challenge of accessibility of these diverse reconfigurable platforms by suggesting the use of a high-level language, occam-pi, and developing a complete design flow for building, compiling, and generating machine code for heterogeneous coarse-grained hardware. We have evaluated the approach by implementing complex industrial case studies and three common signal processing algorithms. The results of the implemented case-studies suggest that the occam-pi language based approach, because of its well-defined semantics for expressing concurrency and reconfigurability, simplifies the development of applications employing run-time reconfigurable devices. The associated compiler framework ensures portability as well as the performance benefits across heterogeneous platforms.

  • 21.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    A Study of Design Efficiency with a High-Level Language for FPGAs2007In: Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007, Abstracts and CD-ROM, Piscataway, N.J.: IEEE Press, 2007, 1-7 p.Conference paper (Refereed)
    Abstract [en]

    Over the years reconfigurable computing devices such as FPGAs have evolved from gate-level glue logic to complex reprogrammable processing architectures. However, the tools used for mapping computations to such architectures still require the knowledge about architectural details of the target device to extract efficiency. A study of the Mobius language and tools is presented in this paper, with a focus on generated hardware performance. A number of streaming and memory-intensive applications have been developed and the results have been compared with the corresponding implementations in VHDL and a behavioral hardware description language. Based upon experimental evidences, it is concluded that Mobius, a minimal parallel processing language targeted for reconfigurable architectures, enhances productivity in terms of design time and code maintainability without considerably compromising performance and resources.

  • 22.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    An Evaluation of High-Performance Embedded Processing on MPPAs2013In: Proceedings: 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013, Los Alamitos, California: IEEE Computer Society, 2013, 235-235 p., 6546032Conference paper (Refereed)
    Abstract [en]

    Embedded signal processing is facing the challenges of increased performance as well as to achieve energy efficiency. Massively parallel processor arrays (MPPAs) consisting of tens or hundreds of processing cores offer the possibility of meeting the growing performance demand in an energy efficient way by exploiting parallelism instead of scaling the clock frequency of a single powerful processor.

    In this paper, we evaluate two variants of MPPAs by implementing a significantly large case study, namely an autofocus criterion calculation, which is a key component in modern synthetic aperture radar systems. The implementation results from the two target architectures are compared on the basis of utilized resources, performance, and energy efficiency. The Ambric implementations demonstrate the usefulness of occam-pi based high-level language approach in utilizing hundreds of processors, whereas the Epiphany implementation reveals that energy-efficiency can be improved even further by a factors of 2-3 with respect to the Ambric implementations and can be achieved at high clock speeds. © 2013 IEEE.

  • 23.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Evolution in architectures and programming methodologies of coarse-grained reconfigurable computing2009In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 33, no 3, 161-178 p.Article in journal (Refereed)
    Abstract [en]

    In order to meet the increased computational demands of, e.g., multimedia applications, such as video processing in HDTV, and communication applications, such as baseband processing in telecommunication systems, the architectures of reconfigurable devices have evolved to coarse-grained compositions of functional units or program controlled processors, which are operated in a coordinated manner to improve performance and energy efficiency. In this survey we explore the field of coarse-grained reconfigurable computing on the basis of the hardware aspects of granularity, reconfigurability, and interconnection networks, and discuss the effects of these on energy related properties and scalability. We also consider the computation models that are being adopted for programming of such machines, models that expose the parallelism inherent in the application in order to achieve better performance. We classify the coarse-grained reconfigurable architectures into four categories and present some of the existing examples of these categories. Finally, we identify the emerging trends of introduction of asynchronous techniques at the architectural level and the use of nano-electronics from technological perspective in the reconfigurable computing discipline.

  • 24.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Occam-pi as a High-level Language for Coarse-Grained Reconfigurable Architectures2011In: IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, Washington, USA: IEEE Computer Society, 2011, 236-243 p.Conference paper (Refereed)
    Abstract [en]

    Recently we proposed occam-pi as a high-levellanguage for programming coarse grained reconfigurable architectures. The constructs of occam-pi combine ideas from CSPand pi-calculus to facilitate expressing parallelism, communication, and reconfigurability. The feasability of this approachwas illustrated by developing a compiler framework to compile occam-pi implementations to the Ambric architecture. In this paper, we demonstrate the applicability of occam-pif or programing an array of functional units, eXtreme ProcessingPlatform (XPP). This is made possible by extending the compilerframework to target the XPP architecture, including automatic floating to fixed-point conversion. Different implementations of a FIR filter and a DCT algorithm were developed and evaluated on the basis of performance and resource consumption. The reported results reveal that the approach of using occam-pito program the category of coarse grained reconfigurable architectures appears to be promising. The resulting implementations are generally much superior to those programmed in C and comparable to those hand-coded in the low-level native language NML.

  • 25.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Occam-pi for Programming of Massively Parallel Reconfigurable Architectures2012In: International Journal of Reconfigurable Computing, ISSN 1687-7195, E-ISSN 1687-7209, Vol. 2012, 504815Article in journal (Refereed)
    Abstract [en]

    Massively parallel reconfigurable architectures, which offer massive parallelism coupled with the capability of undergoing run-time reconfiguration, are gaining attention in order to meet the increased computational demands of high-performance embedded systems. We propose that the occam-pi language is used for programming of the category of massively parallel reconfigurable architectures. The salient properties of the occam-pi language are explicit concurrency with built-in mechanisms for interprocessor communication, provision for expressing dynamic parallelism, support for the expression of dynamic reconfigurations, and placement attributes. To evaluate the programming approach, a compiler framework was extended to support the language extensions in the occam-pi language and a backend was developed to target the Ambric array of processors. We present two case-studies; DCT implementation exploiting the reconfigurability feature of occam-pi and a significantly large autofocus criterion calculation based on the dynamic parallelism capability of the occam-pi language. The results of the implemented case studies suggest that the occam-pi -language-based approach simplifies the development of applications employing run-time reconfigurable devices without compromising the performance benefits. Copyright © 2012 Zain-ul-Abdin and Bertil Svensson.

  • 26.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Synthetic-Aperture Radar Processing on a Manycore Architecture2012Conference paper (Refereed)
    Abstract [en]

    Synthetic-Aperture Radar (SAR) systems that are used to create high-resolution radar images from low-resolution aperture data require high computational performance. Manycore architectures are emerging to overcome the computational requirements of the complex radar signal processing.

    In this paper, we evaluate a manycore architecture namely Epiphany by implementing two significantly large case studies of fast factorized back-projection and autofocus criterion calculation, which are key components in modern synthetic aperture radar systems. The implementation results from the two case studies are compared on the basis of utilized resources and performance. The Epiphany implementations demonstrate the usefulness of the architecture for the streaming algorithm (autofocus criterion calculation) by achieving speedup of 8.9x with respect to the sequential implementation on Intel Core i7 processor while operating at a lower clock speed. On the other hand, for the memory-intensive algorithm (fast factorized back-projection), the Epiphany architecture shows moderate speedup in the order of 4.25x. The Epiphany implementations of the two algorithms are, respectively, 38x and 78x more energy-efficient.

  • 27.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Towards Teaching Embedded Parallel Computing: An Analytical Approach2015In: Workshop on Computer Architecture Education, WCAE 2015, 2015Conference paper (Refereed)
    Abstract [en]

    Embedded electronic systems are finding increased applications in our daily life. In order to meet the application demands in embedded systems, parallel computing is used. This paper emphasizes teaching of the specific issues of parallel computing that are critical to embedded systems. We propose an analytical approach to deliver declarative and functioning knowledge for learning in the field of computer science and engineering with a special focus on Embedded Parallel Computing (EPC). We describe the teaching of a course with a focus on how parallel computing can be used to enhance performance and improve energy efficiency of embedded systems. The teaching methods include interactive lectures with web-based course literature, seminars, and lab exercises and home-assigned practical tasks. Further, the course is intended to give a general insight into current research and development in regard to parallel architectures and computation models. Since the course is an advanced level course, the students are expected to have a basic knowledge about the fundamentals of computer architecture and their common programming methodologies. The course puts emphasis on hands-on experience with embedded parallel computing. Therefore it includes an extensive laboratory and project part, in which a state of the art manycore embedded computing system is used. We believe that undertaking these methods in succession will prepare the students for both research as well as professional career. © 2015 ACM.

  • 28.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Yang, Mingkun
    Uppsala University, Uppsala, Sweden.
    Dataflow Programming of Real-time Radar Signal Processing on Manycores2014In: 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Piscataway, NJ: IEEE Press, 2014, 15-19 p.Conference paper (Refereed)
    Abstract [en]

    Real-time performance is critical for the successful realization of next generation radar systems. Among these are advanced Active Electronically Scanned Array (AESA) radars in which complex calculations are to be performed on huge sets of data in real-time. Manycore architectures are designed to provide flexibility and high performance essential for such streaming applications.

    This paper deals with the implementation of compute-intensive parts of AESA radar signal processing chain in a high-level dataflow language; CAL. We evaluate the approach by targeting a commercial manycore architecture, Epiphany, and present our preliminary findings in terms of performance and productivity gains achieved in this case study. © 2014 IEEE.

  • 29.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Åhlander, Anders
    Saab AB, Gothenburg, Sweden.
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Energy-Efficient Synthetic-Aperture Radar Processing on a Manycore Architecture2013In: Proceedings: International Conference on Parallel Processing : The 42nd Annual Conference : ICPP 2013 : 1-4 October 2013 : Lyon, France / [ed] Randall Bilof, Piscataway, NJ: IEEE conference proceedings, 2013, 330-338 p., 6687366Conference paper (Refereed)
    Abstract [en]

    The next generation radar systems have high performance demands on the signal processing chain. Examples include the advanced image creating sensor systems in which complex calculations are to be performed on huge sets of data in realtime. Manycore architectures are gaining attention as a means to overcome the computational requirements of the complex radar signal processing by exploiting massive parallelism inherent in the algorithms in an energy efficient manner.

    In this paper, we evaluate a manycore architecture, namely a 16-core Epiphany processor, by implementing two significantly large case studies, viz. an autofocus criterion calculation and the fast factorized back-projection algorithm, both key componentsin modern synthetic aperture radar systems. The implementation results from the two case studies are compared on the basis of achieved performance and programmability. One of the Epiphany implementations demonstrates the usefulness of the architecture for the streaming based algorithm (the autofocus criterion calculation) by achieving a speedup of 8.9x over a sequential implementation on a state-of-the-art general-purpose processor of a later silicon technology generation and operating at a 2.7x higher clock speed. On the other case study, a highly memory-intensive algorithm (fast factorized backprojection), the Epiphany architecture shows a speedup of 4.25x. For embedded signal processing, low power dissipation is equally important as computational performance. In our case studies, the Epiphany implementations of the two algorithms are, respectively, 78x and 38x more energy efficient. © 2013 IEEE

  • 30.
    Ul-Abdin, Zain
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Åhlander, Anders
    Saab AB, Gothenburg, Sweden.
    Svensson, Bertil
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Real-time Radar Signal Processing on Massively Parallel Processor Arrays2013In: Conference Record of The Forty-Seventh Asilomar Conference on Signals, Systems & Computers: November 3–6, 2013 Pacific Grove, California / [ed] Michael B. Matthews, Piscataway, NJ: IEEE Signal Processing Society, 2013, 1810-1814 p.Conference paper (Refereed)
    Abstract [en]

    The next generation radar systems have high performance demands on the signal processing chain. Among these are advanced image creating sensor systems in which complex calculations are to be performed on huge sets of data in realtime. Massively Parallel Processor Arrays (MPPAs) are gaining attention to cope with the computational requirements of complex radar signal processing by exploiting the massive parallelism inherent in the algorithms in an energy efficient manner.

    In this paper, we evaluate two such massively parallel architectures, namely, Ambric and Epiphany, by implementing a significantly large case study of autofocus criterion calculation, which is a key component in future synthetic aperture radar systems. The implementation results from the two case studies are compared on the basis of achieved performance, energy efficiency, and programmability. ©2013 IEEE.

  • 31.
    Xypolitidis, Benard
    et al.
    Halmstad University, School of Information Technology.
    Shabani, Rudin
    Halmstad University, School of Information Technology.
    Khanderparkar, Satej V.
    Halmstad University, School of Information Technology.
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Savas, Süleyman
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Nordström, Tomas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Towards Architectural Design Space Exploration for Heterogeneous Manycores2016In: Proceedings: 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processin: PDP 2016 / [ed] Yiannis Cotronis, Masoud Daneshtalab & George Angelos Papadopoulos, Piscataway, NJ: IEEE Computer Society, 2016, 805-810 p.Conference paper (Refereed)
    Abstract [en]

    Today many of the high performance embedded processors already contain multiple processor cores and we see heterogeneous manycore architectures being proposed. Therefore it is very desirable to have a fast way to explore various heterogeneous architectures through the use of an architectural design space exploration tool, giving the designer the option to explore design alternatives before the physical implementation. In this paper, we have extended Heracles, a design space exploration tool for (homogeneous) manycore architectures, to incorporate different types of processing cores, and thus allowus to model heterogeneity. Our tool, called the Heterogeneous Heracles System (HHS), can besides the already supported MIPS core also include OpenRISC cores. The new tool retains the possibility available in Heracles to perform register transfer level (RTL) simulations of each explored architecture in Verilog as well as synthesizing it to field-programmable gate arrays (FPGAs). To facilitate the exploration of heterogeneous architectures, we have also extended the graphical user interface (GUI) to support heterogeneity. This GUI provides options to configure the types of core, core settings, memory system and network topology. Some initial results on FPGA utilization are presented from synthesizing both homogeneous and heterogeneous manycore architectures, as well as some benchmark results from both simulated and synthesized architectures.

  • 32.
    Yang, Mingkun
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Savas, Suleyman
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Ul-Abdin, Zain
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Nordström, Tomas
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS).
    A Communication Library for Mapping Dataflow Applications on Manycore Architectures2013In: Proceedings of the 6th Swedish Multicore Computing Workshop / [ed] Tomas Nordstrom & Zain-ul-Abdin, 2013, 65-68 p.Conference paper (Refereed)
    Abstract [en]

    Dataflow programming is a promising paradigm for high performance embedded parallel computing. When mapping a dataflow program onto a manycore architecture a key component is the library to express the communication between the actors. In this paper we present a dataflow communication library supporting the CAL actor language. A first implementation of the communication library is created for Adapteva’s manycore architecture Epiphany that contains an onchip 2-D mesh network. Three different buffering methods, with and without direct memory access (DMA) transfer, have been implemented and evaluated. We have also made a preliminary study on the effect of mapping strategies of the actors onto the cores. The assessment of the library is based on a CAL implementation of a two dimensional inverse discrete cosine transform (2D-IDCT) and our own CAL-to-C compilation framework. As expected the results show that the most efficient actor to-core mapping strategy is to keep the communication to the nearest neighbor communication pattern as much as possible. Thus, the best way to place a pipelined sequence of computations like our 2D-IDCT is to place the actors into cores in a serpentine fashion. For this application we found that the simple receiver side buffer outperforms the more complicated buffering strategies that used DMA transfer.

  • 33.
    Zain-ul-Abdin,
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Compiling Stream-Language Applications to a Reconfigurable Array Processor2005In: ERSA'05: proceedings of the 2005 International Conference on Engineering of Reconfigurable Systems and Algorithms, Las Vegas, Nevada, USA, June 27-30, 2005 / [ed] Toomas P. Plaks and R. DeMara, Las Vegas: CSREA Press, 2005, 274-275 p.Conference paper (Refereed)
  • 34.
    Zain-ul-Abdin,
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Svensson, Bertil
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Embedded Systems (CERES).
    Specifying Run-time Reconfiguration in Processor Arrays using High-level language2010In: WRC 2010: 4th HiPEAC Workshop on Reconfigurable Computing, Pisa, 2010, 1-10 p.Conference paper (Refereed)
    Abstract [en]

    The adoption of run-time reconfigurable parallel architectures for high-performance embedded systems is constrained by the lackof a unified programming model which can express both parallelism and reconfigurability. We propose to program an emerging class of reconfigurable processor arrays by using the programming model of occam-pi and describe how the extensions of channel direction specifiers, mobile data, dynamic process invocation, and process placement attributes can be used to express run-time reconfiguration in occam-pi. We present implementations of DCT algorithm to demonstrate the applicability of occam-pi to express reconfigurability. We concluded that occam-pi appears to be a suitable programming model for programming run-time reconfigurable processor arrays.

  • 35.
    Zain-ul-Abdin,
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Åhlander, Anders
    Business Area Electronic Defence Systems, Saab AB, Gothenburg, Sweden.
    Svensson, Bertil
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
    Programming Real-time Autofocus on a Massively Parallel Reconfigurable Architecture using Occam-pi2011In: Proceedings of the 19th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM'2011), Los Alamitos, Calif.: IEEE Computer Society, 2011, 194-201 p.Conference paper (Refereed)
    Abstract [en]

    Recently we proposed occam-pi as a high-level language for programming massively parallel reconfigurable architectures. The design of occam-pi incorporates ideas from CSP and pi-calculus to facilitate expressing parallelism and reconfigurability. The feasability of this approach was illustratedby building three occam-pi implementations of DCT executing on an Ambric. However, because DCT is a simple and well studied algorithm it remained uncertain whether occam-pi would also be effective for programming novel, more complex algorithms.

    In this paper, we demonstrate the applicability of occam-pi for expressing various degrees of parallelism by implementinga significantly large case-study of focus criterion calculation inan autofocus algorithm on the Ambric architecture. Autofocus is a key component of synthetic aperture radar systems. Two implementations of focus criterion calculation were developedand evaluated on the basis of performance. The comparison of the performance results with a single threaded software implementation of the same algorithm show that the throughput of the two implementations are 11x and 23x higher than the sequential implementation despite a much lower (9x) clock frequency. The two designs are, respectively, 29x and 40x moreenergy efficient.

1 - 35 of 35
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf