hh.sePublications
Change search
Link to record
Permanent link

Direct link
BETA
Publications (10 of 20) Show all publications
Hertz, E., Lai, J., Svensson, B. & Nilsson, P. (2018). The Harmonized Parabolic Synthesis Methodology for Hardware Efficient Function Generation with Full Error Control. Journal of Signal Processing Systems, 90(12), 1623-1637
Open this publication in new window or tab >>The Harmonized Parabolic Synthesis Methodology for Hardware Efficient Function Generation with Full Error Control
2018 (English)In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 90, no 12, p. 1623-1637Article in journal (Refereed) Published
Abstract [en]

The Harmonized Parabolic Synthesis methodology is a further development of the Parabolic Synthesis methodology for approximation of unary functions such as trigonometric functions, logarithms and the square root with moderate accuracy for ASIC implementation. These functions are extensively used in computer graphics, communication systems and many other application areas. For these high-speed applications, software solutions are in many cases not sufficient and a hardware implementation is therefore needed. The Harmonized Parabolic Synthesis methodology has two outstanding advantages: it is parallel, thus reducing the execution time, and it is based on low complexity operations, thus is simple to implement in hardware. A difference compared to other approximation methodologies is that it is a multiplicative and not additive, methodology. Compared to the Parabolic Synthesis methodologies it is possible to significantly enhance the performance in terms of reducing chip area, computation delay and power consumption. Furthermore it increases the possibility to tailor the characteristics of the error, improving conditions for subsequent calculations and the performance in design terms. To evaluate the proposed methodology, the fractional part of the logarithm has been implemented and its performance is compared to the Parabolic Synthesis methodology. The comparison is made with 15-bit resolution. The design implemented using the proposed methodology performs 3x better than the Parabolic Synthesis implementation in terms of throughput. In terms of energy consumption, the new methodology consumes 90% less. The chip area is 70% smaller than for the Parabolic Synthesis methodology. In summary, the new technology further increases the advantages of Parabolic Synthesis. © 2017 The Author(s)

Place, publisher, year, edition, pages
New York, NY: Springer, 2018
Keywords
Approximation, parabolic synthesis, unary functions, elementary functions, second-degree interpolation, arithmetic computation, look-up table, VLSI
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-32480 (URN)10.1007/s11265-017-1300-4 (DOI)2-s2.0-85032329367 (Scopus ID)
Available from: 2016-11-24 Created: 2016-11-24 Last updated: 2018-10-12Bibliographically approved
Savas, S., Hertz, E., Nordström, T. & Ul-Abdin, Z. (2017). Efficient Single-Precision Floating-Point Division Using Harmonized Parabolic Synthesis. In: Michael Hübner, Ricardo Reis, Mircea Stan & Nikolaos Voros (Ed.), 2017 IEEE Computer Society Annual Symposium on VLSI: ISVLSI 2017. Paper presented at IEEE Computer Society Annual Symposium on VLSI, July 3-5, 2017, Bochum, Germany. Los Alamitos: IEEE
Open this publication in new window or tab >>Efficient Single-Precision Floating-Point Division Using Harmonized Parabolic Synthesis
2017 (English)In: 2017 IEEE Computer Society Annual Symposium on VLSI: ISVLSI 2017 / [ed] Michael Hübner, Ricardo Reis, Mircea Stan & Nikolaos Voros, Los Alamitos: IEEE, 2017Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes a novel method for performing division on floating-point numbers represented in IEEE-754 single-precision (binary32) format. The method is based on an inverter, implemented as a combination of Parabolic Synthesis and second-degree interpolation, followed by a multiplier. It is implemented with and without pipeline stages individually and synthesized while targeting a Xilinx Ultrascale FPGA.

The implementations show better resource usage and latency results when compared to other implementations based on different methods. In case of throughput, the proposed method outperforms most of the other works, however, some Altera FPGAs achieve higher clock rate due to the differences in the DSP slice multiplier design.

Due to the small size, low latency and high throughput, the presented floating-point division unit is suitable for high performance embedded systems and can be integrated into accelerators or be used as a stand-alone accelerator.

Place, publisher, year, edition, pages
Los Alamitos: IEEE, 2017
Series
IEEE Computer Society Annual Symposium on VLSI, ISSN 2159-3477
Keywords
Floating-point, single precision, division, FPGA, Harmonized Parabolic Synthesis
National Category
Computer Systems
Identifiers
urn:nbn:se:hh:diva-33793 (URN)10.1109/ISVLSI.2017.28 (DOI)2-s2.0-85027258772 (Scopus ID)978-1-5090-6762-6 (ISBN)978-1-5090-6763-3 (ISBN)
Conference
IEEE Computer Society Annual Symposium on VLSI, July 3-5, 2017, Bochum, Germany
Projects
NGES
Funder
VINNOVA
Available from: 2017-05-05 Created: 2017-05-05 Last updated: 2019-05-07Bibliographically approved
Hertz, E., Thuning, N., Bärring, L., Svensson, B. & Nilsson, P. (2016). Algorithms for implementing roots, inverse and inverse roots in hardware.
Open this publication in new window or tab >>Algorithms for implementing roots, inverse and inverse roots in hardware
Show others...
2016 (English)Manuscript (preprint) (Other academic)
Abstract [en]

In applications as in future MIMO communication systems a massive computation of complex matrix operations, such as QR decomposition, is performed. In these matrix operations, the functions roots, inverse and inverse roots are computed in large quantities. Therefore, to obtain high enough performance in such applications, efficient algorithms are highly important. Since these algorithms need to be realized in hardware it must also be ensured that they meet high requirements in terms of small chip area, low computation time and low power consumption. Power consumption is particularly important since many applications are battery powered.For most unary functions, directly applying an approximation methodology in a straightforward way will not lead to an efficient implementation. Instead, a dedicated algorithm often has to be developed. The functions roots, inverse and inverse roots are in this category. The developed approaches are founded on working in a floating-point format. For the roots functions also a change of number base is used. These procedures not only enable simpler solutions but also increased accuracy, since the approximation algorithm is performed on a mantissa of limited range.As a summarizing example the inverse square root is chosen. For comparison, the inverse square root is implemented using two methodologies: Harmonized Parabolic Synthesis and Newton-Raphson method. The novel methodology, Harmonized Parabolic Synthesis (HPS), is chosen since it has been demonstrated to provide very efficient approximations. The Newton-Raphson (NR) method is chosen since it is known for providing a very efficient implementation of the inverse square root. It is also commonly used in signal processing applications for computing approximations on fixed-point numbers of a limited range. Four implementations are made; HPS with 32 and 512 interpolation intervals and NR with 1 and 2 iterations. Summarizing the comparisons of the hardware performance, the implementations HPS 32, HPS 512 and NR 1 are comparable when it comes to hardware performance, while NR 2 is much worse. However, HPS 32 stands out in terms of better performance when it comes to the distribution of the error.

Publisher
p. 26
Keywords
Approximation, unary functions, elementary functions, arithmetic computation, root, inverse, inverse roots, harmonized parabolic synthesis, Newton-Raphson method
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-30860 (URN)
Note

Som manuskript i avhandling. As manuscript in dissertation.

Available from: 2016-05-10 Created: 2016-05-10 Last updated: 2018-03-22Bibliographically approved
Hertz, E., Svensson, B. & Nilsson, P. (2016). Combining the Parabolic Synthesis Methodology with Second-Degree Interpolation. Microprocessors and microsystems, 42, 142-155
Open this publication in new window or tab >>Combining the Parabolic Synthesis Methodology with Second-Degree Interpolation
2016 (English)In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 42, p. 142-155Article in journal (Refereed) Published
Abstract [en]

The Parabolic Synthesis methodology is an approximation methodology for implementing unary functions, such as trigonometric functions, logarithms and square root, as well as binary functions, such as division, in hardware. Unary functions are extensively used in baseband for wireless/wireline communication, computer graphics, digital signal processing, robotics, astrophysics, fluid physics, games and many other areas. For high-speed applications as well as in low-power systems, software solutions are not sufficient and a hardware implementation is therefore needed. The Parabolic Synthesis methodology is a way to implement functions in hardware based on low complexity operations that are simple to implement in hardware. A difference in the Parabolic Synthesis methodology compared to many other approximation methodologies is that it is a multiplicative, in contrast to additive, methodology. To further improve the performance of Parabolic Synthesis based designs, the methodology is combined with Second-Degree Interpolation. The paper shows that the methodology provides a significant reduction in chip area, computation delay and power consumption with preserved characteristics of the error. To evaluate this, the logarithmic function was implemented, as an example, using the Parabolic Synthesis methodology in comparison to the Parabolic Synthesis methodology combined with Second-Degree Interpolation. To further demonstrate the feasibility of both methodologies, they have been compared with the CORDIC methodology. The comparison is made on the implementation of the fractional part of the logarithmic function with a 15-bit resolution. The designs implemented using the Parabolic Synthesis methodology – with and without the Second-Degree Interpolation – perform 4x and 8x better, respectively, than the CORDIC implementation in terms of throughput. In terms of energy consumption, the CORDIC implementation consumes 140% and 800% more energy, respectively. The chip area is also smaller in the case when the Parabolic Synthesis methodology combined with Second-Degree Interpolation is used. © 2016 Elsevier B.V. All rights reserved.

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2016
Keywords
Approximation, parabolic synthesis, unary functions, elementary functions, second-degree interpolation, arithmetic computation, CORDIC, VLSI, look-up table
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:hh:diva-30500 (URN)10.1016/j.micpro.2016.01.015 (DOI)000375336900011 ()2-s2.0-84960834630 (Scopus ID)
Available from: 2016-03-10 Created: 2016-03-10 Last updated: 2018-03-22Bibliographically approved
Hertz, E. (2016). Methodologies for Approximation of Unary Functions and Their Implementation in Hardware. (Doctoral dissertation). Halmstad: Halmstad University Press
Open this publication in new window or tab >>Methodologies for Approximation of Unary Functions and Their Implementation in Hardware
2016 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Applications in computer graphics, digital signal processing, communication systems, robotics, astrophysics, fluid physics and many other areas have evolved to become very computation intensive. Algorithms are becoming increasingly complex and require higher accuracy in the computations. In addition, software solutions for these applications are in many cases not sufficient in terms of performance. A hardware implementation is therefore needed. A recurring bottleneck in the algorithms is the performance of the approximations of unary functions, such as trigonometric functions, logarithms and the square root, as well as binary functions such as division. The challenge is therefore to develop a methodology for the implementation of approximations of unary functions in hardware that can cope with the growing requirements. The methodology is required to result in fast execution time, low complexity basic operations that are simple to implement in hardware, and – sincemany applications are battery powered – low power consumption. To ensure appropriate performance of the entire computation in which the approximation is a part, the characteristics and distribution of the approximation error are also things that must be possible to manage. The new approximation methodologies presented in this thesis are of the type that aims to reduce the sizes of the look-up tables by the use of auxiliary functions. They are founded on a synthesis of parabolic functions by multiplication – instead of addition, which is the most common. Three approximation methodologies have been developed; the two last being further developments of the first. For some functions, such as roots, inverse and inverse roots, a straightforward solution with an approximation is not manageable. Since these functions are frequent in many computation intensive algorithms, it is necessary to find very efficient implementations of these functions. New methods for this are also presented in this thesis. They are all founded on working in a floating-point format, and, for the roots functions, a change of number base is also used. The transformations not only enable simpler solutions but also increased accuracy, since the approximation algorithm is performed on a mantissa of limited range. Tools for error analysis have been developed as well. The characteristics and distribution of the approximation error in the new methodologies are presented and compared with existing state-of-the-art methods such as CORDIC. The verification and evaluation of the solutions have to a large extent been made as comparative ASIC implementations with other approximation methods, separately or embedded in algorithms. As an example, an implementation of the logarithm made using the third methodology developed, Harmonized Parabolic Synthesis (HPS), is compared with an implementation using the CORDIC algorithm. Both implementations are designed to provide 15-bit resolution. The design implemented using HPS performs 12 times better than the CORDIC implementation in terms of throughput. In terms of energy consumption, the new methodology consumes 96% less. The chip area is 60% smaller than for the CORDIC algorithm. In summary, the new approximation methodologies presented are found to well meet the demanding requirements that exist in this area.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2016. p. 76
Series
Halmstad University Dissertations ; 21
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-30983 (URN)978-91-87045-45-5 (ISBN)978-91-87045-44-8 (ISBN)
Public defence
2016-09-02, Wigforssalen, Halmstad, 13:00 (English)
Opponent
Supervisors
Available from: 2016-06-08 Created: 2016-05-31 Last updated: 2018-03-22Bibliographically approved
Nilsson, P., Sun, Y., Gangarajaiah, R. & Hertz, E. (2015). Low Power Unrolled CORDIC Architectures. In: Jim Tørresen, Snorre Aunet, Øyvind Kallevik Grutle, Ivan Ring Nielsen, Tor Sverre Lande (Ed.), 2015 Nordic Circuits and Systems Conference (NORCAS): NORCHIP & International Symposium on System-on-Chip (SoC). Paper presented at 2015 NORCAS Conference, IEEE Nordic Circuits and Systems Conference, Oslo, Norway, October 26-28, 2015. Piscataway, NJ: IEEE Press
Open this publication in new window or tab >>Low Power Unrolled CORDIC Architectures
2015 (English)In: 2015 Nordic Circuits and Systems Conference (NORCAS): NORCHIP & International Symposium on System-on-Chip (SoC) / [ed] Jim Tørresen, Snorre Aunet, Øyvind Kallevik Grutle, Ivan Ring Nielsen, Tor Sverre Lande, Piscataway, NJ: IEEE Press, 2015Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

This paper shows a novel methodology to improve unrolled CORDIC architectures. The methodology is based on removing adder stages starting from the first stage. As an example, a 19-stage CORDIC is used but the methodology is applicable on CORDICs with an arbitrary number of stages. The CORDIC is implemented, simulated, and synthesized into hardware. In the paper, the performance is shown to be increased by 23% and that the dynamic power can be reduced by 27%. © 2014 IEEE

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2015
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-29718 (URN)10.1109/NORCHIP.2015.7364396 (DOI)000380441400043 ()2-s2.0-84963742027 (Scopus ID)978-1-4673-6576-5 (ISBN)
Conference
2015 NORCAS Conference, IEEE Nordic Circuits and Systems Conference, Oslo, Norway, October 26-28, 2015
Funder
VINNOVA
Note

Funding: STMicroelectronics, SSF, and Vinnova

Available from: 2015-11-03 Created: 2015-11-03 Last updated: 2018-03-22Bibliographically approved
Nilsson, P., Shaik, A. U., Gangarajaiah, R. & Hertz, E. (2014). Hardware Implementation of the Exponential Function Using Taylor Series. In: NORCHIP 2014 – 32nd NORCHIP Conference: The Nordic Microelectronics Event. Paper presented at 32nd NORCHIP Conference, NORCHIP 2014, Tampere, Finland, October 27-28, 2014. Piscataway, NJ: IEEE Press, Article ID 7004740.
Open this publication in new window or tab >>Hardware Implementation of the Exponential Function Using Taylor Series
2014 (English)In: NORCHIP 2014 – 32nd NORCHIP Conference: The Nordic Microelectronics Event, Piscataway, NJ: IEEE Press, 2014, article id 7004740Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents hardware implementations of Taylor series. The focus will be on the exponential function but the methodology is applicable on any unary function. Two different architectures are investigated, one, original, straight forward and one modified structure. The outcomes are higher performance, lower area, and lower power consumption for the modified architecture compared to the original.

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2014
Keywords
Taylor expansion, Exponential function, Dynamic Power, Static Power, Low Leakage, Integrated Circuit, IC, ASIC, CMOS
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-27591 (URN)10.1109/NORCHIP.2014.7004740 (DOI)000380487600043 ()2-s2.0-84921485655 (Scopus ID)978-1-4799-5442-1 (ISBN)
Conference
32nd NORCHIP Conference, NORCHIP 2014, Tampere, Finland, October 27-28, 2014
Projects
DARE
Funder
Swedish Foundation for Strategic Research
Available from: 2015-01-26 Created: 2015-01-26 Last updated: 2018-03-22Bibliographically approved
Pouyan, P., Hertz, E. & Nilsson, P. (2011). A VLSI Implementation of Logarithmic and Exponential Functions Using a Novel Parabolic Synthesis Methodology Compared to the CORDIC Algorithm. In: 2011 20th European Conference on Circuit Theory and Design (ECCTD): . Paper presented at The 20th European Conference on Circuit Theory and Design (ECCTD 2011), 29-31 August 2011, Linköping, Sweden. New York: Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>A VLSI Implementation of Logarithmic and Exponential Functions Using a Novel Parabolic Synthesis Methodology Compared to the CORDIC Algorithm
2011 (English)In: 2011 20th European Conference on Circuit Theory and Design (ECCTD), New York: Institute of Electrical and Electronics Engineers (IEEE), 2011Conference paper, Published paper (Refereed)
Abstract [en]

High performance implementations of unary functions are important in many applications e.g. in the wireless communication area. This paper shows the development and VLSI implementation of unary functions like the logarithmic and exponential function, by using anovel approximation methodology based on parabolic synthesis, which is compared to the well known CORDIC algorithm. Both designs are synthesized and implemented on an FPGA and as an ASIC. The results of such implementations are compared with metrics such as performance and area. The performance in the parabolic architecture is shown to exceed the CORDIC architecture by a factor 4.2, in a 65 nm Standard-VT ASIC implementation. © 2011 IEEE.

Place, publisher, year, edition, pages
New York: Institute of Electrical and Electronics Engineers (IEEE), 2011
Keywords
CORDIC, Exponential, FPGA, Logarithmic, Parabolic Synthesis, VLSI
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-30982 (URN)10.1109/ECCTD.2011.6043642 (DOI)2-s2.0-80155205523 (Scopus ID)978-145770618-9 (ISBN)
Conference
The 20th European Conference on Circuit Theory and Design (ECCTD 2011), 29-31 August 2011, Linköping, Sweden
Note

Article number 6043642

Available from: 2016-05-31 Created: 2016-05-31 Last updated: 2018-03-22Bibliographically approved
Hertz, E. (2011). Parabolic Synthesis. (Licentiate dissertation). Lund: Lund University
Open this publication in new window or tab >>Parabolic Synthesis
2011 (English)Licentiate thesis, monograph (Other academic)
Abstract [en]

Many consumer products, such as within the computer areas, computer graphics, digital signal processing, communication systems, robotics, navigation, astrophysics, fluid physics, etc. are searching for high computational performance as a consequence of increasingly more advanced algorithms in these applications. Until recently the down scaling of the hardware technology has been able to fulfill these higher demands from the more advanced algorithms with higher clock rates on the chips. This that the development of hardware technology performance has stagnated has moved the interest more over to implementation of algorithms in hardware. Especially within wireless communication the desire for higher transmission rates has increased the interest for algorithm implementation methodologies. The scope of this thesis is mainly on the developed methodology of parabolic synthesis. The parabolic synthesis methodology is a methodology for implementing approximations of unary functions in hardware. The methodology is described with the criteria's that have to be fulfilled to perform an approximation on a unary function. The hardware architecture of the methodology is described and to this a special hardware that performs the squaring operation. The outcome of the presented research is a novel methodology for implementing approximations of unary functions such as trigonometric functions, logarithmic functions, as well as square root and division functions etc. The architecture of the processing part automatically gives a high degree of parallelism. The methodology is founded on operations that are simple to implement in hardware such as addition, shifts, multiplication, contributes to that the implementation in hardware is simple to perform. The hardware architecture is characterized by a high degree of parallelism that gives a short critical path and fast computation. The structure of the methodology will also assure an area efficient hardware implementation.

Place, publisher, year, edition, pages
Lund: Lund University, 2011. p. 69
Series
Series of licentiate and doctoral theses (Department of Electrical and Information Technology, Lund University), ISSN 1654-790X ; 28
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:hh:diva-22338 (URN)978-91-7473-069-2 (ISBN)
Presentation
2011-01-28, Lecture hall E:2311, E-building, Ole Römers väg 3, Faculty of Engineering, Lund University, Lund, 10:15
Opponent
Supervisors
Available from: 2013-06-04 Created: 2013-05-24 Last updated: 2018-03-22Bibliographically approved
Nilsson, P. & Hertz, E. (2011). Ultra Low Power Hardware for Computing Squared Euclidean Distances. In: 2011 20th European Conference on Circuit Theory and Design (ECCTD): . Paper presented at 20th European Conference on Circuit Theory and Design (ECCTD 2011), August 29-31, Linköping, Sweden (pp. 580-583). Piscataway, NJ: IEEE Press
Open this publication in new window or tab >>Ultra Low Power Hardware for Computing Squared Euclidean Distances
2011 (English)In: 2011 20th European Conference on Circuit Theory and Design (ECCTD), Piscataway, NJ: IEEE Press, 2011, p. 580-583Conference paper, Published paper (Refereed)
Abstract [en]

Computing Euclidean Distances is a very important operation in digital communication, especially in the case of trellis coded modulation, where it is used numerously. This paper shows that a substantial reduction in complexity can be achieved in hardware processing elements for computing Euclidean Distances. A reduction in complexity down to 39% is shown compared to traditional designs. The paper also shows that the optimized design can be done completely ripple free, which leads to a reduction of the critical path to far more than half. The reduction in complexity leads to a reduction in power consumption. The ripple free design also leads to lower power consumption for two reasons: the rippling in itself leads to unnecessary glitches, which costs power and the shorter critical path enables a lower supply voltage, which reduces the power consumption as well. © 2011 IEEE.

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE Press, 2011
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-22320 (URN)10.1109/ECCTD.2011.6043600 (DOI)2-s2.0-80155192288 (Scopus ID)978-1-4577-0617-2 (ISBN)978-1-4577-0616-5 (ISBN)
Conference
20th European Conference on Circuit Theory and Design (ECCTD 2011), August 29-31, Linköping, Sweden
Funder
VINNOVASwedish Foundation for Strategic Research
Note

This work has been supported by the Swedish VINNOVA Industrial Excellence Center – System Design on Silicon and Wireless Communication for Ultra Portable Devices – Swedish Foundation for Strategic Research (SSF).

Available from: 2013-05-24 Created: 2013-05-24 Last updated: 2018-03-22Bibliographically approved
Organisations
Identifiers
ORCID iD: ORCID iD iconorcid.org/0000-0003-4828-7488

Search in DiVA

Show all publications