hh.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Algorithms for implementing roots, inverse and inverse roots in hardware
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).ORCID iD: 0000-0003-4828-7488
Lund University, Lund, Sweden.
Lund University, Lund, Sweden.
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).ORCID iD: 0000-0001-6625-6533
Show others and affiliations
2016 (English)Manuscript (preprint) (Other academic)
Abstract [en]

In applications as in future MIMO communication systems a massive computation of complex matrix operations, such as QR decomposition, is performed. In these matrix operations, the functions roots, inverse and inverse roots are computed in large quantities. Therefore, to obtain high enough performance in such applications, efficient algorithms are highly important. Since these algorithms need to be realized in hardware it must also be ensured that they meet high requirements in terms of small chip area, low computation time and low power consumption. Power consumption is particularly important since many applications are battery powered.For most unary functions, directly applying an approximation methodology in a straightforward way will not lead to an efficient implementation. Instead, a dedicated algorithm often has to be developed. The functions roots, inverse and inverse roots are in this category. The developed approaches are founded on working in a floating-point format. For the roots functions also a change of number base is used. These procedures not only enable simpler solutions but also increased accuracy, since the approximation algorithm is performed on a mantissa of limited range.As a summarizing example the inverse square root is chosen. For comparison, the inverse square root is implemented using two methodologies: Harmonized Parabolic Synthesis and Newton-Raphson method. The novel methodology, Harmonized Parabolic Synthesis (HPS), is chosen since it has been demonstrated to provide very efficient approximations. The Newton-Raphson (NR) method is chosen since it is known for providing a very efficient implementation of the inverse square root. It is also commonly used in signal processing applications for computing approximations on fixed-point numbers of a limited range. Four implementations are made; HPS with 32 and 512 interpolation intervals and NR with 1 and 2 iterations. Summarizing the comparisons of the hardware performance, the implementations HPS 32, HPS 512 and NR 1 are comparable when it comes to hardware performance, while NR 2 is much worse. However, HPS 32 stands out in terms of better performance when it comes to the distribution of the error.

Place, publisher, year, edition, pages
2016. , p. 26
Keywords [en]
Approximation, unary functions, elementary functions, arithmetic computation, root, inverse, inverse roots, harmonized parabolic synthesis, Newton-Raphson method
National Category
Embedded Systems
Identifiers
URN: urn:nbn:se:hh:diva-30860OAI: oai:DiVA.org:hh-30860DiVA, id: diva2:927014
Note

Som manuskript i avhandling. As manuscript in dissertation.

Available from: 2016-05-10 Created: 2016-05-10 Last updated: 2018-03-22Bibliographically approved
In thesis
1. Methodologies for Approximation of Unary Functions and Their Implementation in Hardware
Open this publication in new window or tab >>Methodologies for Approximation of Unary Functions and Their Implementation in Hardware
2016 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Applications in computer graphics, digital signal processing, communication systems, robotics, astrophysics, fluid physics and many other areas have evolved to become very computation intensive. Algorithms are becoming increasingly complex and require higher accuracy in the computations. In addition, software solutions for these applications are in many cases not sufficient in terms of performance. A hardware implementation is therefore needed. A recurring bottleneck in the algorithms is the performance of the approximations of unary functions, such as trigonometric functions, logarithms and the square root, as well as binary functions such as division. The challenge is therefore to develop a methodology for the implementation of approximations of unary functions in hardware that can cope with the growing requirements. The methodology is required to result in fast execution time, low complexity basic operations that are simple to implement in hardware, and – sincemany applications are battery powered – low power consumption. To ensure appropriate performance of the entire computation in which the approximation is a part, the characteristics and distribution of the approximation error are also things that must be possible to manage. The new approximation methodologies presented in this thesis are of the type that aims to reduce the sizes of the look-up tables by the use of auxiliary functions. They are founded on a synthesis of parabolic functions by multiplication – instead of addition, which is the most common. Three approximation methodologies have been developed; the two last being further developments of the first. For some functions, such as roots, inverse and inverse roots, a straightforward solution with an approximation is not manageable. Since these functions are frequent in many computation intensive algorithms, it is necessary to find very efficient implementations of these functions. New methods for this are also presented in this thesis. They are all founded on working in a floating-point format, and, for the roots functions, a change of number base is also used. The transformations not only enable simpler solutions but also increased accuracy, since the approximation algorithm is performed on a mantissa of limited range. Tools for error analysis have been developed as well. The characteristics and distribution of the approximation error in the new methodologies are presented and compared with existing state-of-the-art methods such as CORDIC. The verification and evaluation of the solutions have to a large extent been made as comparative ASIC implementations with other approximation methods, separately or embedded in algorithms. As an example, an implementation of the logarithm made using the third methodology developed, Harmonized Parabolic Synthesis (HPS), is compared with an implementation using the CORDIC algorithm. Both implementations are designed to provide 15-bit resolution. The design implemented using HPS performs 12 times better than the CORDIC implementation in terms of throughput. In terms of energy consumption, the new methodology consumes 96% less. The chip area is 60% smaller than for the CORDIC algorithm. In summary, the new approximation methodologies presented are found to well meet the demanding requirements that exist in this area.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2016. p. 76
Series
Halmstad University Dissertations ; 21
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-30983 (URN)978-91-87045-45-5 (ISBN)978-91-87045-44-8 (ISBN)
Public defence
2016-09-02, Wigforssalen, Halmstad, 13:00 (English)
Opponent
Supervisors
Available from: 2016-06-08 Created: 2016-05-31 Last updated: 2021-05-11Bibliographically approved

Open Access in DiVA

fulltext(1057 kB)2501 downloads
File information
File name FULLTEXT02.pdfFile size 1057 kBChecksum SHA-512
ae08cd19cdcf040c02635ca538611d3d3ec2d3cb4fb8fd7abc44e271cb747d55591c259826bf5e6df4aef90a7a4d6a2f3192b5adb1789b49a5430e71978bf25f
Type fulltextMimetype application/pdf

Authority records

Hertz, ErikSvensson, Bertil

Search in DiVA

By author/editor
Hertz, ErikSvensson, Bertil
By organisation
Centre for Research on Embedded Systems (CERES)
Embedded Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 2502 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 802 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf