The present invention concerns a device for converting a binary floating-point number into a binary fixed-point base-2 logarithm, or the opposite. For conversion of a floating-point number, the invention includes an input register in which the floating-point number is stored; an output register for the calculated logarithm; a device that transfers the exponent of the floating-point number from the input register to the output register, where it directly forms the characteristic of the logarithm; a device that transfers the fractional part of the mantissa of the floating-point number from the input register to an adder and also to one or more part circuits that form additional parts; a device that transfers the additional parts to the adder; said adder, which adds the fractional part of the mantissa and said additional parts; and a device that transfers the sum from the adder to the output register, where it forms the fractional part of the base-2 logarithm. Further, the part circuit or part circuits are arranged to be able to use different scale factors in different computing intervals. The conversion from a logarithm is carried out in a similar manner and with the same components.
The present invention concerns a device for converting a binary floating-point number into a binary fixed-point base-2 logarithm, or the opposite. For conversion of a floating-point number, the invention includes an input register in which the floating-point number is stored; an output register for the calculated logarithm; a device that transfers the exponent of the floating-point number from the input register to the output register, where it directly forms the characteristic of the logarithm; a device that transfers the fractional part of the mantissa of the floating-point number from the input register to an adder and also to one or more part circuits that form additional parts; a device that transfers the additional parts to the adder; said adder, which adds the fractional part of the mantissa and said additional parts; and a device that transfers the sum from the adder to the output register, where it forms the fractional part of the base-2 logarithm. The conversion from a logarithm is carried out in a similar manner and with the same components.
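The conversion from a floating-point number to a base-2 logarithm can be modelled in software as follows. This is a minimal sketch, assuming a single part circuit whose additional part is a scaled parabola SCALE[j] * (f - f*f), with one hypothetical scale factor per computing interval chosen so that the result is exact at each interval midpoint. The names `float_to_log2`, `SCALE` and `N_INTERVALS`, and the number of intervals, are illustrative and not taken from the patent.

```python
import math

N_INTERVALS = 16  # hypothetical number of computing intervals


def _scale_factor(j):
    """Hypothetical scale factor for interval j: exact at the interval midpoint."""
    mid = (j + 0.5) / N_INTERVALS
    return (math.log2(1.0 + mid) - mid) / (mid - mid * mid)


SCALE = [_scale_factor(j) for j in range(N_INTERVALS)]


def float_to_log2(x):
    """Return (characteristic, fraction) such that log2(x) ~ characteristic + fraction."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    characteristic = e - 1        # exponent of 1.f * 2**(e - 1), passed straight through
    f = 2.0 * m - 1.0             # fractional part of the mantissa 1.f, 0 <= f < 1
    j = min(int(f * N_INTERVALS), N_INTERVALS - 1)
    additional = SCALE[j] * (f - f * f)    # additional part from one part circuit
    return characteristic, f + additional  # adder output = fractional part of log2(x)


print(float_to_log2(8.0))                  # exact for powers of two
print(float_to_log2(3.0), math.log2(3.0))
```

Powers of two come out exact because both the fractional part of the mantissa and the additional part are zero; elsewhere this particular choice of intervals keeps the error on the order of 10^-3.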
Applications in computer graphics, digital signal processing, communication systems, robotics, astrophysics, fluid physics and many other areas have become very computation intensive. Algorithms are becoming increasingly complex and require higher accuracy in the computations. In addition, software solutions for these applications are in many cases not sufficient in terms of performance, so a hardware implementation is needed. A recurring bottleneck in the algorithms is the performance of the approximations of unary functions, such as trigonometric functions, logarithms and the square root, as well as binary functions such as division. The challenge is therefore to develop a methodology for implementing approximations of unary functions in hardware that can cope with these growing requirements. The methodology must result in fast execution, low-complexity basic operations that are simple to implement in hardware, and, since many applications are battery powered, low power consumption. To ensure appropriate performance of the entire computation of which the approximation is a part, it must also be possible to manage the characteristics and distribution of the approximation error. The new approximation methodologies presented in this thesis aim to reduce the sizes of the look-up tables by using auxiliary functions. They are founded on a synthesis of parabolic functions by multiplication, instead of by addition, which is the most common approach. Three approximation methodologies have been developed, the last two being further developments of the first. For some functions, such as roots, inverse and inverse roots, a straightforward solution with an approximation is not manageable. Since these functions are frequent in many computation-intensive algorithms, very efficient implementations of them are necessary. New methods for this are also presented in this thesis.
They are all founded on working in a floating-point format, and, for the roots functions, a change of number base is also used. These transformations not only enable simpler solutions but also increase accuracy, since the approximation algorithm is performed on a mantissa of limited range. Tools for error analysis have been developed as well. The characteristics and distribution of the approximation error in the new methodologies are presented and compared with existing state-of-the-art methods such as CORDIC. The solutions have largely been verified and evaluated through comparative ASIC implementations against other approximation methods, either separately or embedded in algorithms. As an example, an implementation of the logarithm using the third methodology developed, Harmonized Parabolic Synthesis (HPS), is compared with an implementation using the CORDIC algorithm. Both implementations are designed to provide 15-bit resolution. The design implemented using HPS performs 12 times better than the CORDIC implementation in terms of throughput. In terms of energy consumption, the new methodology consumes 96% less. The chip area is 60% smaller than for the CORDIC algorithm. In summary, the new approximation methodologies presented are found to meet the demanding requirements in this area well.
Many consumer products, in areas such as computing, computer graphics, digital signal processing, communication systems, robotics, navigation, astrophysics and fluid physics, demand high computational performance as a consequence of increasingly advanced algorithms in these applications. Until recently, the down-scaling of hardware technology was able to meet the higher demands of these more advanced algorithms through higher clock rates on the chips. Now that the performance development of hardware technology has stagnated, interest has shifted toward implementing algorithms in hardware. Especially within wireless communication, the desire for higher transmission rates has increased the interest in algorithm implementation methodologies. The scope of this thesis is mainly the developed methodology of parabolic synthesis, a methodology for implementing approximations of unary functions in hardware. The methodology is described together with the criteria that have to be fulfilled to perform an approximation of a unary function. The hardware architecture of the methodology is described, together with dedicated hardware that performs the squaring operation. The outcome of the presented research is a novel methodology for implementing approximations of unary functions such as trigonometric functions and logarithmic functions, as well as square root and division functions. The architecture of the processing part automatically gives a high degree of parallelism. The methodology is founded on operations that are simple to implement in hardware, such as addition, shifts and multiplication, which makes the hardware implementation simple to perform. The high degree of parallelism in the hardware architecture gives a short critical path and fast computation. The structure of the methodology also ensures an area-efficient hardware implementation.
There is disclosed a radio receiver, and a filter (24) which may be used in a radio receiver, in which received signals are applied to a digital filter twice, with an intermediate time reversal. This has the effect that any phase distortion introduced by the filter is cancelled by the application of the time-reversed signal to the filter. This means that a filter with a non-linear phase response can be used. Specifically, in preferred embodiments of the invention, an IIR wave digital filter (40) can be used, which means that the device can have lower power consumption and require a smaller silicon area than would otherwise be the case.
A digital filtering arrangement 24 which may be used in a radio receiver comprises a filter, such as an IIR wave digital filter 40, and means for performing a time reversal of the signal, such as a last in first out (LIFO) memory 42. In use, digital input signals are applied to the filter twice with an intermediate time reversal. This cancels any phase distortion introduced by the filter and allows a filter with a non-linear phase response to be used, enabling a reduction in power consumption and silicon area. A second, matched digital filter may be used instead of applying the signal to the same filter twice.
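The filter-twice-with-time-reversal principle can be illustrated with a short simulation. This is a minimal sketch, assuming a simple first-order IIR low-pass filter as a hypothetical stand-in for the IIR wave digital filter (40); reversing a Python list plays the role of the LIFO memory (42), and a final reversal restores the original time order for inspection. Filtering an impulse this way yields a response that is symmetric about the impulse position, i.e. zero phase, even though the one-pole filter on its own has a non-linear phase response. The function names are hypothetical.

```python
def one_pole_iir(x, a):
    """First-order IIR low-pass: y[n] = a*y[n-1] + (1 - a)*x[n]."""
    y, state = [], 0.0
    for sample in x:
        state = a * state + (1.0 - a) * sample
        y.append(state)
    return y


def forward_backward(x, a):
    """Filter, time-reverse (the LIFO's job), filter again, and restore the order."""
    forward = one_pole_iir(x, a)
    backward = one_pole_iir(forward[::-1], a)
    return backward[::-1]


N = 50
impulse = [0.0] * (2 * N + 1)
impulse[N] = 1.0
response = forward_backward(impulse, 0.5)
print(response[N - 2 : N + 3])  # symmetric around index N
```

The response peaks at index N and is symmetric around it, apart from a negligible truncation effect at the block edges. Using a second, matched filter instead of the same filter twice gives the same result in this model.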
The Harmonized Parabolic Synthesis methodology is a further development of the Parabolic Synthesis methodology for approximation, in hardware, of unary functions such as trigonometric functions, logarithms and the square root, as well as binary functions such as division. These functions are extensively used in computer graphics, digital signal processing, communication systems, robotics, astrophysics, fluid physics and many other application areas. For these high-speed applications, software solutions are in many cases not sufficient and a hardware implementation is therefore needed. The Harmonized Parabolic Synthesis methodology has two outstanding advantages: it is parallel, which reduces the execution time, and it is based on low-complexity operations, which makes it simple to implement in hardware. A notable difference between the Harmonized Parabolic Synthesis methodology and many other approximation methodologies is that it is multiplicative rather than additive. Without harming the favorable distribution of the approximation error achieved by the earlier Parabolic Synthesis methodologies, the Harmonized Parabolic Synthesis methodology significantly enhances performance in terms of chip area, computation delay and power consumption. Furthermore, it increases the possibility to tailor the characteristics of the error, which improves the conditions for subsequent calculations. It also extends the set of unary functions that can be approximated, since there is more freedom to shape the characteristics and distribution of the error. To evaluate the proposed methodology, the fractional part of the logarithm has been implemented and its performance compared with that of the Parabolic Synthesis methodology. The comparison is made with 15-bit resolution.
The design implemented using the Harmonized Parabolic Synthesis methodology performs 3x better than the Parabolic Synthesis implementation in terms of throughput. In terms of energy consumption, the new methodology consumes 90% less. The chip area is 70% smaller than for the Parabolic Synthesis methodology. In summary, the new technology presented in this paper further increases the advantages of Parabolic Synthesis.
The Harmonized Parabolic Synthesis methodology is a further development of the Parabolic Synthesis methodology for approximation of unary functions, such as trigonometric functions, logarithms and the square root, with moderate accuracy for ASIC implementation. These functions are extensively used in computer graphics, communication systems and many other application areas. For these high-speed applications, software solutions are in many cases not sufficient and a hardware implementation is therefore needed. The Harmonized Parabolic Synthesis methodology has two outstanding advantages: it is parallel, which reduces the execution time, and it is based on low-complexity operations, which makes it simple to implement in hardware. A difference compared with other approximation methodologies is that it is a multiplicative, not an additive, methodology. Compared with the Parabolic Synthesis methodology, it significantly enhances performance in terms of chip area, computation delay and power consumption. Furthermore, it increases the possibility to tailor the characteristics of the error, improving the conditions for subsequent calculations and the performance of the design. To evaluate the proposed methodology, the fractional part of the logarithm has been implemented and its performance compared with that of the Parabolic Synthesis methodology. The comparison is made with 15-bit resolution. The design implemented using the proposed methodology performs 3x better than the Parabolic Synthesis implementation in terms of throughput. In terms of energy consumption, the new methodology consumes 90% less. The chip area is 70% smaller than for the Parabolic Synthesis methodology. In summary, the new technology further increases the advantages of Parabolic Synthesis. © 2017 The Author(s)
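The multiplicative structure can be illustrated for the fractional part of the logarithm, log2(1 + x) on [0, 1). This is a rough floating-point sketch under stated assumptions, not the published design: the first sub-function is s1(x) = x + c1(x - x^2) with c1 chosen only to match the slope at x = 0, and the second factor s2 is a piecewise second-degree interpolation, through three points per interval, of the help function log2(1 + x)/s1(x). The papers' harmonized choice of coefficients and the fixed-point word lengths are not reproduced; `C1`, `M` and the function names are hypothetical.

```python
import math

C1 = 1.0 / math.log(2.0) - 1.0  # assumed: matches the slope of log2(1 + x) at x = 0
M = 16                           # hypothetical number of interpolation intervals


def target(x):
    """Fractional part of the logarithm for a mantissa 1 + x, 0 <= x < 1."""
    return math.log2(1.0 + x)


def s1(x):
    """First sub-function."""
    return x + C1 * (x - x * x)


def help_function(x):
    """target / s1, using its limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else target(x) / s1(x)


def _coefficients(k):
    """Quadratic through the help function at the start, middle and end of interval k."""
    x0, h = k / M, 1.0 / M
    y0, y1, y2 = help_function(x0), help_function(x0 + h / 2), help_function(x0 + h)
    return y0, -3.0 * y0 + 4.0 * y1 - y2, 2.0 * y0 - 4.0 * y1 + 2.0 * y2


TABLE = [_coefficients(k) for k in range(M)]


def s2(x):
    """Second sub-function: piecewise second-degree interpolation."""
    pos = x * M
    k = min(int(pos), M - 1)
    t = pos - k                      # local coordinate in [0, 1]
    c0, c1, c2 = TABLE[k]
    return c0 + t * (c1 + t * c2)    # nested form: two multiply-adds


def log2_fraction(x):
    """Multiplicative approximation s1(x) * s2(x)."""
    return s1(x) * s2(x)


print(log2_fraction(0.3), target(0.3))
```

With 16 intervals this floating-point model stays below 2^-15 in absolute error on a fine grid, the scale of a 15-bit resolution; a hardware version would also have to account for fixed-point truncation.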
This paper introduces a parabolic synthesis methodology for developing approximations of unary functions, like trigonometric functions and logarithms, that are specialized for efficient hardware-mapped VLSI design. The advantages of the methodology are a short critical path, fast computation and high throughput, enabled by a high degree of architectural parallelism. The feasibility of the methodology is shown by developing an approximation of the sine function for implementation in hardware. © 2008 IEEE
This paper introduces a parabolic synthesis methodology for developing approximations of unary functions. Examples are trigonometric functions and logarithms, as well as square root and division functions. Such functions are used extensively, and the approximations are specialized for efficient hardware-mapped VLSI design. The advantages of the methodology are a short critical path, fast computation and high throughput, enabled by a high degree of architectural parallelism. The feasibility of the methodology is shown by developing an approximation of the sine function for implementation in hardware.
This paper introduces a parabolic synthesis methodology for implementing approximations of unary functions, like trigonometric functions and logarithms, that are specialized for efficient hardware-mapped VLSI design. The advantages of the methodology are a short critical path, fast computation and high throughput, enabled by a high degree of architectural parallelism. The feasibility of the methodology is shown by developing an approximation of the sine function for implementation in hardware. ©2009 IEEE.
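The multiplicative synthesis can be sketched in floating point for the normalized sine f(x) = sin(pi x / 2) on [0, 1]. The sketch assumes a first sub-function s1(x) = x + c1(x - x^2), with c1 = pi/2 - 1 chosen to match the slope at x = 0, and further sub-functions of the form 1 + c(t - t^2) on 1, 2, 4, ... equal intervals, where t is the local coordinate within an interval and each coefficient is chosen so that the product matches the target at the interval midpoint. These coefficient rules are simplifications for illustration and may differ from the ones in the papers; all names are hypothetical.

```python
import math


def target(x):
    """Normalized sine, mapping [0, 1] onto [0, 1]."""
    return math.sin(math.pi * x / 2)


C1 = math.pi / 2 - 1  # assumed: makes s1'(0) equal target'(0) = pi/2


def s1(x):
    """First sub-function: a parabola through (0, 0) and (1, 1)."""
    return x + C1 * (x - x * x)


def sub_function(x, coeffs):
    """Piecewise parabola 1 + c_j * (t - t^2) on len(coeffs) equal intervals."""
    m = len(coeffs)
    j = min(int(x * m), m - 1)  # interval index
    t = x * m - j               # local coordinate in [0, 1]
    return 1.0 + coeffs[j] * (t - t * t)


def build(n_sub):
    """Coefficient lists for sub-functions s2..s_n_sub (2**(n-2) intervals for s_n)."""
    stages = []
    for n in range(2, n_sub + 1):
        m = 2 ** (n - 2)
        coeffs = []
        for j in range(m):
            mid = (j + 0.5) / m
            # Help function: target divided by the product of earlier sub-functions.
            product = s1(mid)
            for prev in stages:
                product *= sub_function(mid, prev)
            help_value = target(mid) / product
            # At the midpoint t - t^2 = 0.25, so the new factor equals help_value there.
            coeffs.append(4.0 * (help_value - 1.0))
        stages.append(coeffs)
    return stages


def approximate(x, stages):
    """Product s1(x) * s2(x) * ... * s_n(x)."""
    result = s1(x)
    for coeffs in stages:
        result *= sub_function(x, coeffs)
    return result


stages = build(4)
print(max(abs(approximate(i / 1000, stages) - target(i / 1000)) for i in range(1001)))
```

In this model each added sub-function lowers the maximum error, and every factor needs only a squaring, a multiplication by a coefficient and an addition, which is what makes the structure parallel in hardware.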
The Parabolic Synthesis methodology is an approximation methodology for implementing unary functions, such as trigonometric functions, logarithms and square root, as well as binary functions, such as division, in hardware. Unary functions are extensively used in baseband processing for wireless/wireline communication, computer graphics, digital signal processing, robotics, astrophysics, fluid physics, games and many other areas. For high-speed applications as well as in low-power systems, software solutions are not sufficient and a hardware implementation is therefore needed. The Parabolic Synthesis methodology implements functions in hardware using low-complexity operations that are simple to implement in hardware. A difference between the Parabolic Synthesis methodology and many other approximation methodologies is that it is a multiplicative, rather than an additive, methodology. To further improve the performance of Parabolic Synthesis based designs, the methodology is combined with Second-Degree Interpolation. The paper shows that the methodology provides a significant reduction in chip area, computation delay and power consumption while preserving the characteristics of the error. To evaluate this, the logarithmic function was implemented, as an example, using the Parabolic Synthesis methodology and, for comparison, the Parabolic Synthesis methodology combined with Second-Degree Interpolation. To further demonstrate the feasibility of both methodologies, they have been compared with the CORDIC methodology. The comparison is made on implementations of the fractional part of the logarithmic function with 15-bit resolution. The designs implemented using the Parabolic Synthesis methodology, with and without Second-Degree Interpolation, perform 4x and 8x better, respectively, than the CORDIC implementation in terms of throughput. In terms of energy consumption, the CORDIC implementation consumes 140% and 800% more energy, respectively.
The chip area is also smaller when the Parabolic Synthesis methodology combined with Second-Degree Interpolation is used. © 2016 Elsevier B.V. All rights reserved.
In applications such as future MIMO communication systems, complex matrix operations such as QR decomposition are computed on a massive scale. In these matrix operations, the functions roots, inverse and inverse roots are computed in large quantities. To obtain high enough performance in such applications, efficient algorithms are therefore highly important. Since these algorithms need to be realized in hardware, it must also be ensured that they meet high requirements in terms of small chip area, low computation time and low power consumption. Power consumption is particularly important since many applications are battery powered. For most unary functions, applying an approximation methodology in a straightforward way will not lead to an efficient implementation; instead, a dedicated algorithm often has to be developed. The functions roots, inverse and inverse roots are in this category. The developed approaches are founded on working in a floating-point format, and for the roots functions a change of number base is also used. These procedures not only enable simpler solutions but also increase accuracy, since the approximation algorithm is performed on a mantissa of limited range. As a summarizing example, the inverse square root is chosen. For comparison, it is implemented using two methodologies: Harmonized Parabolic Synthesis and the Newton-Raphson method. The novel methodology, Harmonized Parabolic Synthesis (HPS), is chosen since it has been demonstrated to provide very efficient approximations. The Newton-Raphson (NR) method is chosen since it is known for providing a very efficient implementation of the inverse square root; it is also commonly used in signal processing applications for computing approximations on fixed-point numbers of a limited range. Four implementations are made: HPS with 32 and 512 interpolation intervals, and NR with 1 and 2 iterations.
In summary, the implementations HPS 32, HPS 512 and NR 1 have comparable hardware performance, while NR 2 is much worse. However, HPS 32 stands out with a better distribution of the error.
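The floating-point approach with a change of number base can be sketched for the Newton-Raphson variant. This is a minimal sketch, assuming the input is written as x = m * 4^k with 1 <= m < 4, so that 1/sqrt(x) = 2^-k / sqrt(m) and only the limited-range mantissa m needs an approximation. The 16-entry seed table and the names `TABLE_SIZE`, `SEED` and `inv_sqrt` are hypothetical and not taken from the compared implementations. Each Newton-Raphson step y <- y(1.5 - 0.5 m y^2) roughly squares the relative error.

```python
import math

TABLE_SIZE = 16                  # hypothetical seed-table size
WIDTH = 3.0 / TABLE_SIZE         # the reduced mantissa range is [1, 4)
SEED = [1.0 / math.sqrt(1.0 + (k + 0.5) * WIDTH) for k in range(TABLE_SIZE)]


def reduce_base4(x):
    """Write x = m * 4**k with 1 <= m < 4 (x must be positive and finite)."""
    mant, exp = math.frexp(x)    # x = mant * 2**exp, 0.5 <= mant < 1
    m, e = 2.0 * mant, exp - 1   # now 1 <= m < 2
    if e % 2:                    # odd power of two: fold one factor 2 into m
        m, e = 2.0 * m, e - 1
    return m, e // 2


def inv_sqrt(x, iterations=2):
    """Approximate 1/sqrt(x) with a seed table and Newton-Raphson steps."""
    m, k = reduce_base4(x)
    y = SEED[min(int((m - 1.0) / WIDTH), TABLE_SIZE - 1)]
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)  # Newton-Raphson step for 1/sqrt(m)
    return math.ldexp(y, -k)             # 1/sqrt(m * 4**k) = 2**-k / sqrt(m)


print(inv_sqrt(17.0, 1), inv_sqrt(17.0, 2), 1.0 / math.sqrt(17.0))
```

With this seed table, one iteration gives a relative error below 10^-2 and two iterations below 10^-4; the base-4 reduction is what keeps the exponent handling to a simple shift.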
Computing Euclidean distances is a very important operation in digital communication, especially in trellis coded modulation, where it is used extensively. This paper shows that a substantial reduction in complexity can be achieved in hardware processing elements for computing Euclidean distances: compared with traditional designs, the complexity is reduced to as little as 39%. The paper also shows that the optimized design can be made completely ripple-free, which reduces the critical path to well under half of its original length. The reduction in complexity leads to a reduction in power consumption. The ripple-free design also lowers power consumption for two reasons: rippling in itself causes unnecessary glitches, which cost power, and the shorter critical path enables a lower supply voltage, which reduces power consumption as well. © 2011 IEEE.
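One well-known way to reduce the work in Euclidean-distance metrics, shown here as an illustration and not necessarily the optimization used in the paper, is to drop the term |r|^2 that is common to all candidate points. Since |r - s|^2 = |r|^2 - 2 Re(r s*) + |s|^2, comparing |s|^2 - 2 Re(r s*) selects the same nearest point, and |s|^2 can be precomputed per constellation point. The 8-PSK constellation and the function names below are hypothetical.

```python
import cmath
import math


def full_metric(r, s):
    """Squared Euclidean distance |r - s|^2 between two complex points."""
    d = r - s
    return d.real * d.real + d.imag * d.imag


def reduced_metric(r, s):
    """|s|^2 - 2*Re(r * conj(s)), i.e. |r - s|^2 minus the common term |r|^2."""
    return (s.real * s.real + s.imag * s.imag) - 2.0 * (r.real * s.real + r.imag * s.imag)


CONSTELLATION_8PSK = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]


def nearest_point(r, points, metric):
    """Index of the constellation point with the smallest metric."""
    return min(range(len(points)), key=lambda k: metric(r, points[k]))


r = 0.9 + 0.2j
print(nearest_point(r, CONSTELLATION_8PSK, full_metric),
      nearest_point(r, CONSTELLATION_8PSK, reduced_metric))
```

For a constant-envelope constellation such as 8-PSK, |s|^2 is the same for every point, so in that case only the correlation term -2 Re(r s*) has to be compared.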
This paper presents hardware implementations of Taylor series. The focus is on the exponential function, but the methodology is applicable to any unary function. Two architectures are investigated: an original, straightforward structure and a modified structure. The modified architecture achieves higher performance, lower area and lower power consumption than the original.
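As a software illustration of two evaluation orders one might compare, the truncated Taylor series of exp(x) can be summed term by term or evaluated in nested (Horner) form, which needs one multiply-add per term. This sketch does not reproduce the original and modified architectures of the paper; the function names and the choice of degree are hypothetical.

```python
import math


def taylor_exp_direct(x, degree):
    """Sum x**k / k! for k = 0..degree, updating each term from the previous one."""
    total, term = 0.0, 1.0
    for k in range(degree + 1):
        total += term
        term *= x / (k + 1)
    return total


def taylor_exp_horner(x, degree):
    """Same polynomial in nested form: 1 + x(1 + x/2(1 + x/3(...)))."""
    result = 1.0
    for k in range(degree, 0, -1):
        result = 1.0 + x * result / k
    return result


print(taylor_exp_direct(0.5, 10), taylor_exp_horner(0.5, 10), math.exp(0.5))
```

Both functions compute the same polynomial; the nested form avoids keeping a separate running term, which in hardware translates into a chain of identical multiply-add stages.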
This paper presents a novel methodology for improving unrolled CORDIC architectures. The methodology is based on removing adder stages, starting from the first stage. As an example, a 19-stage CORDIC is used, but the methodology is applicable to CORDICs with an arbitrary number of stages. The CORDIC is implemented, simulated and synthesized into hardware. The paper shows that performance can be increased by 23% and dynamic power reduced by 27%. © 2014 IEEE
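For reference, a conventional unrolled rotation-mode CORDIC with 19 stages can be modelled as below; in hardware each stage is a pair of shift-and-add operations plus an angle update. This is a floating-point model of the baseline structure only: the paper's technique of removing adder stages starting from the first stage is not reproduced here, and the names are hypothetical.

```python
import math

N_STAGES = 19
# Micro-rotation angles atan(2^-i); in hardware these come from a small constant table.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_STAGES)]
# Each stage scales the vector by sqrt(1 + 2^(-2i)); pre-compensate with 1/GAIN.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_STAGES))


def cordic_sin_cos(theta):
    """Unrolled rotation-mode CORDIC; valid for |theta| up to about 1.74 rad."""
    x, y, z = 1.0 / GAIN, 0.0, theta
    for i in range(N_STAGES):          # one iteration per unrolled stage
        d = 1.0 if z >= 0.0 else -1.0  # rotation direction from the residual angle
        shift = 2.0 ** -i              # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]
    return y, x                        # (sin(theta), cos(theta))


print(cordic_sin_cos(0.5), (math.sin(0.5), math.cos(0.5)))
```

The residual angle after the last stage is bounded by atan(2^-18), which sets the accuracy of this model; every stage has the same adder structure, which is why stages are natural units to remove or simplify.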
High-performance implementations of unary functions are important in many applications, e.g. in the wireless communication area. This paper shows the development and VLSI implementation of unary functions, like the logarithmic and exponential functions, using a novel approximation methodology based on parabolic synthesis, which is compared with the well-known CORDIC algorithm. Both designs are synthesized and implemented on an FPGA and as an ASIC, and the implementations are compared with respect to metrics such as performance and area. The performance of the parabolic architecture is shown to exceed that of the CORDIC architecture by a factor of 4.2 in a 65 nm Standard-VT ASIC implementation. © 2011 IEEE.
This paper proposes a novel method for performing division on floating-point numbers represented in IEEE-754 single-precision (binary32) format. The method is based on an inverter, implemented as a combination of Parabolic Synthesis and second-degree interpolation, followed by a multiplier. It is implemented both with and without pipeline stages and synthesized for a Xilinx UltraScale FPGA.
The implementations show better resource usage and latency than other implementations based on different methods. In terms of throughput, the proposed method outperforms most of the other works; however, some Altera FPGAs achieve higher clock rates due to differences in the DSP slice multiplier design.
Due to its small size, low latency and high throughput, the presented floating-point division unit is suitable for high-performance embedded systems and can be integrated into accelerators or used as a stand-alone accelerator.
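The structure of the division, an inverter followed by a multiplier, can be sketched for normal binary32 operands. This is a simplified sketch, assuming the mantissa reciprocal 1/m on [1, 2) is approximated by piecewise second-degree interpolation alone (the parabolic-synthesis factor of the proposed inverter is omitted), with 64 hypothetical intervals. NaN, infinity, zero, subnormal and overflow handling, as well as exact rounding, are left out. Python floats are used for the arithmetic and the standard `struct` module for the binary32 fields; all other names are hypothetical.

```python
import math
import struct

N_SEGMENTS = 64  # hypothetical number of interpolation intervals on [1, 2)


def _segment_coeffs(k):
    """Quadratic through 1/m at the start, middle and end of segment k."""
    m0, h = 1.0 + k / N_SEGMENTS, 1.0 / N_SEGMENTS
    y0, y1, y2 = 1.0 / m0, 1.0 / (m0 + h / 2), 1.0 / (m0 + h)
    return y0, -3.0 * y0 + 4.0 * y1 - y2, 2.0 * y0 - 4.0 * y1 + 2.0 * y2


COEFFS = [_segment_coeffs(k) for k in range(N_SEGMENTS)]


def reciprocal_mantissa(m):
    """Inverter: approximate 1/m for 1 <= m < 2 by second-degree interpolation."""
    pos = (m - 1.0) * N_SEGMENTS
    k = min(int(pos), N_SEGMENTS - 1)
    t = pos - k
    c0, c1, c2 = COEFFS[k]
    return c0 + t * (c1 + t * c2)


def fields(v):
    """Sign, biased exponent and fraction bits of v rounded to binary32."""
    bits = struct.unpack(">I", struct.pack(">f", v))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def divide_f32(a, b):
    """a / b for normal, finite binary32 operands (sketch, no special cases)."""
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    if not (0 < ea < 255 and 0 < eb < 255):
        raise ValueError("sketch only handles normal numbers")
    ma = 1.0 + fa / 2 ** 23
    mb = 1.0 + fb / 2 ** 23
    q = ma * reciprocal_mantissa(mb)   # multiplier; q lies in (0.5, 2)
    result = math.ldexp(q, ea - eb)    # the exponent bias cancels in ea - eb
    if sa ^ sb:
        result = -result
    return struct.unpack(">f", struct.pack(">f", result))[0]  # round to binary32


print(divide_f32(1.0, 3.0), divide_f32(-7.5, 2.5))
```

Within these limits the quotient stays within about one part in a million of the exact value; the number of intervals trades table size against accuracy.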