
8.6: The Quick Fourier Transform - An FFT based on Symmetries

The development of fast algorithms usually consists of using special properties of the algorithm of interest to remove redundant or unnecessary operations of a direct implementation. The discrete Fourier transform (DFT) defined by

\[C(k)=\sum_{n=0}^{N-1}x(n)W_{N}^{nk}\]

where

\[W_{N}=e^{-j2\pi /N}\]

has enormous capacity for improvement of its arithmetic efficiency. Most fast algorithms use the periodic and symmetric properties of its basis functions. The classical Cooley-Tukey FFT and prime factor FFT exploit the periodic properties of the cosine and sine functions. Their use of the periodicities to share and, therefore, reduce arithmetic operations depends on the factorability of the length of the data to be transformed. For highly composite lengths, the number of floating-point operations is of order \(N\log (N)\) and for prime lengths it is of order \(N^2\).
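As a baseline for the operation counts discussed below, the defining sum can be evaluated directly in order-\(N^2\) operations. A minimal Python sketch (the function name is ours, not from the text):

```python
import cmath

def direct_dft(x):
    # C(k) = sum_n x(n) * W_N^(n*k), with W_N = exp(-j*2*pi/N),
    # exactly as defined above: N complex terms per output, N outputs.
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (n * k) for n in range(N)) for k in range(N)]
```

For example, the DFT of a unit impulse is the all-ones sequence.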

This section will look at an approach using the symmetric properties to remove redundancies. This possibility has long been recognized but has not been developed in any systematic way in the open literature. We will develop an algorithm, called the quick Fourier transform (QFT), that will reduce the number of floating-point operations necessary to compute the DFT by a factor of two to four over direct methods or Goertzel's method for prime lengths. Indeed, it seems to be the best general algorithm available for prime-length DFTs. One can always do better by using Winograd-type algorithms, but they must be individually designed for each length. The chirp z-transform can be used for longer lengths.

Input and Output Symmetries

We use the fact that the cosine is an even function and the sine is an odd function. The kernel of the DFT, which supplies the basis functions of the expansion, is given by

\[W_{N}^{nk}=e^{-j2\pi nk/N}=\cos (2\pi nk/N)-j\sin (2\pi nk/N)\]

which has an even real part and odd imaginary part. If the data \(x(n)\) are decomposed into their real and imaginary parts and those into their even and odd parts, we have

\[x(n)=u(n)+jv(n)=\left [ u_e(n)+u_o(n) \right ]+j\left [ v_e(n)+v_o(n) \right ]\]

where, with the indices taken modulo \(N\) so that \(u(-n)=u(N-n)\), the even part of the real part of \(x(n)\) is given by

\[u_e(n)=(u(n)+u(-n))/2\]

and the odd part of the real part is

\[u_o(n)=(u(n)-u(-n))/2\]

with corresponding definitions of \(v_e(n)\) and \(v_o(n)\). Using these decompositions and a simpler notation that suppresses the arguments of the sine, cosine, and data sequences, the DFT above becomes

\[C(k)=\sum_{n=0}^{N-1}(u+jv)(\cos -j\sin )\]

The sum over an integral number of periods of an odd function is zero, and the sum of an even function over half of the period is one half the sum over the whole period. This causes the equation to become

\[C(k)=\sum_{n=0}^{N/2-1}\left [ u_e\cos +v_o\sin \right ]+j\left [ v_e\cos -u_o\sin \right ]\]

for \(k=0,1,2,...,N-1\)
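The even/odd splitting used above is a cheap preprocessing step. A short sketch in Python, where negative list indexing conveniently supplies the periodic \(u(-n)=u(N-n)\) (function name ours):

```python
def even_odd_parts(u):
    # u_e(n) = (u(n) + u(-n)) / 2,  u_o(n) = (u(n) - u(-n)) / 2,
    # with u(-n) = u(N-n).  Python's u[-n] indexes from the end of the
    # list, which is exactly this periodic extension for n = 0, ..., N-1.
    N = len(u)
    ue = [(u[n] + u[-n]) / 2 for n in range(N)]
    uo = [(u[n] - u[-n]) / 2 for n in range(N)]
    return ue, uo
```

The two parts recombine to the original sequence, and \(u_e\) is symmetric while \(u_o\) is antisymmetric about \(n=0\).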

The evaluation of the DFT using this equation requires half as many real multiplications and half as many real additions as direct evaluation of the defining sum. We have exploited the symmetries of the sine and cosine as functions of the time index \(n\). This is independent of whether the length is composite or not. Another view of this formulation is that we have used the associativity of multiplication and addition. In other words, rather than multiply two data points by the same value of a sine or cosine and then add the results, one should add the data points first and then multiply the sum by the sine or cosine, which requires one rather than two multiplications.
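A hedged sketch of this half-length evaluation for even \(N\), with the endpoint terms \(n=0\) and \(n=N/2\) handled explicitly (these are the even/odd-length details the text defers; the endpoint bookkeeping and function name are ours):

```python
import math

def dft_input_symmetry(x):
    # Each output C(k) costs a length-N/2 inner product over the even and
    # odd parts of the data.  The n = 0 and n = N/2 terms of the even
    # summand appear once, the rest twice; N is assumed even.
    N = len(x)
    M = N // 2
    u = [z.real for z in x]
    v = [z.imag for z in x]
    ue = [(u[n] + u[-n]) / 2 for n in range(N)]  # even part of Re x
    uo = [(u[n] - u[-n]) / 2 for n in range(N)]  # odd part of Re x
    ve = [(v[n] + v[-n]) / 2 for n in range(N)]  # even part of Im x
    vo = [(v[n] - v[-n]) / 2 for n in range(N)]  # odd part of Im x
    C = []
    for k in range(N):
        re = ue[0] + (-1) ** k * ue[M]   # n = 0 and n = N/2 endpoints
        im = ve[0] + (-1) ** k * ve[M]   # (the sine endpoints vanish)
        for n in range(1, M):
            c = math.cos(2 * math.pi * n * k / N)
            s = math.sin(2 * math.pi * n * k / N)
            re += 2 * (ue[n] * c + vo[n] * s)
            im += 2 * (ve[n] * c - uo[n] * s)
        C.append(complex(re, im))
    return C
```

The result agrees with the direct DFT while using only \(N/2\) distinct trigonometric products per output.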

Next we take advantage of the symmetries of the sine and cosine as functions of the frequency index \(k\). Using these symmetries on the equation above gives

\[C(k)=\sum_{n=0}^{N/2-1}\left [ u_e\cos +v_o\sin \right ]+j\left [ v_e\cos -u_o\sin \right ]\]

\[C(N-k)=\sum_{n=0}^{N/2-1}\left [ u_e\cos -v_o\sin \right ]+j\left [ v_e\cos +u_o\sin \right ]\]

for \(k=0,1,2,...,N/2-1\). This again reduces the number of operations by a factor of two, this time because it calculates two output values at a time. The first reduction by a factor of two is always available. The second is possible only if both DFT values are needed. It is not available if you are calculating only one DFT value. The above development has not dealt with the details that arise with the difference between an even and an odd length. That is straightforward.
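The pairing can be sketched as follows: the four partial sums are formed once and recombined with opposite signs to give both outputs. Full-period sums are kept here for clarity, since halving the range as above is an independent saving; the function name is ours:

```python
import math

def dft_output_pair(x, k):
    # Form the four inner products  sum(u_e*cos), sum(v_o*sin),
    # sum(v_e*cos), sum(u_o*sin)  once; C(k) and C(N-k) then differ
    # only in the signs with which the sine sums enter.
    N = len(x)
    u = [z.real for z in x]
    v = [z.imag for z in x]
    rc = rs = ic = im_s = 0.0
    for n in range(N):
        c = math.cos(2 * math.pi * n * k / N)
        s = math.sin(2 * math.pi * n * k / N)
        rc += (u[n] + u[-n]) / 2 * c    # u_e * cos
        rs += (v[n] - v[-n]) / 2 * s    # v_o * sin
        ic += (v[n] + v[-n]) / 2 * c    # v_e * cos
        im_s += (u[n] - u[-n]) / 2 * s  # u_o * sin
    return complex(rc + rs, ic - im_s), complex(rc - rs, ic + im_s)
```

Two DFT values are obtained for one set of multiplications, which is exactly the factor-of-two saving described above.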

Further Reductions if the Length is Even

If the length of the sequence to be transformed is even, there are further symmetries that can be exploited. There will be four data values that are all multiplied by plus or minus the same sine or cosine value. This means a more complicated pre-addition process, which is a generalization of the simple calculation of the even and odd parts in the equations above, will reduce the size of the order-\(N^2\) part of the algorithm by still another factor of two or four. If the length is divisible by 4, the process can be repeated. Indeed, if the length is a power of 2, one can show this process is equivalent to calculating the DFT in terms of discrete cosine and sine transforms with a resulting arithmetic complexity of order \(N\log (N)\) and with a structure that is well suited to real data calculations and pruning.
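The shared-value structure rests on an identity the text does not spell out: for even \(N\), \(\cos (2\pi (N/2-n)k/N)=(-1)^k\cos (2\pi nk/N)\), so after the even/odd split the samples at \(n\) and \(N/2-n\) multiply the same cosine up to sign. A quick numeric check (the particular values are arbitrary):

```python
import math

# cos(2*pi*(N/2 - n)*k/N) = cos(pi*k - 2*pi*n*k/N)
#                         = (-1)**k * cos(2*pi*n*k/N),  N even.
N, n, k = 16, 3, 5
lhs = math.cos(2 * math.pi * (N // 2 - n) * k / N)
rhs = (-1) ** k * math.cos(2 * math.pi * n * k / N)
```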

If the flow-graph of the Cooley-Tukey FFT is compared to the flow-graph of the QFT, one notices both similarities and differences. Both progress in stages as the length is continually divided by two. The Cooley-Tukey algorithm uses the periodic properties of the sine and cosine to give the familiar horizontal tree of butterflies. The parallel diagonal lines in this graph represent the parallel stepping through the data in synchronism with the periodic basis functions. The QFT has diagonal lines that connect the first data point with the last, then the second with the next to last, and so on to give a “star"-like picture. This is interesting in that one can look at the flow graph of an algorithm developed by some completely different strategy and often find sections with the parallel structure and other parts with the star structure. These must be using some underlying periodic and symmetric properties of the basis functions.

Arithmetic Complexity and Timings

A careful analysis of the QFT shows that \(2N\) real additions are necessary to compute the even and odd parts of the input data. This is followed by the length-\(N/2\) inner product that requires \(4(N/2)^2=N^2\) real multiplications and an equal number of additions. This is followed by the additions necessary for the simultaneous calculation of the first half and last half of \(C(k)\), which requires \(4(N/2)=2N\) real additions. This means the total QFT algorithm requires \(N^2\) real multiplications and \(N^2+4N\) real additions. These numbers, along with those for the Goertzel algorithm and the direct calculation of the DFT, are included in the following table. Of the various order-\(N^2\) DFT algorithms, the QFT seems to be the most efficient general method for an arbitrary length \(N\).

| Algorithm | Real Mults. | Real Adds | Trig Eval. |
|---|---|---|---|
| Direct DFT | \(4N^2\) | \(4N^2\) | \(2N^2\) |
| Mod. 2nd Order Goertzel | \(N^2+N\) | \(2N^2+N\) | \(N\) |
| QFT | \(N^2\) | \(N^2+4N\) | \(2N\) |
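The QFT totals can be tallied directly from the three stages described above (helper name ours; exact for even \(N\)):

```python
def qft_real_op_counts(N):
    # 2N pre-additions for the even/odd parts, a length-N/2 inner product
    # costing 4*(N/2)**2 = N**2 real multiplications and as many
    # additions, then 4*(N/2) = 2N post-additions for the output pairs.
    mults = 4 * (N // 2) ** 2
    adds = 2 * N + 4 * (N // 2) ** 2 + 4 * (N // 2)
    return mults, adds  # (N**2, N**2 + 4N) when N is even
```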

Timings of the algorithms on a PC in milliseconds are given in the following table.

| Algorithm | \(N=125\) | \(N=256\) |
|---|---|---|
| Direct DFT | 4.90 | 19.83 |
| Mod. 2O. Goertzel | 1.32 | 5.55 |
| QFT | 1.09 | 4.50 |
| Chirp + FFT | 1.70 | 3.52 |

These timings track the floating point operation counts fairly well.

Conclusions

The QFT is a straightforward DFT algorithm that uses all of the possible symmetries of the DFT basis functions with no requirements on the length being composite. These ideas have been proposed before, but have not been published or clearly developed. It seems that the basic QFT is practical and useful as a general algorithm for lengths up to a hundred or so. Above that, the chirp z-transform or other filter-based methods will be superior. For special cases and shorter lengths, methods based on Winograd's theories will always be superior. Nevertheless, the QFT has a definite place in the array of DFT algorithms and is not well known. A Fortran program is included in the appendix.

It is possible, but unlikely, that further arithmetic reduction could be achieved using the fact that \(W_N\) has unity magnitude, as was done in the second-order Goertzel algorithm. It is also possible that some way of combining the Goertzel and QFT algorithms would have some advantages. A development of a complete QFT decomposition of a DFT of length \(2^M\) shows interesting structure and arithmetic complexity comparable to average Cooley-Tukey FFTs. It does seem better suited to real data calculations with pruning.

Contributor

  • C. Sidney Burrus