9.5: Evaluation of the PFA and WFTA

As with the Cooley-Tukey FFTs, these algorithms are first evaluated by the number of multiplications and additions they require. The number of multiplications needed to compute the PFA follows from the operation count given in Multidimensional Index Mapping. Using the notation that $$T(N)$$ is the number of multiplications or additions necessary to calculate a length-$$N$$ DFT, the total number for a four-factor PFA of length $$N$$, where $$N=N_1N_2N_3N_4$$, is

$T(N)=N_1N_2N_3T(N_4)+N_2N_3N_4T(N_1)+N_3N_4N_1T(N_2)+N_4N_1N_2T(N_3)$
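The formula above can be evaluated mechanically once the cost of each short module is known. The following is a minimal sketch: the name `pfa_counts` and the per-module operation counts in `MODULE_OPS` are assumptions made for illustration. The counts are not stated in this section; they were chosen so that they reproduce rows of Table 9.5.1 and should be checked against the module table cited in the text.

```python
from math import prod

# ASSUMED (multiplications, additions) per short-length module, counted as
# real operations on complex data. Not given in this section; chosen to be
# consistent with rows of Table 9.5.1.
MODULE_OPS = {
    2: (0, 4), 3: (4, 12), 4: (0, 16), 5: (10, 34),
    7: (16, 72), 8: (4, 52), 9: (20, 84), 16: (20, 148),
}

def pfa_counts(factors):
    """Return (mults, adds) for a PFA of length prod(factors).

    Each term is (N / N_i) * T(N_i): the length-N_i module is applied
    N / N_i times. The factors must be pairwise relatively prime.
    """
    n = prod(factors)
    mults = sum((n // f) * MODULE_OPS[f][0] for f in factors)
    adds = sum((n // f) * MODULE_OPS[f][1] for f in factors)
    return mults, adds

print(pfa_counts([3, 5]))         # length 15
print(pfa_counts([16, 9, 5, 7]))  # length 5040, the four-factor case
```

With these assumed module counts, the lengths 15 and 5040 give the PFA entries listed for those lengths in Table 9.5.1.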

The counts of multiplications and additions in Table 9.5.1 below are calculated from (105), with the counts for the factors taken from Winograd Fourier Transform Algorithm (WFTA) Table 6.2.1. The lengths listed are those possible with program modules of length 2, 3, 4, 5, 7, 8, 9 and 16, for both the PFA and the WFTA. At most four relatively prime lengths can be chosen from this group, giving 59 different lengths over the range from 2 to 5040. The radix-2 or split-radix FFT allows 12 different lengths over the same range. If modules of length 11 and 13 are added, the maximum length becomes 720720 and the number of different lengths becomes 239. Adding modules for 17, 19 and 25 gives a maximum length of 1163962800 and a very large and dense set of possible lengths. The code for the longer modules becomes excessively long, so these modules should not be included unless needed.
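The length counts quoted above can be checked by enumerating every set of pairwise relatively prime modules and taking the product of each set. This is a brute-force sketch; the function name `pfa_lengths` is hypothetical.

```python
from itertools import combinations
from math import gcd, prod

def pfa_lengths(modules):
    """Return the sorted lengths obtainable as products of pairwise
    relatively prime subsets of `modules` (single modules included)."""
    lengths = set()
    for r in range(1, len(modules) + 1):
        for combo in combinations(modules, r):
            if all(gcd(a, b) == 1 for a, b in combinations(combo, 2)):
                lengths.add(prod(combo))
    return sorted(lengths)

base = [2, 3, 4, 5, 7, 8, 9, 16]
base_lengths = pfa_lengths(base)
print(len(base_lengths), base_lengths[-1])

extended = pfa_lengths(base + [11, 13])
print(len(extended), extended[-1])

# Powers of two in the same range, for the radix-2 comparison.
radix2 = [2 ** k for k in range(1, 13) if 2 ** k <= 5040]
print(len(radix2))
```

Because 2, 4, 8 and 16 share a common factor, as do 3 and 9, at most one module from each of those groups can appear in a length, which is why no more than four factors are possible with the base set.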

The number of multiplications necessary for the WFTA is simply the product of those necessary for the required modules, including multiplications by unity. The total number may contain some unity multipliers but it is difficult to remove them in a practical program. Table 9.5.1 contains both the total number (MULTS) and the number with the unity multiplies removed (RMULTS).
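The product rule for the total count (MULTS) can be sketched as follows. The per-module multiplication counts in `WFTA_MODULE_MULTS`, which include multiplications by unity, are assumptions not stated in this section. So is the factor of two, which reflects the assumption that each module constant is purely real or purely imaginary and therefore costs two real multiplications on complex data. The values were chosen to be consistent with Table 9.5.1.

```python
from math import prod

# ASSUMED number of multiplications per module, including multiplications
# by unity. Not given in this section; chosen to match Table 9.5.1.
WFTA_MODULE_MULTS = {2: 2, 3: 3, 4: 4, 5: 6, 7: 9, 8: 8, 9: 11, 16: 18}

def wfta_mults(factors):
    """Real multiplications (MULTS) for a WFTA on complex data.

    The nested multiplications of all modules combine into one stage, so
    the count is the product of the module counts. The factor 2 assumes
    real or purely imaginary constants applied to complex data.
    """
    return 2 * prod(WFTA_MODULE_MULTS[f] for f in factors)

print(wfta_mults([3, 5]))         # length 15
print(wfta_mults([16, 9, 5, 7]))  # length 5040
```

The RMULTS column is not reproduced here, since removing unity multipliers depends on the details of each module.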

Calculating the number of additions for the WFTA is more complicated than for the PFA because the data expands as it moves through the algorithm. For example, the number of additions, TA, for the length-15 example in Fig. 9.3.1 is given by

$TA(N)=N_2TA(N_1)+TM_1TA(N_2)$

where $$N_1=3,\; N_2=5,$$ and $$TM_1$$ is the number of multiplies for the length-3 module and hence the expansion factor. As mentioned earlier, there is an optimum ordering of the factors that minimizes additions. The ordering used to calculate the counts in Table 9.5.1 is optimal in most cases and close to optimal in the others.
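The effect of the ordering can be seen by evaluating the two-factor formula for $$TA(N)$$ with the factors in both orders. This is a sketch under stated assumptions: the module addition counts in `MODULE_ADDS` and the multiplication counts in `MODULE_MULTS` are not given in this section and were chosen to be consistent with Table 9.5.1, and the function names are hypothetical.

```python
# ASSUMED per-module counts, not stated in this section.
MODULE_ADDS = {3: 12, 5: 34, 7: 72}   # real additions on complex data
MODULE_MULTS = {3: 3, 5: 6, 7: 9}     # multiplies, the expansion factor TM

def wfta_adds(n1, n2):
    """TA(N) = N2 * TA(N1) + TM1 * TA(N2) with the N1 module innermost."""
    return n2 * MODULE_ADDS[n1] + MODULE_MULTS[n1] * MODULE_ADDS[n2]

def best_ordering(a, b):
    """Return (adds, (n1, n2)) for the cheaper of the two orderings."""
    return min((wfta_adds(a, b), (a, b)), (wfta_adds(b, a), (b, a)))

print(wfta_adds(3, 5), wfta_adds(5, 3))   # length 15, both orderings
print(best_ordering(5, 7))                # length 35
```

For length 15 the ordering $$N_1=3,\; N_2=5$$ gives the smaller count, because the length-3 module does not expand the data under these assumed counts.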

Table 9.5.1: Number of Real Multiplications and Additions for Complex PFA and WFTA FFTs
Length  PFA Mults  PFA Adds  WFTA MULTS  WFTA RMULTS  WFTA Adds
10 20 88 24 20 88
12 16 96 24 16 96
14 32 172 36 32 172
15 50 162 36 34 162
18 40 204 44 40 208
20 40 216 48 40 216
21 76 300 54 52 300
24 44 252 48 36 252
28 64 400 72 64 400
30 100 384 72 68 384
35 150 598 108 106 666
36 80 480 88 80 488
40 100 532 96 84 532
42 152 684 108 104 684
45 190 726 132 130 804
48 124 636 108 92 660
56 156 940 144 132 940
60 200 888 144 136 888
63 284 1236 198 196 1394
70 300 1336 216 212 1472
72 196 1140 176 164 1156
80 260 1284 216 200 1352
84 304 1536 216 208 1536
90 380 1632 264 260 1788
105 590 2214 324 322 2418
112 396 2188 324 308 2332
120 460 2076 288 276 2076
126 568 2724 396 392 3040
140 600 2952 432 424 3224
144 500 2676 396 380 2880
168 692 3492 432 420 3492
180 760 3624 528 520 3936
210 1180 4848 648 644 5256
240 1100 4812 648 632 5136
252 1136 5952 792 784 6584
280 1340 6604 864 852 7148
315 2050 8322 1188 1186 10336
336 1636 7908 972 956 8508
360 1700 8148 1056 1044 8772
420 2360 10536 1296 1288 11352
504 2524 13164 1584 1572 14428
560 3100 14748 1944 1928 17168
630 4100 17904 2376 2372 21932
720 3940 18276 2376 2360 21132
840 5140 23172 2592 2580 24804
1008 5804 29100 3564 3548 34416
1260 8200 38328 4752 4744 46384
1680 11540 50964 5832 5816 59064
2520 17660 82956 9504 9492 99068
5040 39100 179772 21384 21368 232668

From Table 9.5.1 we see that, compared to the PFA or any of the Cooley-Tukey FFTs, the WFTA has significantly fewer multiplications. For the shorter lengths, the WFTA and the PFA have approximately the same number of additions; for longer lengths, however, the PFA has fewer, and the Cooley-Tukey FFTs always have the fewest. If the total arithmetic (the number of multiplications plus the number of additions) is compared, the split-radix FFT, PFA and WFTA all have about the same count. Special versions of the PFA and WFTA have been developed for real data.
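As a small illustration of the total-arithmetic comparison, the sketch below adds multiplications and additions for a few rows copied from Table 9.5.1. It assumes the table columns read, in order: length, PFA multiplications, PFA additions, WFTA MULTS, WFTA RMULTS, WFTA additions. Only the PFA and WFTA are compared, since Cooley-Tukey counts are not listed in this section; the name `totals` is hypothetical.

```python
# (length, PFA mults, PFA adds, WFTA MULTS, WFTA adds), copied from
# Table 9.5.1 under the column reading stated above.
ROWS = [
    (15, 50, 162, 36, 162),
    (120, 460, 2076, 288, 2076),
    (1008, 5804, 29100, 3564, 34416),
    (5040, 39100, 179772, 21384, 232668),
]

def totals(row):
    """Return (PFA total, WFTA total) real operations for one table row."""
    _, pfa_m, pfa_a, wfta_m, wfta_a = row
    return pfa_m + pfa_a, wfta_m + wfta_a

for row in ROWS:
    pfa_total, wfta_total = totals(row)
    print(f"N={row[0]:5d}  PFA={pfa_total:7d}  WFTA={wfta_total:7d}  "
          f"WFTA/PFA={wfta_total / pfa_total:.3f}")
```

For these rows the two totals stay within roughly twenty percent of each other, with the WFTA slightly ahead at short lengths and the PFA ahead at long ones.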

The Cooley-Tukey program is the smallest, the PFA program is next, and the WFTA program is the largest. The PFA requires the smallest number of stored constants, the Cooley-Tukey or split-radix FFT the next smallest, and the WFTA the largest. For a DFT of length approximately 1000, the PFA stores 28 constants, the FFT 2048 and the WFTA 3564. Both the FFT and the PFA can be calculated in-place; the WFTA cannot. The PFA can be calculated in-order without an unscrambler. The radix-2 FFT can also, but it requires additional indexing overhead. The indexing and data transfer overhead is greatest for the WFTA because the separate preweave and postweave sections each require their own indexing and their own pass through the complete data. The shorter modules in the PFA and WFTA and the butterflies in the radix-2 and radix-4 FFTs are more efficient than the longer ones because intermediate calculations can be kept in CPU registers rather than in general memory. However, the shorter modules and radices require more passes through the data for a given approximate length. A proper comparison requires actual programs to be compiled and run on a particular machine. There are many open questions about the relationship between algorithms and hardware architecture.

Contributor

• ContribEEBurrus