Engineering LibreTexts

9.5: Evaluation of the PFA and WFTA


    As with the Cooley-Tukey FFTs, the first evaluation of these algorithms is based on the number of multiplications and additions required. The number of multiplications needed to compute the PFA is given in Multidimensional Index Mapping. Using the notation that \(T(N)\) is the number of multiplications or additions necessary to calculate a length-\(N\) DFT, the total number for a four-factor PFA of length \(N\), where \(N=N_1N_2N_3N_4\), is

    \[T(N)=N_1N_2N_3T(N_4)+N_2N_3N_4T(N_1)+N_3N_4N_1T(N_2)+N_4N_1N_2T(N_3)\]
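    The formula says that each length-\(N_i\) module is applied \(N/N_i\) times, so it extends naturally to any number of relatively prime factors. The following is a minimal sketch of that count. The per-module operation counts in `MODULE_COUNTS` are an assumption: they are not listed in this section, and were chosen only because they reproduce rows of Table 9.5.1. The function name `pfa_counts` is hypothetical.

```python
from math import gcd, prod

# ASSUMED per-module counts of real (multiplications, additions) for complex data.
# Not given in this section; chosen to be consistent with rows of Table 9.5.1.
MODULE_COUNTS = {
    2: (0, 4),
    3: (4, 12),
    4: (0, 16),
    5: (10, 34),
    7: (16, 72),
    8: (4, 52),
    9: (20, 84),
    16: (20, 148),
}


def pfa_counts(factors):
    """Return (multiplications, additions) for a PFA of length prod(factors)."""
    # The PFA requires the factors to be pairwise relatively prime.
    for i, a in enumerate(factors):
        for b in factors[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError(f"factors {a} and {b} are not relatively prime")
    n = prod(factors)
    # Each length-f module is executed n // f times.
    mults = sum((n // f) * MODULE_COUNTS[f][0] for f in factors)
    adds = sum((n // f) * MODULE_COUNTS[f][1] for f in factors)
    return mults, adds


if __name__ == "__main__":
    for factors in ([3, 5], [3, 16], [5, 7, 9, 16]):
        print(prod(factors), pfa_counts(factors))
```

    Under these assumed module counts, the sketch gives 50 multiplications and 162 additions for \(N=15\), and 39100 and 179772 for \(N=5040\), in agreement with the PFA columns of Table 9.5.1.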

    The counts of multiplies and adds in Table 9.5.1 below are calculated from the equation above, with the counts for the individual factors taken from Winograd Fourier Transform Algorithm (WFTA) Table 6.2.1. The lengths listed are those possible with modules of length 2, 3, 4, 5, 7, 8, 9 and 16 in the program, which is true for both the PFA and the WFTA. A maximum of four relatively prime lengths can be used from this group, giving 59 different lengths over the range from 2 to 5040. The radix-2 or split-radix FFT allows only 12 different lengths over the same range. If modules of length 11 and 13 are added, the maximum length becomes 720720 and the number of different lengths becomes 239. Adding modules for 17, 19 and 25 gives a maximum length of 1163962800 and a very large and dense set of possible lengths. The code for the longer modules becomes excessively long, however, and these modules should not be included unless needed.
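    The length counts quoted above can be checked by enumerating every product of pairwise relatively prime modules. The following is a small sketch of that enumeration; the function name `pfa_lengths` is hypothetical and is not part of any FFT program referred to in the text.

```python
from itertools import combinations
from math import gcd, prod


def pfa_lengths(modules):
    """Return the set of lengths that are products of pairwise coprime modules."""
    lengths = set()
    for k in range(1, len(modules) + 1):
        for combo in combinations(modules, k):
            # Keep only combinations in which every pair is relatively prime.
            if all(gcd(a, b) == 1 for a, b in combinations(combo, 2)):
                lengths.add(prod(combo))
    return lengths


if __name__ == "__main__":
    base = [2, 3, 4, 5, 7, 8, 9, 16]
    extended = base + [11, 13]

    print(len(pfa_lengths(base)), max(pfa_lengths(base)))          # 59 5040
    print(len(pfa_lengths(extended)), max(pfa_lengths(extended)))  # 239 720720

    # Power-of-two lengths available to a radix-2 FFT in the range 2..5040.
    radix2 = [2**k for k in range(1, 20) if 2**k <= 5040]
    print(len(radix2))  # 12
```

    Brute-force enumeration is adequate here because the module list is short; with 13 modules there are only 8192 subsets to test.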

    The number of multiplications necessary for the WFTA is simply the product of those necessary for the required modules, including multiplications by unity. The total number may contain some unity multipliers but it is difficult to remove them in a practical program. Table 9.5.1 contains both the total number (MULTS) and the number with the unity multiplies removed (RMULTS).
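    The product rule for the WFTA can be sketched in a few lines. The per-module counts in `WFTA_MODULES` are assumptions chosen to reproduce the MULTS and RMULTS columns of Table 9.5.1; they are not listed in this section. The factor of 2 assumes that each multiplier is purely real or purely imaginary and is applied to complex data, so that it costs two real multiplications. The name `wfta_mults` is hypothetical.

```python
from math import prod

# ASSUMED per-module counts: (total multiplications including unity,
#                             multiplications by unity).
# Not given in this section; chosen to be consistent with Table 9.5.1.
WFTA_MODULES = {
    2: (2, 2),
    3: (3, 1),
    4: (4, 4),
    5: (6, 1),
    7: (9, 1),
    8: (8, 6),
    9: (11, 1),
    16: (18, 8),
}


def wfta_mults(factors):
    """Return (MULTS, RMULTS) in real multiplications for complex data."""
    # Nesting multiplies the module counts together.
    total = prod(WFTA_MODULES[f][0] for f in factors)
    # A nested multiplier is unity only if it is unity in every module.
    unity = prod(WFTA_MODULES[f][1] for f in factors)
    # ASSUMPTION: two real multiplications per multiplier on complex data.
    return 2 * total, 2 * (total - unity)


if __name__ == "__main__":
    for factors in ([3, 5], [3, 5, 8], [5, 7, 9, 16]):
        print(prod(factors), wfta_mults(factors))
```

    With these assumed counts, the sketch gives 36 and 34 for \(N=15\), and 21384 and 21368 for \(N=5040\), matching the WFTA Mults and RMults columns of Table 9.5.1.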

    Calculating the number of additions for the WFTA is more complicated than for the PFA because the data expands as it moves through the algorithm. For example, the number of additions, \(TA\), for the length-15 example in Fig. 9.3.1 is given by

    \[TA(N)=N_2TA(N_1)+TM_1TA(N_2)\]

    where \(N_1=3\), \(N_2=5\), and \(TM_1\) is the number of multiplies for the length-3 module and hence the expansion factor. As mentioned earlier, there is an optimum ordering of the factors that minimizes additions. The ordering used to calculate the counts in Table 9.5.1 is optimal in most cases and close to optimal in the others.
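    The effect of the ordering can be illustrated by evaluating the two-factor formula for both orderings of a pair of modules. This is a sketch under assumptions: the module addition counts in `ADDS` and the total multiplication counts in `MULTS` (which act as the expansion factors) are not given in this section and were chosen to be consistent with Table 9.5.1. The name `wfta_adds` is hypothetical.

```python
# ASSUMED module counts, chosen to be consistent with Table 9.5.1.
ADDS = {3: 12, 5: 34, 7: 72}   # real additions per module for complex data
MULTS = {3: 3, 5: 6, 7: 9}     # total multiplications per module (expansion factor)


def wfta_adds(n1, n2):
    """Two-factor WFTA addition count TA(N) = N2*TA(N1) + TM1*TA(N2)."""
    return n2 * ADDS[n1] + MULTS[n1] * ADDS[n2]


if __name__ == "__main__":
    for a, b in ((3, 5), (5, 7), (3, 7)):
        # Evaluate both orderings; the smaller count is the better ordering.
        print(a * b, wfta_adds(a, b), wfta_adds(b, a))
```

    For \(N=15\), the ordering \(N_1=3,\; N_2=5\) gives 162 additions, while the reverse ordering gives 174. For \(N=35\), taking \(N_1=7\) gives 666 and taking \(N_1=5\) gives 670. The smaller value in each case is the one that appears in Table 9.5.1.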

    Table 9.5.1: Number of Real Multiplications and Additions for Complex PFA and WFTA FFTs
    Length N | PFA Mults | PFA Adds | WFTA Mults | WFTA RMults | WFTA Adds
    10 20 88 24 20 88
    12 16 96 24 16 96
    14 32 172 36 32 172
    15 50 162 36 34 162
    18 40 204 44 40 208
    20 40 216 48 40 216
    21 76 300 54 52 300
    24 44 252 48 36 252
    28 64 400 72 64 400
    30 100 384 72 68 384
    35 150 598 108 106 666
    36 80 480 88 80 488
    40 100 532 96 84 532
    42 152 684 108 104 684
    45 190 726 132 130 804
    48 124 636 108 92 660
    56 156 940 144 132 940
    60 200 888 144 136 888
    63 284 1236 198 196 1394
    70 300 1336 216 212 1472
    72 196 1140 176 164 1156
    80 260 1284 216 200 1352
    84 304 1536 216 208 1536
    90 380 1632 264 260 1788
    105 590 2214 324 322 2418
    112 396 2188 324 308 2332
    120 460 2076 288 276 2076
    126 568 2724 396 392 3040
    140 600 2952 432 424 3224
    144 500 2676 396 380 2880
    168 692 3492 432 420 3492
    180 760 3624 528 520 3936
    210 1180 4848 648 644 5256
    240 1100 4812 648 632 5136
    252 1136 5952 792 784 6584
    280 1340 6604 864 852 7148
    315 2050 8322 1188 1186 10336
    336 1636 7908 972 956 8508
    360 1700 8148 1056 1044 8772
    420 2360 10536 1296 1288 11352
    504 2524 13164 1584 1572 14428
    560 3100 14748 1944 1928 17168
    630 4100 17904 2376 2372 21932
    720 3940 18276 2376 2360 21132
    840 5140 23172 2592 2580 24804
    1008 5804 29100 3564 3548 34416
    1260 8200 38328 4752 4744 46384
    1680 11540 50964 5832 5816 59064
    2520 17660 82956 9504 9492 99068
    5040 39100 179772 21384 21368 232668

    From Table 9.5.1 we see that the WFTA has significantly fewer multiplications than the PFA or any of the Cooley-Tukey FFTs. For the shorter lengths, the WFTA and the PFA have approximately the same number of additions; for longer lengths, however, the PFA has fewer, and the Cooley-Tukey FFTs always have the fewest. If the total arithmetic (the number of multiplications plus the number of additions) is compared, the split-radix FFT, PFA and WFTA all have about the same count. Special versions of the PFA and WFTA have been developed for real data.
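    The total-arithmetic comparison between the PFA and the WFTA can be read directly off Table 9.5.1. The short sketch below sums multiplications and additions for a few rows copied from the table; the variable and function names are hypothetical, and the WFTA total uses the Mults column (unity multiplies included).

```python
# Rows copied from Table 9.5.1:
# N: (PFA mults, PFA adds, WFTA mults, WFTA adds)
ROWS = {
    60: (200, 888, 144, 888),
    840: (5140, 23172, 2592, 24804),
    1008: (5804, 29100, 3564, 34416),
    5040: (39100, 179772, 21384, 232668),
}


def total_arithmetic(n):
    """Return (PFA total, WFTA total) real operations for length n."""
    pfa_m, pfa_a, wfta_m, wfta_a = ROWS[n]
    return pfa_m + pfa_a, wfta_m + wfta_a


if __name__ == "__main__":
    for n in ROWS:
        pfa, wfta = total_arithmetic(n)
        print(f"N={n}: PFA {pfa}, WFTA {wfta}, ratio {wfta / pfa:.2f}")
```

    For these rows the two totals stay within roughly 16 percent of each other, with the WFTA slightly ahead at \(N=60\) and \(N=840\) and the PFA ahead at the longest lengths.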

    The Cooley-Tukey program is the smallest, the PFA is next, and the WFTA is the largest. The PFA requires the smallest number of stored constants, the Cooley-Tukey or split-radix FFT the next smallest, and the WFTA the largest. For a DFT of length approximately 1000, the PFA stores 28 constants, the FFT 2048 and the WFTA 3564. Both the FFT and the PFA can be calculated in-place, but the WFTA cannot. The PFA can be calculated in-order without an unscrambler. The radix-2 FFT can also, but it requires additional indexing overhead. The indexing and data transfer overhead is greatest for the WFTA because the separate preweave and postweave sections each require their own indexing and a pass through the complete data. The shorter modules in the PFA and WFTA and the butterflies in the radix-2 and radix-4 FFTs are more efficient than the longer ones because intermediate calculations can be kept in CPU registers rather than in general memory. However, the shorter modules and radices require more passes through the data for a given approximate length. A proper comparison requires actual programs to be compiled and run on a particular machine. There are many open questions about the relationship between algorithms and hardware architecture.

    Contributor

    • ContribEEBurrus