# 9.5: Evaluation of the PFA and WFTA

As for the Cooley-Tukey FFT's, the first evaluation of these algorithms will be on the number of multiplications and additions required. The number of multiplications to compute the PFA in the equation is given by Multidimensional Index Mapping. Using the notation that \(T(N)\) is the number of multiplications or additions necessary to calculate a length-N DFT, the total number for a four-factor PFA of length-N, where \(N=N_1N_2N_3N_4\) is

\[T(N)=N_1N_2N_3T(N_4)+N_2N_3N_4T(N_1)+N_3N_4N_1T(N_2)+N_4N_1N_2T(N_3)\]

The count of multiplies and adds in the Table 9.5.1 below are calculated from (105) with the counts of the factors taken from Winograd Fourier Transform Algorithm (WFTA) Table 6.2.1. The list of lengths are those possible with modules in the program of length 2, 3, 4, 5, 7, 8, 9 and 16 as is true for the PFA and the WFTA. A maximum of four relatively prime lengths can be used from this group giving 59 different lengths over the range from 2 to 5040. The radix-2 or split-radix FFT allows 12 different lengths over the same range. If modules of length 11 and 13 from are added, the maximum length becomes 720720 and the number of different lengths becomes 239. Adding modules for 17, 19 and 25 gives a maximum length of 1163962800 and a very large and dense number of possible lengths. The length of the code for the longer modules becomes excessive and should not be included unless needed.

The number of multiplications necessary for the WFTA is simply the product of those necessary for the required modules, including multiplications by unity. The total number may contain some unity multipliers but it is difficult to remove them in a practical program. Table 9.5.1 contains both the total number (MULTS) and the number with the unity multiplies removed (RMULTS).

Calculating the number of additions for the WFTA is more complicated than for the PFA because of the expansion of the data moving through the algorithm. For example the number of additions, TA, for the length-15 example in Fig. 9.3.1 is given by

\[TA(N)=N_2TA(N_1)+TM_1TA(N_2)\]

where \(N_1=3,\; N_2=5,\; TM_1=\)the number of multiplies for the length-3 module and hence the expansion factor. As mentioned earlier there is an optimum ordering to minimize additions. The ordering used to calculate in Table 9.5.1 is optimal in most cases and close to optimal in the others.

Length |
PFA |
PFA |
WFTA |
WFTA |
WFTA |
---|---|---|---|---|---|

N | Mults | Adds | Mults | RMults | Adds |

10 | 20 | 88 | 24 | 20 | 88 |

12 | 16 | 96 | 24 | 16 | 96 |

14 | 32 | 172 | 36 | 32 | 172 |

15 | 50 | 162 | 36 | 34 | 162 |

18 | 40 | 204 | 44 | 40 | 208 |

20 | 40 | 216 | 48 | 40 | 216 |

21 | 76 | 300 | 54 | 52 | 300 |

24 | 44 | 252 | 48 | 36 | 252 |

28 | 64 | 400 | 72 | 64 | 400 |

30 | 100 | 384 | 72 | 68 | 384 |

35 | 150 | 598 | 108 | 106 | 666 |

36 | 80 | 480 | 88 | 80 | 488 |

40 | 100 | 532 | 96 | 84 | 532 |

42 | 152 | 684 | 108 | 104 | 684 |

45 | 190 | 726 | 132 | 130 | 804 |

48 | 124 | 636 | 108 | 92 | 660 |

56 | 156 | 940 | 144 | 132 | 940 |

60 | 200 | 888 | 144 | 136 | 888 |

63 | 284 | 1236 | 198 | 196 | 1394 |

70 | 300 | 1336 | 216 | 212 | 1472 |

72 | 196 | 1140 | 176 | 164 | 1156 |

80 | 260 | 1284 | 216 | 200 | 1352 |

84 | 304 | 1536 | 216 | 208 | 1536 |

90 | 380 | 1632 | 264 | 260 | 1788 |

105 | 590 | 2214 | 324 | 322 | 2418 |

112 | 396 | 2188 | 324 | 308 | 2332 |

120 | 460 | 2076 | 288 | 276 | 2076 |

126 | 568 | 2724 | 396 | 392 | 3040 |

140 | 600 | 2952 | 432 | 424 | 3224 |

144 | 500 | 2676 | 396 | 380 | 2880 |

168 | 692 | 3492 | 432 | 420 | 3492 |

180 | 760 | 3624 | 528 | 520 | 3936 |

210 | 1180 | 4848 | 648 | 644 | 5256 |

240 | 1100 | 4812 | 648 | 632 | 5136 |

252 | 1136 | 5952 | 792 | 784 | 6584 |

280 | 1340 | 6604 | 864 | 852 | 7148 |

315 | 2050 | 8322 | 1188 | 1186 | 10336 |

336 | 1636 | 7908 | 972 | 956 | 8508 |

360 | 1700 | 8148 | 1056 | 1044 | 8772 |

420 | 2360 | 10536 | 1296 | 1288 | 11352 |

504 | 2524 | 13164 | 1584 | 1572 | 14428 |

560 | 3100 | 14748 | 1944 | 1928 | 17168 |

630 | 4100 | 17904 | 2376 | 2372 | 21932 |

720 | 3940 | 18276 | 2376 | 2360 | 21132 |

840 | 5140 | 23172 | 2592 | 2580 | 24804 |

1008 | 5804 | 29100 | 3564 | 3548 | 34416 |

1260 | 8200 | 38328 | 4752 | 4744 | 46384 |

1680 | 11540 | 50964 | 5832 | 5816 | 59064 |

2520 | 17660 | 82956 | 9504 | 9492 | 99068 |

5040 | 39100 | 179772 | 21384 | 21368 | 232668 |

From the Table 9.5.1 we see that compared to the PFA or any of the Cooley-Tukey FFT's, the WFTA has significantly fewer multiplications. For the shorter lengths, the WFTA and the PFA have approximately the same number of additions; however for longer lengths, the PFA has fewer and the Cooley-Tukey FFT's always have the fewest. If the total arithmetic, the number of multiplications plus the number of additions, is compared, the split-radix FFT, PFA and WFTA all have about the same count. Special versions of the PFA and WFTA have been developed for real data.

The size of the Cooley-Tukey program is the smallest, the PFA next and the WFTA largest. The PFA requires the smallest number of stored constants, the Cooley-Tukey or split-radix FFT next, and the WFTA requires the largest number. For a DFT of approximately 1000, the PFA stores 28 constants, the FFT 2048 and the WFTA 3564. Both the FFT and PFA can be calculated in-place and the WFTA cannot. The PFA can be calculated in-order without an unscrambler. The radix-2 FFT can also, but it requires additional indexing overhead. The indexing and data transfer overhead is greatest for the WFTA because the separate preweave and postweave sections each require their indexing and pass through the complete data. The shorter modules in the PFA and WFTA and the butterflies in the radix 2 and 4 FFT's are more efficient than the longer ones because intermediate calculations can be kept in cpu registers rather general memory. However, the shorter modules and radices require more passes through the data for a given approximate length. A proper comparison will require actual programs to be compiled and run on a particular machine. There are many open questions about the relationship of algorithms and hardware architecture.

### Contributor

- ContribEEBurrus