The Winograd Fourier transform algorithm (WFTA) uses a very powerful property of the Type-1 index map and the DFT to give a further reduction of the number of multiplications in the PFA. Using an operator notation where \(F_1\) represents taking row DFT's and \(F_2\) represents column DFT's, the two-factor PFA of the equation is represented by

\[X=F_2F_1x\]

It has been shown that if each operator represents identical operations on each row or column, they commute. Since \(F_1\) and \(F_2\) represent length \(N_1\) and \(N_2\) DFT's, they commute and the equation can also be written

N1N1" role="presentation" style="position:relative;" tabindex="0">F1F1" role="presentation" style="position:relative;" tabindex="0">\[X=F_1F_2x\]

If each short DFT in \(F\)FF" role="presentation" style="position:relative;" tabindex="0"> is expressed by three operators as in Winograd's Short DFT Algorithms \(F\)FF" role="presentation" style="position:relative;" tabindex="0"> can be factored as

FF" role="presentation" style="position:relative;" tabindex="0">\[F=A^TDA\]

where \(A\)FF" role="presentation" style="position:relative;" tabindex="0"> represents the set of additions done on each row or column that performs the residue reduction as Winograd's Short DFT Algorithms. Because of the appearance of the flow graph of \(A\)FF" role="presentation" style="position:relative;" tabindex="0">AA" role="presentation" style="position:relative;" tabindex="0"> and because it is the first operator on \(x\), it is called a preweave operator. \(D\) is the set of \(M\) multiplications and ATAT" role="presentation" style="position:relative;" tabindex="0">DD" role="presentation" style="position:relative;" tabindex="0">\(A^T\) or ATAT" role="presentation" style="position:relative;" tabindex="0">DD" role="presentation" style="position:relative;" tabindex="0">\(B^T\) or ATAT" role="presentation" style="position:relative;" tabindex="0">DD" role="presentation" style="position:relative;" tabindex="0">\(C^T\) from Winograd's Short DFT Algorithms it is the reconstruction operator called the postweave. Applying Equation to Equation gives

\[X=A_2^TD_2A_2A_1^TD_1A_1x\]

This is the PFA of the equation and Fig. 9.2.1 where \(A_1D_1A_1\) represents the row DFT's on the array formed from \(x\). Because these operators commute, the equation can also be written as

\[X=A_2^TA_1^TD_2D_1A_2A_1x\]

or

\[X=A_1^TA_2^TD_2D_1A_2A_1x\]

but the two adjacent multiplication operators can be premultiplied and the result represented by one operator \(D=D_2D_1\) which is no longer the same for each row or column. The equation then becomes

\[X=A_1^TA_2^TDA_2A_1x\]

This is the basic idea of the Winograd Fourier transform algorithm. The commuting of the multiplication operators together in the center of the algorithm is called nesting and it results in a significant decrease in the number of multiplications that must be done at the execution of the algorithm. Pictorially, the PFA of Fig. 9.2.1 becomes the WFTA in the Fig. 9.3.1 below.

*Fig. 9.3.1 A Length-15 WFTA with Nested Multiplications*

The rectangular structure of the preweave addition operators causes an expansion of the data in the center of the algorithm. The 15 data points in Fig. 9.3.1 become 18 intermediate values. This expansion is a major problem in programming the WFTA because it prevents a straightforward in-place calculation and causes an increase in the number of required additions and in the number of multiplier constants that must be precalculated and stored.

From Fig. 9.3.1 and the idea of premultiplying the individual multiplication operators, it can be seen why the multiplications by unity had to be considered in Winograd Fourier Transform Algorithm (WFTA) Table 6.2.1. Even if a multiplier in D1D1" role="presentation" style="position:relative;" tabindex="0">\(D_1\) is unity, it may not be in D1D1" role="presentation" style="position:relative;" tabindex="0">\(D_2D_1\). In Fig. 9.3.1, with factors of three and five, there appear to be 18 multiplications required because of the expansion of the length-5 preweave operator, D1D1" role="presentation" style="position:relative;" tabindex="0">\(A_2\), however, one of multipliers in each of the length three and five operators is unity, so one of the 18 multipliers in the product is unity. This gives 17 required multiplications - a rather impressive reduction from the 152=225152=225" role="presentation" style="position:relative;" tabindex="0">\(15^2=225\) multiplications required by direct calculation. This number of 17 complex multiplications will require only 34 real multiplications because, as mentioned earlier, the multiplier constants are purely real or imaginary while the 225 complex multiplications are general and therefore will require four times as many real multiplications.

The number of additions depends on the order of the pre- and postweave operators. For example in the length-15 WFTA in Fig. 9.3.1, if the length-5 had been done first and last, there would have been six row addition preweaves in the preweave operator rather than the five shown. It is difficult to illustrate the algorithm for three or more factors of N, but the ideas apply to any number of factors. Each length has an optimal ordering of the pre- and postweave operators that will minimize the number of additions.

A program for the WFTA is not as simple as for the FFT or PFA because of the very characteristic that reduces the number of multiplications, the nesting. The same lengths are possible with the PFA and WFTA and the same short DFT modules can be used, however, the multiplies in the modules must occur in one place for use in the WFTA.