Fast Convolution by Overlap-Add
In order to use the FFT to convolve (or filter) a long input sequence \(x(n)\) with a finite length-M impulse response, \(h(n)\), we partition the input sequence in segments or blocks of length \(L\). Because convolution (or filtering) is linear, the output is a linear sum of the result of convolving the first block with \(h(n)\) plus the result of convolving the second block with \(h(n)\) plus the rest. Each of these block convolutions can be calculated by using the FFT. The output is the inverse FFT of the product of the FFT of \(x(n)\) and the FFT of \(h(n)\). Since the number of arithmetic operation to calculate the convolution directly is on the order of \(M^2\) and, if done with the FFT, is on the order of \(M\log (M)\), there can be a great savings by using the FFT for large \(M\).
The reason this procedure is not totally straightforward, is the length of the output of convolving a length-L block with a length-M filter is of length \(L+M-1\). This means the output blocks cannot simply be concatenated but must be overlapped and added, hence the name for this algorithm is “Overlap-Add".
The second issue that must be taken into account is the fact that the overlap-add steps need non-cyclic convolution and convolution by the FFT is cyclic. This is easily handled by appending \(L-1\) zeros to the impulse response and \(M-1\) zeros to each input block so that all FFTs are of length \(M+L-1\). This means there is no aliasing and the implemented cyclic convolution gives the same output as the desired non-cyclic convolution.
The savings in arithmetic can be considerable when implementing convolution or performing FIR digital filtering. However, there are two penalties. The use of blocks introduces a delay of one block length. None of the first block of output can be calculated until all of the first block of input is available. This is not a problem for “off line" or “batch" processing but can be serious for real-time processing. The second penalty is the memory required to store and process the blocks. The continuing reduction of memory cost often removes this problem.
The efficiency in terms of number of arithmetic operations per output point increases for large blocks because of the \(M\log (M)\) requirements of the FFT. However, the blocks become very large (\(L> > M\)), much of the input block will be the appended zeros and efficiency is lost. For any particular application, taking the particular filter and FFT algorithm being used and the particular hardware being used, a plot of efficiency vs. block length, \(L\) should be made and \(L\) chosen to maximize efficiency given any other constraints that are applicable.
Usually, the block convolutions are done by the FFT, but they could be done by any efficient, finite length method. One could use “rectangular transforms" or “number-theoretic transforms". A generalization of this method is presented later in the notes.
Fast Convolution by Overlap-Save
An alternative approach to the Overlap-Add can be developed by starting with segmenting the output rather than the input. If one considers the calculation of a block of output, it is seen that not only the corresponding input block is needed, but part of the preceding input block also needed. Indeed, one can show that a length \(M+L-1\) segment of the input is needed for each output block. So, one saves the last part of the preceding block and concatenates it with the current input block, then convolves that with \(h(n)\) to calculate the current output.