# 12.3: Block Processing - a Generalization of Overlap Methods

- Page ID
- 2037

Convolution is intimately related to the DFT. It was shown in The DFT as Convolution or Filtering that a prime length DFT could be converted to cyclic convolution. It has been long known that convolution can be calculated by multiplying the DFTs of signals.

An important question is what is the fastest method for calculating digital convolution. There are several methods that each have some advantage. The earliest method for fast convolution was the use of sectioning with overlap-add or overlap-save and the FFT. In most cases the convolution is of real data and, therefore, real-data FFTs should be used. That approach is still probably the fastest method for longer convolution on a general purpose computer or microprocessor. The shorter convolutions should simply be calculated directly.

### Introduction

The partitioning of long or infinite strings of data into shorter sections or blocks has been used to allow application of the FFT to realize on-going or continuous convolution. This section develops the idea of block processing and shows that it is a generalization of the overlap-add and overlap-save methods. They further generalize the idea to a multidimensional formulation of convolution. Moving in the opposite direction, it is shown that, rather than partitioning a string of scalars into blocks and then into blocks of blocks, one can partition a scalar number into blocks of bits and then include the operation of multiplication in the signal processing formulation. This is called distributed arithmetic and since it describes operations at the bit level, is completely general. These notes try to present a coherent development of these ideas.

### Block Signal Processing

In this section the usual convolution and recursion that implements FIR and IIR discrete-time filters are reformulated in terms of vectors and matrices. Because the same data is partitioned and grouped in a variety of ways, it is important to have a consistent notation in order to be clear. The \(n^{th}\) element of a data sequence is expressed \(h(n)\) or, in some cases to simplify \(h_n\). A block or finite length column vector is denoted \(\underline{h}_n\) with \(n\) indicating the \(n^{th}\) block or section of a longer vector. A matrix, square or rectangular, is indicated by an upper case letter such as \(H\) with a subscript if appropriate.

#### Block Convolution

The operation of a finite impulse response (FIR) filter is described by a finite convolution as

\[y(n)=\sum_{k=0}^{L-1}h(k)x(n-k)\]

where \(x(n)\) is causal, \(h(n)\) is causal and of length \(L\), and the time index \(n\), goes from zero to infinity or some large value. With a change of index variables this becomes

\[y(n)=\sum h(n-k)x(k)\]

which can be expressed as a matrix operation by

\[\begin{bmatrix} y_0\\ y_1\\ y_2\\ \vdots \end{bmatrix}=\begin{bmatrix} h_0 & 0 & 0 & \cdots & 0\\ h_1 & h_0 & 0 & & \\ h_2 & h_1 & h_0 & & \\ \vdots & & & & \vdots \end{bmatrix}\begin{bmatrix} x_0\\ x_1\\ x_2\\ \vdots \end{bmatrix}\]

The \(H\) matrix of impulse response values is partitioned into \(N\) by \(N\) square sub matrices and the \(X\) and \(Y\) vectors are partitioned into length-\(N\) blocks or sections. This is illustrated for \(N=3\) by

\[H_0=\begin{bmatrix} h_0 & 0 & 0\\ h_1 & h_0 & 0\\ h_2 & h_1 & h_0 \end{bmatrix}\; \; \; \; \; H_1=\begin{bmatrix} h_3 & h_2 & h_1\\ h_4 & h_3 & h_2\\ h_5 & h_4 & h_3 \end{bmatrix}\; \; \; \; \; \text{etc.}\]

\[\underline{x}_0=\begin{bmatrix} x_0\\ x_1\\ x_2 \end{bmatrix}\; \; \; \; \; \underline{x}_1=\begin{bmatrix} x_3\\ x_4\\ x_5 \end{bmatrix}\; \; \; \; \; \underline{y}_0=\begin{bmatrix} y_0\\ y_1\\ y_2 \end{bmatrix}\; \; \; \; \; \text{etc.}\]

Substituting these definitions into the equation gives

\[\begin{bmatrix} y_0\\ y_1\\ y_2\\ \vdots \end{bmatrix}=\begin{bmatrix} H_0 & 0 & 0 & \cdots & 0\\ H_1 & H_0 & 0 & & \\ H_2 & H_1 & H_0 & & \\ \vdots & & & & \vdots \end{bmatrix}\begin{bmatrix} x_0\\ x_1\\ x_2\\ \vdots \end{bmatrix}\]

The general expression for the \(n^{th}\) output block is

\[\underline{y}_n=\sum_{k=0}^{n}H_{n-k}\underline{x}_k\]

which is a vector or block convolution. Since the matrix-vector multiplication within the block convolution is itself a convolution, the equation is a sort of convolution of convolutions and the finite length matrix-vector multiplication can be carried out using the FFT or other fast convolution methods.

The equation for one output block can be written as the product

\[\underline{y}_2=\begin{bmatrix} H_2 & H_1 & H_0 \end{bmatrix}\begin{bmatrix} \underline{x}_0\\ \underline{x}_1\\ \underline{x}_2 \end{bmatrix}\]

and the effects of one input block can be written

\[\begin{bmatrix} H_0\\ H_1\\ H_2 \end{bmatrix}\underline{x}_1=\begin{bmatrix} \underline{y}_0\\ \underline{y}_1\\ \underline{y}_2 \end{bmatrix}\]

These are generalize statements of overlap save and overlap ad. The block length can be longer, shorter, or equal to the filter length.

#### Block Recursion

Although less well-known, IIR filters can also be implemented with block processing. The block form of an IIR filter is developed in much the same way as for the block convolution implementation of the FIR filter. The general constant coefficient difference equation which describes an IIR filter with recursive coefficients *al* , convolution coefficients *bk* , input signal *x*(*n*) , and output signal *y*(*n*) is given by

\[y(n)=\sum_{l=1}^{N-1}a_ly_{n-l}+\sum_{k=0}^{M-1}b_kx_{n-k}\]

using both functional notation and subscripts, depending on which is easier and clearer. The impulse response \(h(n)\) is

\[y(n)=\sum_{l=1}^{N-1}a_lh(n-l)+\sum_{k=0}^{M-1}b_k\delta (n-k)\]

which can be written in matrix operator form

\[\begin{bmatrix} 1 & 0 & 0 & \cdots & 0\\ a_1 & 1 & 0 & & \\ a_2 & a_1 & 1 & & \\ a_3 & a_2 & a_1 & & \\ 0 & a_3 & a_2 & & \\ \vdots & & & & \vdots \end{bmatrix}\begin{bmatrix} h_0\\ h_1\\ h_2\\ h_3\\ h_4\\ \vdots \end{bmatrix}=\begin{bmatrix} b_0\\ b_1\\ b_2\\ b_3\\ 0\\ \vdots \end{bmatrix}\]

In terms of *N* by *N* submatrices and length-*N* blocks, this becomes

\[\begin{bmatrix} A_0 & 0 & 0 & \cdots & 0\\ A_1 & A_0 & 0 & & \\ 0 & A_1 & A_0 & & \\ \vdots & & & & \vdots \end{bmatrix}\begin{bmatrix} \underline{y}_0\\ \underline{y}_1\\ \underline{y}_2\\ \vdots \end{bmatrix}=\begin{bmatrix} B_0 & 0 & 0 & \cdots & 0\\ B_1 & B_0 & 0 & & \\ 0 & B_1 & B_0 & & \\ \vdots & & & & \vdots \end{bmatrix}\begin{bmatrix} \underline{x}_0\\ \underline{x}_1\\ \underline{x}_2\\ \vdots \end{bmatrix}\]

From the partitioned rows of the equation, one can write the block recursive relation

\[A_0\underline{y}_{n+1}+A_1\underline{y}_{n}=B_0\underline{x}_{n+1}+B_1\underline{x}_{n}\]

Solving for \(\underline{y}_{n+1}\) gives

\[\underline{y}_{n+1}=-A_{0}^{-1}A_1\underline{y}_n+A_{0}^{-1}B_0\underline{x}_{n+1}+A_{0}^{-1}B_1\underline{x}_n\]

\[\underline{y}_{n+1}=K\underline{y}_n+H_0\underline{x}_{n+1}+\tilde{H}_1\underline{x}_n\]

which is a first order vector difference equation. This is the fundamental block recursive algorithm that implements the original scalar difference equation in the equation. It has several important characteristics.

- The block recursive formulation is similar to a state variable equation but the states are blocks or sections of the output.
- The eigenvalues of
*K*are the poles of the original scalar problem raised to the*N*power plus others that are zero. The longer the block length, the “more stable" the filter is, i.e. the further the poles are from the unit circle. - If the block length were shorter than the denominator, the vector difference equation would be higher than first order. There would be a non zero
*A*2 . If the block length were shorter than the numerator, there would be a non zero*B*2 and a higher order block convolution operation. If the block length were one, the order of the vector equation would be the same as the scalar equation. They would be the same equation. - The actual arithmetic that goes into the calculation of the output is partly recursive and partly convolution. The longer the block, the more the output is calculated by convolution and, the more arithmetic is required.
- It is possible to remove the zero eigenvalues in
*K*by making*K*rectangular or square and N by N This results in a form even more similar to a state variable formulation. This is briefly discussed below. - There are several ways of using the FFT in the calculation of the various matrix products in the equations. Each has some arithmetic advantage for various forms and orders of the original equation. It is also possible to implement some of the operations using rectangular transforms, number theoretic transforms, distributed arithmetic, or other efficient convolution algorithms.
- By choosing the block length equal to the period, a periodically time varying filter can be made block time invariant. In other words, all the time varying characteristics are moved to the finite matrix multiplies which leave the time invariant properties at the block level. This allows use of z-transform and other time-invariant methods to be used for stability analysis and frequency response analysis. It also turns out to be related to filter banks and multi-rate filters.

#### Block State Formulation

It is possible to reduce the size of the matrix operators in the block recursive description equation to give a form even more like a state variable equation. If *K* in the equation has several zero eigenvalues, it should be possible to reduce the size of *K* until it has full rank. The result is

\[\underline{z}_n=K_1\underline{z}_{n-1}+K_2\underline{x}_n\]

\[\underline{y}_n=H_1\underline{z}_{n-1}+H_0\underline{x}_n\]

where *H*0 is the same *N* by *N* convolution matrix, *N*1 is a rectangular *L* by *N* partition of the convolution matrix *H* , *K*1 is a square *N* by *N* matrix of full rank, and *K*2 is a rectangular *N* by *L* matrix.

This is now a minimal state equation whose input and output are blocks of the original input and output. Some of the matrix multiplications can be carried out using the FFT or other techniques.

#### Block Implementations of Digital Filters

The advantage of the block convolution and recursion implementations is a possible improvement in arithmetic efficiency by using the FFT or other fast convolution methods for some of the multiplications in the equations. There is the reduction of quantization effects due to an effective decrease in the magnitude of the eigenvalues and the possibility of easier parallel implementation for IIR filters. The disadvantages are a delay of at least one block length and an increased memory requirement.

These methods could also be used in the various filtering methods for evaluating the DFT. This the chirp z-transform, Rader's method, and Goertzel's algorithm.

#### Multidimensional Formulation

This process of partitioning the data vectors and the operator matrices can be continued by partitioning the equations and creating blocks of blocks to give a higher dimensional structure. One should use index mapping ideas rather than partitioned matrices for this approach.

#### Periodically Time-Varying Discrete-Time Systems

Most time-varying systems are periodically time-varying and this allows special results to be obtained. If the block length is set equal to the period of the time variations, the resulting block equations are time invariant and all to the time varying characteristics are contained in the matrix multiplications. This allows some of the tools of time invariant systems to be used on periodically time-varying systems.

#### Multirate Filters, Filter Banks, and Wavelets

Another area that is related to periodically time varying systems and to block processing is filter banks. Recently the area of perfect reconstruction filter banks has been further developed and shown to be closely related to wavelet based signal analysis. The filter bank structure has several forms with the polyphase and lattice being particularly interesting.

Parks has noted that design of multirate filters has some elements in common with complex approximation and of 2-D filter design and is looking at using Tang's method for these designs.

#### Distributed Arithmetic

Rather than grouping the individual scalar data values in a discrete-time signal into blocks, the scalar values can be partitioned into groups of bits. Because multiplication of integers, multiplication of polynomials, and discrete-time convolution are the same operations, the bit-level description of multiplication can be mixed with the convolution of the signal processing. The resulting structure is called distributed arithmetic. It can be used to create an efficient table look-up scheme to implement an FIR or IIR filter using no multiplications by fetching previously calculated partial products which are stored in a table. Distributed arithmetic, block processing, and multi-dimensional formulations can be combined into an integrated powerful description to implement digital filters and processors.

### Contributor

- ContribEEBurrus