# 2.2: Encoding

- Page ID
- 13947

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)## 2.2 Encoding

## 2.2 Encoding

The first step in turning nodes and links into usable building blocks is to understand how to connect them in such a way that bits can be transmitted from one node to the other. As mentioned in the preceding section, signals propagate over physical links. The task, therefore, is to encode the binary data that the source node wants to send into the signals that the links are able to carry and then to decode the signal back into the corresponding binary data at the receiving node. We ignore the details of modulation and assume we are working with two discrete signals: high and low. In practice, these signals might correspond to two different voltages on a copper-based link or two different power levels on an optical link.

Most of the functions discussed in this chapter are performed by a
*network adaptor*—a piece of hardware that connects a node to a link.
The network adaptor contains a signalling component that actually
encodes bits into signals at the sending node and decodes signals into
bits at the receiving node. Thus, as illustrated in
Figure 1, signals travel over a link between two
signalling components, and bits flow between network adaptors.

Let's return to the problem of encoding bits onto signals. The obvious
thing to do is to map the data value 1 onto the high signal and the data
value 0 onto the low signal. This is exactly the mapping used by an
encoding scheme called, cryptically enough, *non-return to zero* (NRZ).
For example, Figure 2 schematically depicts the
NRZ-encoded signal (bottom) that corresponds to the transmission of a
particular sequence of bits (top).

The problem with NRZ is that a sequence of several consecutive 1s means
that the signal stays high on the link for an extended period of time;
similarly, several consecutive 0s means that the signal stays low for a
long time. There are two fundamental problems caused by long strings of
1s or 0s. The first is that it leads to a situation known as *baseline
wander*. Specifically, the receiver keeps an average of the signal it
has seen so far and then uses this average to distinguish between low
and high signals. Whenever the signal is significantly lower than this
average, the receiver concludes that it has just seen a 0; likewise, a
signal that is significantly higher than the average is interpreted to
be a 1. The problem, of course, is that too many consecutive 1s or 0s
cause this average to change, making it more difficult to detect a
significant change in the signal.

The second problem is that frequent transitions from high to low and
*vice versa* are necessary to enable *clock recovery*. Intuitively, the
clock recovery problem is that both the encoding and decoding
processes are driven by a clock—every clock cycle the sender transmits
a bit and the receiver recovers a bit. The sender's and the receiver's
clocks have to be precisely synchronized in order for the receiver to
recover the same bits the sender transmits. If the receiver's clock is
even slightly faster or slower than the sender's clock, then it does not
correctly decode the signal. You could imagine sending the clock to the
receiver over a separate wire, but this is typically avoided because it
makes the cost of cabling twice as high. So, instead, the receiver
derives the clock from the received signal—the clock recovery process.
Whenever the signal changes, such as on a transition from 1 to 0 or from
0 to 1, then the receiver knows it is at a clock cycle boundary, and it
can resynchronize itself. However, a long period of time without such a
transition leads to clock drift. Thus, clock recovery depends on having
lots of transitions in the signal, no matter what data is being sent.

One approach that addresses this problem, called *non-return to zero
inverted* (NRZI), has the sender make a transition from the current
signal to encode a 1 and stay at the current signal to encode a 0. This
solves the problem of consecutive 1s, but obviously does nothing for
consecutive 0s. NRZI is illustrated in Figure 3. An
alternative, called *Manchester encoding*, does a more
explicit job of merging the clock with the signal by transmitting the
exclusive OR of the NRZ-encoded data and the clock. (Think of the
local clock as an internal signal that alternates from low to high; a
low/high pair is considered one clock cycle.) The Manchester encoding
is also illustrated in Figure 3. Observe that
the Manchester encoding results in 0 being encoded as a low-to-high
transition and 1 being encoded as a high-to-low transition. Because
both 0s and 1s result in a transition to the signal, the clock can be
effectively recovered at the receiver. (There is also a variant of the
Manchester encoding, called *Differential Manchester*, in which a 1 is
encoded with the first half of the signal equal to the last half of
the previous bit's signal and a 0 is encoded with the first half of
the signal opposite to the last half of the previous bit's signal.)

The problem with the Manchester encoding scheme is that it doubles the
rate at which signal transitions are made on the link, which means that
the receiver has half the time to detect each pulse of the signal. The
rate at which the signal changes is called the link's *baud rate*. In
the case of the Manchester encoding, the bit rate is half the baud rate,
so the encoding is considered only 50% efficient. Keep in mind that if
the receiver had been able to keep up with the faster baud rate required
by the Manchester encoding in Figure 3, then
both NRZ and NRZI could have been able to transmit twice as many bits
in the same time period.

The bit rate isn't necessarily less than or equal to the baud rate, as the Manchester encoding suggests. If the modulation scheme is able to utilize (and recognize) four different signals, as opposed to just two—e.g., "high" and "low"—then it is possible to encode two bits into each clock interval, resulting in a bit rate that is twice the baud rate. Similarly, being able to modulate among eight different signals means being able to transmit three bits per clock interval.

A final encoding that we consider, called *4B/5B*, attempts to address
the inefficiency of the Manchester encoding without suffering from the
problem of having extended durations of high or low signals. The idea of
4B/5B is to insert extra bits into the bit stream so as to break up long
sequences of 0s or 1s. Specifically, every 4 bits of actual data are
encoded in a 5-bit code that is then transmitted to the receiver; hence,
the name 4B/5B. The 5-bit codes are selected in such a way that each one
has no more than one leading 0 and no more than two trailing 0s. Thus,
when sent back-to-back, no pair of 5-bit codes results in more than
three consecutive 0s being transmitted. The resulting 5-bit codes are
then transmitted using the NRZI encoding, which explains why the code is
only concerned about consecutive 0s—NRZI already solves the problem of
consecutive 1s. Note that the 4B/5B encoding results in 80% efficiency.

4-bit Data Symbol | 5-bit Code |
---|---|

0000 | 11110 |

0001 | 01001 |

0010 | 10100 |

0011 | 10101 |

0100 | 01010 |

0101 | 01011 |

0110 | 01110 |

0111 | 01111 |

1000 | 10010 |

1001 | 10011 |

1010 | 10110 |

1011 | 10111 |

1100 | 11010 |

1101 | 11011 |

1110 | 11100 |

1111 | 11101 |

Table 1 gives the 5-bit codes that correspond to
each of the 16 possible 4-bit data symbols. Notice that since 5 bits
are enough to encode 32 different codes, and we are using only 16 of
these for data, there are 16 codes left over that we can use for other
purposes. Of these, code `11111`

is used when the line is idle, code
`00000`

corresponds to when the line is dead, and `00100`

is interpreted
to mean halt. Of the remaining 13 codes, 7 of them are not valid
because they violate the "one leading 0, two trailing 0s," rule, and
the other 6 represent various control symbols. Some of the framing
protocols described later in this chapter make use of these control
symbols.