# 3.1: A bit of information theory

- Page ID
- 40718

A **bit** is a binary digit; it is also a unit of information. If you have one bit, you can specify one of two possibilities, usually written 0 and 1. If you have two bits, there are 4 possible combinations, 00, 01, 10, and 11. In general, if you have \( b \) bits, you can indicate one of \( 2^{b} \) values. A **byte** is 8 bits, so it can hold one of 256 values.

Going in the other direction, suppose you want to store a letter of the alphabet. There are 26 letters, so how many bits do you need? With 4 bits, you can specify one of 16 values, so that’s not enough. With 5 bits, you can specify up to 32 values, so that’s enough for all the letters, with a few values left over.

In general, if you want to specify one of \( N \) values, you should choose the smallest value of \( b \) so that \( 2^{b} \leq N \). Taking the \( \log \) base 2 of both sides yields \( b \leq \log_{2}{N} \).

Suppose I flip a coin and tell you the outcome. I have given you one bit of information. If I roll a six-sided die and tell you the outcome, I have given you \( \log_{2}{6} \) bits of information. And in general, if the probability of the outcome is 1 in \( N \), then the outcome contains \( \log_{2}{N} \) bits of information.

Equivalently, if the probability of the outcome is \( p \), then the information content is \( -\log_{2}{p} \). This quantity is called the **self-information** of the outcome. It measures how surprising the outcome is, which is why it is also called **surprisal**. If your horse has only one chance in 16 of winning, and he wins, you get 4 bits of information (along with the payout). But if the favorite wins 75% of the time, the news of the win contains only 0.42 bits.

Intuitively, unexpected news carries a lot of information; conversely, if there is something you were already confident of, confirming it contributes only a small amount of information.

For several topics in this book, we will need to be comfortable converting back and forth between the number of bits, \( b \), and the number of values they can encode, \( N = 2^{b} \).