# 21.2: Merge Sort

- Page ID
- 54599

Algorithms textbooks traditionally claim that sorting is an important, fundamental problem in computer science. Then they smack you with sorting algorithms until life as a disk-stacking monk in Hanoi sounds delightful. Here, we’ll cover just *one* well-known sorting algorithm, *Merge Sort.* The analysis introduces another kind of recurrence.

Here is how Merge Sort works. The input is a list of \(n\) numbers, and the output is those same numbers in nondecreasing order. There are two cases:

- If the input is a single number, then the algorithm does nothing, because the list is already sorted.
- Otherwise, the list contains two or more numbers. The first half and the second half of the list are each sorted recursively. Then the two halves are merged to form a sorted list with all \(n\) numbers.

Let’s work through an example. Suppose we want to sort this list:

10, 7, 23, 5, 2, 8, 6, 9.

Since there is more than one number, the first half (10, 7, 23, 5) and the second half (2, 8, 6, 9) are sorted recursively. The results are 5, 7, 10, 23 and 2, 6, 8, 9. All that remains is to merge these two lists. This is done by repeatedly emitting the smaller of the two leading terms. When one list is empty, the whole other list is emitted. The example is worked out below. In this table, underlined numbers are about to be emitted.

The leading terms are initially 5 and 2. So we output 2. Then the leading terms are 5 and 6, so we output 5. Eventually, the second list becomes empty. At that point, we output the whole first list, which consists of 10 and 23. The complete output consists of all the numbers in sorted order.

## Finding a Recurrence

A traditional question about sorting algorithms is, “What is the maximum number of comparisons used in sorting \(n\) items?” This is taken as an estimate of the running time. In the case of Merge Sort, we can express this quantity with a recurrence. Let \(T_n\) be the maximum number of comparisons used while Merge Sorting a list of \(n\) numbers. For now, assume that \(n\) is a power of 2. This ensures that the input can be divided in half at every stage of the recursion.

- If there is only one number in the list, then no comparisons are required, so \(T_1 = 0\).
- Otherwise, \(T_n\) includes comparisons used in sorting the first half (at most \(T_{n/2}\)), in sorting the second half (also at most \(T_{n/2}\)), and in merging the two halves. The number of comparisons in the merging step is at most \(n - 1\). This is because at least one number is emitted after each comparison and one more number is emitted at the end when one list becomes empty. Since \(n\) items are emitted in all, there can be at most \(n - 1\) comparisons.

Therefore, the maximum number of comparisons needed to Merge Sort \(n\) items is given by this recurrence:

\[\begin{aligned} T_1 &= 0 \\ T_n &= 2T_{n/2} + n - 1 & (\text{for } n \geq 2 \text{ and a power of } 2). \end{aligned}\]

This fully describes the number of comparisons, but not in a very useful way; a closed-form expression would be much more helpful. To get that, we have to solve the recurrence.

## Solving the Recurrence

Let’s first try to solve the Merge Sort recurrence with the guess-and-verify technique. Here are the first few values:

\[\begin{aligned} T_1 &= 0 \\ T_2 &= 2T_1 + 2 - 1 = 1 \\ T_4 &= 2T_2 + 4 - 1 = 5 \\ T_8 &= 2T_4 + 8 - 1 = 17 \\ T_{16} &= 2T_8 + 16 - 1 = 49. \end{aligned}\]

We’re in trouble! Guessing the solution to this recurrence is hard because there is no obvious pattern. So let’s try the plug-and-chug method instead.

**Step 1: Plug and Chug Until a Pattern Appears**

First, we expand the recurrence equation by alternately plugging and chugging until a pattern appears.

\[\begin{aligned} T_n &= 2T_{n/2} + n - 1 \\ &= 2(2T_{n/4} + n/2 - 1) + (n - 1) & \text{plug} \\ &= 4T_{n/4} + (n - 2) + (n - 1) & \text{chug} \\ &= 4(2T_{n/8} + n/4 - 1) + (n - 2) + (n - 1) & \text{plug} \\ &= 8T_{n/8} + (n - 4) - (n - 2) + (n - 1) & \text{chug} \\ &= 8(2T_{n/16} + n/8 - 1) + (n - 4) - (n - 2) + (n - 1) & \text{plug} \\ &= 16T_{n/16} + (n - 8) + (n - 4) - (n - 2) + (n - 1) & \text{chug} \end{aligned}\]

A pattern is emerging. In particular, this formula seems holds:

\[\begin{aligned} T_n &= 2^k T_{n/2^k} + (n - 2^{k-1}) + (n - 2^{k-2}) + \cdots (n - 2^0) \\ &= 2^k T_{n/2^k} + kn - 2^{k - 1} - 2^{k-2} \cdots - 2^0 \\ &= 2^k T_{n/2^k} + kn - 2^k + 1.\end{aligned}\]

On the second line, we grouped the \(n\) terms and powers of 2. On the third, we collapsed the geometric sum.

**Step 2: Verify the Pattern **

Next, we verify the pattern with one additional round of plug-and-chug. If we guessed the wrong pattern, then this is where we’ll discover the mistake.

\[\begin{aligned} T_n &= 2^k T_{n/2^k} + kn - 2^k + 1 \\ &= 2^k (2T_{n/2^{k+1}} + n/2^k - 1) + kn - 2^k + 1 & \text{plug} \\ &= 2^{k+1} T_{n/2^{k+1}} + (k + 1)n - 2^{k + 1} + 1. & \text{chug} \end{aligned}\]

The formula is unchanged except that \(k\) is replaced by \(k + 1\). This amounts to the induction step in a proof that the formula holds for all \(k \geq 1\).

Step 3: Write \(T_n\) Using Early Terms with Known Values

Finally, we express \(T_n\) using early terms whose values are known. Specifically, if we let \(k = \log n\), then \(T_{n/2^k} = T_1\), which we know is 0:

\[\begin{aligned} T_n &= 2^k T_{n/2^k} + kn - 2^k + 1 \\ &= 2^{\log n} T_{n / 2^{\log n}} + n \log n - 2^{\log n} + 1 \\ &= nT_1 + n \log n - n + 1 \\ &= n \log n - n + 1. \end{aligned}\]

We’re done! We have a closed-form expression for the maximum number of comparisons used in Merge Sorting a list of \(n\) numbers. In retrospect, it is easy to see why guess-and-verify failed: this formula is fairly complicated.

As a check, we can confirm that this formula gives the same values that we computed earlier:

\[\nonumber \begin{array}{c|c|c}

n & T_{n} & n \log n-n+1 \\

\hline 1 & 0 & 1 \log 1-1+1=0 \\

2 & 1 & 2 \log 2-2+1=1 \\

4 & 5 & 4 \log 4-4+1=5 \\

8 & 17 & 8 \log 8-8+1=17 \\

16 & 49 & 16 \log 16-16+1=49

\end{array}\]

As a double-check, we could write out an explicit induction proof. This would be straightforward, because we already worked out the guts of the proof in step 2 of the plug-and-chug procedure.