Processing math: 100%
Skip to main content
Library homepage
 

Text Color

Text Size

 

Margin Size

 

Font Type

Enable Dyslexic Font
Engineering LibreTexts

11.2: Merkle-Damgård Construction

( \newcommand{\kernel}{\mathrm{null}\,}\)

Building a hash function, especially one that accepts inputs of arbitrary length, seems like a challenging task. In this section, we’ll see one approach for constructing hash functions, called the Merkle-Damgård construction.

Instead of a full-fledged hash function, imagine that we had a collision-resistant function whose inputs were of a single fixed length, but longer than its outputs. In other words, h:{θ,1}n+t{0,1}n, where t>0. We call such an h a compression function. This is not compression in the usual sense of the word - we are not concerned about recovering the input from the output. We call it a compression function because it "compresses" its input by t bits (analogous to how a pseudorandom generator "stretches" its input by some amount).

The following construction is one way to build a full-fledged hash function (supporting inputs of arbitrary length) out of such a compression function:

Construction 11.2 (Merkle-Damgård)

Let h:{0,1}n+t{0,1}n be a compression function. Then the Merkle-Damgård transformation of h is MDh:{0,1}{0,1}n, where:

fig-ch01_patchfile_01.jpg
Figure 11.2.1: Copy and Paste Caption here. (Copyright; author via source)

The idea of the Merkle-Damgård construction is to split the input x into blocks of size t. The end of the string is filled out with ϑs if necessary. A final block called the "padding block" is added, which encodes the (original) length of x in binary.

Example

Suppose we have a compression function h:{0,1}48{0,1}32, so that t=16. We build a Merkle-Damgård hash function out of this compression function and wish to compute the hash of the following 5-byte (40-bit) string:

fig-ch01_patchfile_01.jpg
Figure 11.2.1: Copy and Paste Caption here. (Copyright; author via source)

We must first padx appropriately (MDPAD(x)) :

  • Since x is not a multiple of t=16 bits, we need to add 8 bits to make it so.
  • Since |x|=40, we need to add an extra 16-bit block that encodes the number 40 in binary(101000)

After this padding, and splitting the result into blocks of length 16 , we have the following:

fig-ch01_patchfile_01.jpg
Figure 11.2.1: Copy and Paste Caption here. (Copyright; author via source)

The final hash of x is computed as follows:

fig-ch01_patchfile_01.jpg
Figure 11.2.1: Copy and Paste Caption here. (Copyright; author via source)

We are presenting a simplified version, in which MDh accepts inputs whose maximum length is 2t1 bits (the length of the input must fit into t bits). By using multiple padding blocks (when necessary) and a suitable encoding of the original string length, the construction can be made to accommodate inputs of arbitrary length (see the exercises).

The value y0 is called the initialization vector (IV), and it is a hard-coded part of the algorithm.

As discussed above, we will not be making provable security claims using the librarystyle definitions. However, we can justify the Merkle-Damgård construction with the following claim:

Claim 11.3

Suppose h is a compression function and MDh is the Merkle-Damgård construction applied to h. Given a collision x,x in MDh, it is easy to find a collision in h. In other words, if it is hard to find a collision in h, then it must also be hard to find a collision in MDh.

Proof

Suppose that x,x are a collision under Mh. Define the values x1,,xk+1 and y1,,yk+1 as in the computation of MDh(x). Similarly, define x1,,xk+1 and y1,,yk+1 as in the computation of MDh(x). Note that, in general, k may not equal k

Recall that: MDh(x)=yk+1=h(ykxk+1)MDh(x)=yk+1=h(ykxk+1) Since we are assuming MDh(x)=MDh(x), we have yk+1=yk+1. We consider two cases:

Case 1: If |x||x|, then the padding blocks xk+1 and xk+1 which encode |x| and |x| are not equal. Hence we have ykxk+1ykxk+1, so ykxk+1 and ykxk+1 are a collision under h and we are done.

Case 2: If |x|=|x|, then x and x are broken into the same number of blocks, so k=k. Let us work backwards from the final step in the computations of MDDh(x) and Mh(x).

We know that: yk+1=h(ykxk+1)=yk+1=h(ykxk+1) If ykxk+1 and ykxk+1 are not equal, then they are a collision under h and we are done. Otherwise, we can apply the same logic again to yk and yk, which are equal by our assumption.

More generally, if yi=yi, then either yi1xi and yi1xi are a collision under h (and we say we are "lucky"), or else yi1=yi1 (and we say we are "unlucky"). We start with the premise that yk=yk. Can we ever get "unlucky" every time, and not encounter a collision when propagating this logic back through the computations of MDh(x) and MDh(x) ? The answer is no, because encountering the unlucky case every time would imply that xi=xi for all i. That is, x=x. But this contradicts our original assumption that xx. Hence we must encounter some "lucky" case and therefore a collision in h.


This page titled 11.2: Merkle-Damgård Construction is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Mike Rosulek (Open Oregon State) via source content that was edited to the style and standards of the LibreTexts platform.

Support Center

How can we help?