# Chapter 12: Hash Functions

- Page ID
- 7396

A **hash function** is any function that takes arbitrary-length input and has fixed-length output, so *H* : {0,1}^{∗} → {0,1}* ^{n}*. Think of

*H*(

*m*) as a “fingerprint” of

*m*. Calling

*H*(

*m*) a fingerprint suggests that different messages always have different fingerprints. But we know that can’t be true — there are infinitely many messages but only 2

*possible outputs of*

^{n}*H*.

^{1}

Let’s look a little closer. A true “unique fingerprinting function” would have to be *injective* (one-to-one). No hash function can be injective, since there must exist many *x* and *x’* with *x* ≠ *x’* but *H*(*x*) = *H*(*x’*). Let us call (*x,x’*) a **collision** under *H* if it has this property. We know that collisions must exist, but what if *the problem of finding a collision* was hard for polynomial-time programs? Recall that in this course we often don’t care whether something is impossible in principle — it is enough for things to be merely computationally difficult in this way. A hash function for which collision-finding is hard would effectively serve as an injective function for our purposes.

Roughly speaking, a hash function *H* is **collision-resistant** if no polynomial-time program can find a collision in *H*. Another good name for such a hash function might be “pseudo-injective.” In this chapter we discuss definitions and applications of collision resistance.

^{1}Somewhere out there is a pigeonhole with infinitely many pigeons in it.