Skip to main content
Engineering LibreTexts

Chapter 12: Hash Functions

  • Page ID
    7396
  • A hash function is any function that takes arbitrary-length input and has fixed-length output, so H : {0,1} → {0,1}n. Think of H(m) as a “fingerprint” of m. Calling H(m) a fingerprint suggests that different messages always have different fingerprints. But we know that can’t be true — there are infinitely many messages but only 2n possible outputs of H.1

    Let’s look a little closer. A true “unique fingerprinting function” would have to be injective (one-to-one). No hash function can be injective, since there must exist many x and x’ with xx’ but H(x) = H(x’). Let us call (x,x’) a collision under H if it has this property. We know that collisions must exist, but what if the problem of finding a collision was hard for polynomial-time programs? Recall that in this course we often don’t care whether something is impossible in principle — it is enough for things to be merely computationally difficult in this way. A hash function for which collision-finding is hard would effectively serve as an injective function for our purposes.

    Roughly speaking, a hash function H is collision-resistant if no polynomial-time program can find a collision in H. Another good name for such a hash function might be “pseudo-injective.” In this chapter we discuss definitions and applications of collision resistance.


    1Somewhere out there is a pigeonhole with infinitely many pigeons in it.