Skip to main content
Engineering LibreTexts

11.1: Security Properties for Hash Functions

  • Page ID
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)

    Superficially, it seems like we have already given the formal definition of security: A hash function H is collision-resistant if no polynomial-time algorithm can output a collision under H. Unfortunately, this definition is impossible to achieve!

    Fix your favorite hash function H. We argued that collisions in Hdefinitely exist in a mathematical sense. Let x,x’ be one such collision, and consider the adversary ? that has x,x’ hard-coded and simply outputs them. This adversary runs in constant time and finds a collision in H. Of course, even though ? exists in a mathematical sense, it might be hard to write down the code of such an ? given H. But (and this is a subtle technical point!) security definitions consider only the running time of ?, and not the effort that goes into finding the source code of ?.2

    The way around this problem is to introduce some random choice made by the user of the hash function, who wants collisions to be hard to find. A hash function family is a set ℋ of functions, where each function H ∈ ℋ is a hash function with the same output length. We will require that collisions are hard to find, in a hash function chosen randomly from the family. This is enough to foil the hard-coded-collision distinguisher mentioned above. Think of a hash function family as having exponentially many functions in it — then no polynomial-time program can have a hard-coded collision for all of them.

    Now the difficulty of finding collisions rests in the random choice of functions. An adversary can know every fact about ℋ, it just doesn’t know which H ∈ ℋ it is going to be challenged on to find a collision. It’s similar to how the security of other cryptographic schemes rests in the random choice of key. But in this case there is no secrecy involved, only unpredictability. The choice of H is made public to the adversary.

    Note also that this definition is a mismatch to the way hash functions are typically used in practice. There, we usually do have a single hash function that we rely on and standardize. While it is possible to adapt the definitions and results in this lecture to the setting of fixed hash functions, it’s simpler to consider hash function families. If you’re having trouble connecting the idea of a hash function family to reality, imagine taking a standardized hash function like MD5 or SHA3 and considering the family of functions you get by varying the initialization parameters in those standards.

    Towards the Formal Definition

    The straight-forward way to define collision resistance of a hash family ℋ is to say that the following two libraries should be indistinguishable:


    Indeed, this is a fine definition of collision resistance, and it follows in the style of previous security definitions. The two libraries give different outputs only when called on (x,x’) where xx’ but H(x) = H(x’) — i.e., an H-collision.

    However, it turns out to not be a particularly convenient definition to use when proving security of a construction that involves a hash function as a building block. To see why, think back to the security of MACs. The difference between the two ℒmac-* libraries is in the verification subroutine VER. And indeed, constructions that use MACs as a building block often perform MAC verification. The libraries — in particular, their VER subroutine — are a good fit for how MACs are used.

    On the other hand, cryptographic constructions don’t usually explicitly test for collisions when using a hash function. Rather, they typically compute the hash of some value and move on with their business under the assumption that a collision has never been encountered. To model this in a security definition, we use the approach below:

    Definition \(\PageIndex{1}\)

    Letbe a family of hash functions. Thenis collision-resistant ifcr-real ≋ ℒcr-fake, where:



    • The two libraries have identical behavior, except in the event that ℒcr-fake triggers a “self destruct” statement. Think of this statement as an exception that kills the entire program, including the distinguisher. In the case that the distinguisher is killed in this way, we take its output to be 0. Suppose a distinguisher A always outputs 1 under normal, explosion-free execution. Since an explosion happens only in ℒcr-fake, A’s advantage is simply the probability of an explosion. So if the two libraries are supposed to be indistinguishable, then it must be that self destruct happens with only negligible probability.
    • In ℒcr-fake we have given the associative array “H−1” a somewhat suggestive name. During normal operation, H−1 [y] contains the unique value seen by the library whose hash is y. A collision happens when the library has seen two distinct values x and x’ that hash to the same y. When the library sees the second of these values x’, it computes H(x’) = y and discovers that H−1[y] already exists but is not equal to x’. This is the situation in which the library self destructs.
    • Why do we make the library self destruct when it sees a collision, rather than just returning some error indicator? The reason is simply that it’s easier to just assume there is no collision than to check the return value of HASH after each call.

    Think of ℒcr-real as a world in which you take hashes of things but you might see a collision and never notice. Then ℒcr-fake is a kind of thought- experiment in which some all-seeing judge ends the game immediately if there was ever a collision among the values that you hashed. The judge’s presence simplifies things a bit. There’s no real need to explicitly check for collisions yourself; you can just go about your business knowing that as long as the game is still going, the hash function is injective among all the values you’ve seen.

    • The libraries have no secrets! H is public (the adversary can freely obtain it through GETH). However, note that the library — not the adversary — chooses H. This random choice of H is the sole source of security.

    Since H is public, the adversary doesn’t really need to call the subroutine HASH(x) to compute H(x) — he could compute H(x) locally. Intuitively, the library is used in a security proof to model the actions of the “good guys” who are operating on the assumption that H is collision-resistant. If an adversary finds a collision but never causes the “good guys” to evaluate H on it, then the adversary never violates their security assumption.

    Other variants of collision-resistance

    There are some other variants of collision-resistance that are often discussed in practice. We don’t define them formally, but give the rough idea:

    Target Collision Resistance. Given H and H(x), where H ← ℋ and x ← {0,1} are chosen randomly, it should be infeasible to compute a value x’ (possibly equal to x) such that H(x) = H(x’).

    Second preimage resistance. Given H and x, where H ← ℋ and x ← {0,1} are chosen randomly, it should be infeasible to compute a value x’x such that H(x) = H(x’).

    These conditions are weaker than the condition of (plain) collision-resistance, in the sense that if ℋ is collision-resistant, then ℋ is also target collision-resistant and secondpreimage resistant. Hence, we focus on plain collision resistance in this course.

    2The reason we don’t define security this way is that as soon as someone does find the code of such an ?, the hash function H is “broken” forever. Nothing anyone does (like choosing a new key) can salvage it.

    This page titled 11.1: Security Properties for Hash Functions is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Mike Rosulek (Open Oregon State) .

    • Was this article helpful?