Skip to main content
Engineering LibreTexts

11: Hash Functions

  • Page ID
    7396
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    Suppose you share a huge file with a friend, but you are not sure whether you both have the same version of the file. You could send your version of the file to your friend and they could compare to their version. Is there any way to check that involves less communication than this?

    Let’s call your version of the file \(x\) (a string) and your friend’s version \(y\). The goal is to determine whether \(x=y\). A natural approach is to agree on some deterministic function \(H\), compute \(H(x)\), and send it to your friend. Your friend can compute \(H(y)\) and, since \(H\) is deterministic, compare the result to your \(H(x)\). In order for this method to be fool-proof, we need \(H\) to have the property that different inputs always map to different outputs \(-\) in other words, \(H\) must be injective (1-to-1). Unfortunately, if \(H\) is injective and \(H:\{0,1\}^{i n} \rightarrow\{0,1\}^{\text {out }}\) is injective, then out \(\geqslant\) in. This means that sending \(H(x)\) is no better/shorter than sending \(x\) itself!

    Let us call a pair \((x, y)\) a collision in \(H\) if \(x \neq y\) and \(H(x)=H(y)\). An injective function has no collisions. One common theme in cryptography is that you don’t always need something to be impossible; it’s often enough for that thing to be just highly unlikely. Instead of saying that \(H\) should have no collisions, what if we just say that collisions should be hard (for polynomial-time algorithms) to find? An \(H\) with this property will probably be good enough for anything we care about. It might also be possible to construct such an \(H\) with outputs that are shorter than its inputs!

    What we have been describing is exactly a cryptographic hash function. A hash function has long inputs and short outputs - typically \(H:\{0,1\}^{*} \rightarrow\{0,1\}^{n}\). Such an \(H\) must necessarily have many collisions. The security property of a hash function is that it is hard to find any such collision. Another good name for a hash function (which I just made up, and no one else uses) would be a "pseudo-injective" function. Although it is not injective, it behaves like one for our purposes.


    This page titled 11: Hash Functions is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Mike Rosulek (Open Oregon State) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.

    • Was this article helpful?