Skip to main content
Engineering LibreTexts

12.3: Carter-Wegman MACs

  • Page ID
    7371
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    Suppose we construct an AE[AD] scheme using the encrypt-then-MAC paradigm. A good choice for the CPA-secure encryption scheme would be CBC mode; a good choice for the MAC scheme would be ECBC-MAC. Combining these two building blocks would result in an \(\mathrm{AE}[\mathrm{AD}]\) scheme that invokes the block cipher \(t\) wice for each plaintext block \(-\) once for the CBC encryption (applied to the plaintext) and once more for the ECBC-MAC (applied to that ciphertext block).

    Is it possible to achieve \(A E[A D]\) with less cost? In this section we will explore a more efficient technique for variable-length MACs, which requires only one multiplication operation per message block along with a single invocation of a block cipher.

    Universal Hash Functions

    The main building block in Carter-Wegman-style MACs is a kind of hash function called a universal hash function (UHF). While the name "universal hash function" sounds like it must be an incredibly strong primitive, a UHF actually gives a much weaker security guarantee than a collision-resistant or second-preimage-resistant hash function.

    Recall that \(\left(x, x^{\prime}\right)\) is a collision under salt \(s\) if \(x \neq x^{\prime}\) and \(H(s, x)=H\left(s, x^{\prime}\right)\). A universal hash function has the property that it is hard to find such a collision ...

    .. when \(x\) and \(x^{\prime}\) are chosen without knowledge of the salt,

    ... and when the attacker has only one attempt at finding a collision for a particular salt value. These constraints are equivalent to choosing the salt after \(x\) and \(x^{\prime}\) are chosen, and a collision should be negligibly likely under such circumstances.

    The definition can be stated more formally:

    Definition \(12.6\) (UHF)

    A hash function \(H\) with set of salts \(\mathcal{S}\) is called a universal hash function (UHF) if \(\mathcal{L}_{\text {uhf-real }}^{H} \approx \mathcal{L}_{\text {uhf-fake }}^{H}\), where:

    fig-ch01_patchfile_01.jpg
    Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)

    This definition is similar in spirit to the formal definition of collision resistance (Definition 11.1). Just like that definition, this one is cumbersome to use in a security proof. When using a hash function, one typically does not explicitly check for collisions, but instead just proceeds as if there was no collision.

    In the case of UHFs, there is a different and helpful way of thinking about security Consider a "blind collision-resistance" game, where you try to find a collision under \(H\) without access to the salt, and even without seeing the outputs of H! It turns out that if \(H\) is a UHF, then it is hard to find collisions in such a game:

    Claim \(12.7\)

    If \(H\) is a UHF, then the following libraries are indistinguishable:

    fig-ch01_patchfile_01.jpg
    Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)

    In these libraries, the calling program chooses inputs \(x\) to the UHF. The \(\mathcal{L}_{\text {bcr-real }}\) library maintains a private record of all of the \(x\) values and their hashes, in the form of a reverse lookup table. \(H_{\mathrm{inv}}[y]\) will hold the value \(x\) that was hashed to result in \(y\).

    If the calling program calls \(\operatorname{Test}(x)\) on a value that collides with a previous \(x^{\prime}\), then \(\mathcal{L}_{\text {bcr-real }}\) will respond with this \(x^{\prime}\) value (the purpose of this is just to be helpful to security proofs that use these libraries); otherwise it will respond with false, giving no information about \(s\) or \(H(s, x)\). The other library always responds with false. Hence, the two are indistinguishable only if finding collisions is hard.

    to-do

    Proof to come. It’s not hard but tedious.

    Constructing UHFs using Polynomials

    UHFs have much weaker security than other kinds of hashing, and they can in fact be constructed unconditionally. One of the mathematically simplest constructions has to do with polynomials.

    Claim \(12.8 \quad\)

    Let p be a prime and \(g\) be a nonzero polynomial with coefficients in \(\mathbb{Z}_{p}\) and degree at most . Then \(g\) has at most \(d\) zeroes from \(\mathbb{Z}_{p}\).

    This observation leads to a simple UHF construction, whose idea is to interpret the string \(x\) as the coefficients of a polynomial, and evaluate that polynomial at point \(s\) (the salt of the UHF). In more detail, let \(p\) be a prime with \(p>2^{\lambda}\), and let the salt \(s\) be a uniformly chosen element of \(\mathbb{Z}_{p}\). To compute the hash of \(x\), first split \(x\) into \(\lambda\)-bit blocks, which will be convenient to index as \(x_{d-1}\left\|x_{d-2}\right\| . . \| x_{0}\). Interpret each \(x_{i}\) as a number mod \(p\). Then, the value of the hash \(H(s, x)\) is:

    \[s^{d}+x_{d-1} s^{d-1}+x_{d-2} s^{d-2}+\cdots+x_{0}(\bmod p)\]

    This is the result of evaluating a polynomial with coefficients \(\left(1, x_{d-1}, x_{d-2}, \ldots, x_{0}\right)\) at the point \(s\). A convenient way to evaluate this polynomial is by using Horner’s rule:

    \[\cdots s \cdot\left(s \cdot\left(s+x_{d-1}\right)+x_{d-2}\right)+x_{d-3} \cdots\]

    Horner’s rule can be expressed visually as follows:

    fig-ch01_patchfile_01.jpg
    Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)

    The UHF construction is described formally below.

    Construction \(12.9\) (Poly-UHF)
    fig-ch01_patchfile_01.jpg
    Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)
    Claim \(12.10\)

    The Poly-UHF construction is a secure UHF.

    Proof

    It suffices to show that, for any \(x \neq x^{\prime}\), the probability that \(H(s, x)=H\left(s, x^{\prime}\right)\) (taken over random choice of \(s)\) is negligible. Note that \(H(s, x)=g(s)\), where \(g\) is a polynomial whose coefficients are \(\left(1, x_{d-1}, \ldots, x_{0}\right)\), and \(H\left(s, x^{\prime}\right)=g^{\prime}(s)\), where \(g^{\prime}\) is a similar polynomial derived from \(x^{\prime}\). Note that \(x\) and \(x^{\prime}\) may be split into a different number of blocks, leading to different degrees \(\left(d\right.\) and \(\left.d^{\prime}\right)\) for the two polynomials.

    In order to have a collision \(H(s, x)=H\left(s, x^{\prime}\right)\), we must have

    \[\begin{array}{r} g(s) \equiv{ }_{p} g^{\prime}(s) \\ \Longleftrightarrow g(s)-g^{\prime}(s) \equiv_{p} 0 \end{array}\]

    Note that the left-hand side in this equation is a polynomial of degree at most \(d^{*}=\) \(\max \left\{d, d^{\prime}\right\}\). Furthermore, that polynomial \(g-g^{\prime}\) is not the zero polynomial because \(g\) and \(g^{\prime}\) are different polynomials. Even if the original strings \(x\) and \(x^{\prime}\) differ only in blocks of 0 s, the resulting \(g\) and \(g^{\prime}\) will be different polynomials because they include an extra leading coefficient of \(1 .\)

    A collision happens if and only if \(s\) is chosen to be one of the roots of \(g-g^{\prime}\). From Claim \(12.8\), the polynomial has at most \(d^{*}\) roots, so the probability of choosing one of them is at most: \[d^{*} / p \leqslant d^{*} / 2^{\lambda}\] This probability is negligible since \(d^{*}\) is polynomial in \(\lambda\) (it is the number of blocks in a string that was written down by the attacker, who runs in polynomial time in \(\lambda\)).

    to-do

    Fine print: this works but modular multiplication is not fast. If you want this to be fast, you would use a binary finite field. It is not so bad to describe what nite elds are, but doing so involves more polynomials. Then when you make polynomials whose coecients are nite eld elements, it runs the risk of feeling like polynomials over polynomials (because at some level it is). Not sure how I will eventually deal with this.

    Carter-Wegman UHF-based MAC

    A UHF by itself is not a good MAC, even when its salt \(s\) is kept secret. This is because the security of a MAC must hold even when the attacker sees the function’s outputs, but a UHF provides security (blind collision-resistance) only when the attacker does not see the UHF outputs.

    The Carter-Wegman MAC technique augments a UHF by sending its output through a PRF, so the MAC of \(m\) is \(F(k, H(s, m))\) where \(H\) is a UHF and \(F\) is a PRF.

    Construction \(12.11\) (Carter-Wegman)

    Let He a UHF with \(n\) bits of output, and let \(F\) be a secure PRF with in \(=n\). The Carter-Wegman construction combines them as follows:

    fig-ch01_patchfile_01.jpg
    Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)

    We will show that the Carter-Wegman construction is a secure PRF. Recall that this implies that the construction is also a secure MAC (Claim 10.4). Note that the CarterWegman construction also uses a PRF as a building block. However, it uses a PRF for short messages, to construct a PRF for arbitrary-length messages. Furthermore, it only calls the underlying PRF once, and all other computations involving the UHF are comparitively "cheap." To understand the security of Carter-Wegman, we work backwards. The output \(F(k, H(s, x))\) comes directly from a PRF. These outputs will look random as long as the inputs to the PRF are distinct. In this construction, the only way for PRF inputs to repeat is for there to be a collision in the UHF H. However, we have to be careful. We can only reason about the collision-resistance of \(H\) when its salt is secret and its outputs are hidden from the attacker. The salt is indeed hidden in this case (kept as part of the Carter-Wegman key), but its outputs are being used as PRF inputs. Fortunately, the guarantee of a PRF is that its outputs appear unrelated to its inputs. In other words, the PRF outputs leak no information about the PRF inputs ( \(H\)-outputs). Indeed, this appears to be a situation where the UHF outputs are hidden from the attacker, so we can argue that collisions in \(H\) are negligibly likely.

    Claim 12.12

    If is a secure UHF and \(F\) is a secure PRF, then the Carter-Wegman construction (Construction 12.11) is a secure PRF, and hence a secure MAC as well.

    Proof

    We will show that \(\mathcal{L}_{\text {prf-real }}^{\mathrm{CW}} \approx \mathcal{L}_{\text {prf-rand }}^{\mathrm{CW}}\) using a standard hybrid technique.

      The starting point is \(\mathcal{L}_{\text {prf-real }}^{C \mathrm{CW}}\)
      We have applied the security of \(F\), by factoring out in terms of \(\mathcal{L}_{\text {prf-real }}^{F}\), replacing it with \(\mathcal{L}_{\text {prf-rand }}^{F}\), and inlining the result.
      The LOOKUP subroutine has the property that if it is called on the same \(x\) twice, it will return the same result. It therefore does no harm to cache the answer every time. The second time LOOKUP is called on the same value \(x\), the previous value is loaded from cache rather than recomputed. This change has no effect on the calling program.
      Note that if LOOKUP is first called on \(x^{\prime}\) and then later on \(x\), where \(H(s, x)=H\left(s, x^{\prime}\right)\), LOOKUP will return the same result. We therefore modify the library to keep track of \(H\)-outputs and inputs. Whenever the library computes \(y=H(s, x)\), it stores \(H_{\mathrm{inv}}[y]=x\). However, if \(H_{\mathrm{inv}}[y]\) already exists, it means that this \(x\) and \(x^{\prime}=H_{\mathrm{inv}}[y]\) are a collision under \(H\). In that case, the library directly returns whatever it previously returned on input \(x^{\prime}\). This change has no effect on the calling program.
      In the previous hybrid, \(T[y]\) is set at the same time \(H_{\mathrm{inv}}[y]\) is set \(-\) on the first call LOOKUP \((x)\) such that \(H(s, x)=\) \(y\). Therefore, it has no effect on the calling program to check whether \(T[y]\) is defined or check whether \(H_{\mathrm{inv}}[y]\) is defined.
      Note that if \(H_{\mathrm{inv}}[y]\) is defined, then LOOKUP returns within that if-statement. The line \(\operatorname{cache}[x]:=T[y]\) is therefore only executed in the case that \(H_{\mathrm{inv}}[y]\) was not initially defined. Instead of choosing \(T[y]\) only to immediately assign it to cache \([x]\), we just assign directly to cache \([x]\). This change has no effect on the calling program, and it does away with the \(T\) associative array entirely.

    The if-statements involving \(H_{\mathrm{inv}}\) in this hybrid are checking whether \(x\) has collided with any previous \(x^{\prime}\) under \(H\). All of this logic, including the evaluation of \(H\), can be factored out in terms of \(\mathcal{L}_{\text {bcr-real }}^{H}\). At this point in the sequence of hybrids, the output of \(H\) is not needed, except to check whether a collision has been encountered (and if so, what the offending inputs are). Again, this change has no effect on the calling program. The result is:

    fig-ch01_patchfile_01.jpg
    Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)

    The security of \(H\) is that we can swap \(\mathcal{L}_{\text {bcr-real }}^{H}\) for \(\mathcal{L}_{\text {bcr-fake }}^{H}\), with negligible effect on the calling program. Note that TEST algorithm in \(\mathcal{L}_{\text {bcr-fake }}\) always returns false. This leads us to simply remove the "if \(\operatorname{TEST}(x) \neq\) false" clause, resulting in the following:

    fig-ch01_patchfile_01.jpg
    Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)

    Since this is exactly \(\mathcal{L}_{\text {prf-rand }}^{\mathrm{CW}}\), we are done. We have shown that \(\mathcal{L}_{\text {prf-rand }}^{\mathrm{CW}} \approx \mathcal{L}_{\text {prf-rand }}^{\mathrm{CW}}\).


    This page titled 12.3: Carter-Wegman MACs is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Mike Rosulek (Open Oregon State) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.

    • Was this article helpful?