13.2: The RSA Function

Last updated
Save as PDF

Page ID: 7398

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

The RSA function is defined as follows:

Let \(p\) and \(q\) be distinct primes (later we will say more about how they are chosen), and let \(N=p q . N\) is called the RSA modulus.
Let \(e\) and \(d\) be integers such that \(e d \equiv_{\phi(N)}\) 1. That is, \(e\) and \(d\) are multiplicative inverses \(\bmod \phi(N)-\operatorname{not} \bmod N!\)
The RSA function is: \(x \mapsto x^{e} \% N\), where \(x \in \mathbb{Z}_{N}\)
The inverse RSA function is: \(y \mapsto y^{d} \% N\), where \(x \in \mathbb{Z}_{N}\).

Essentially, the RSA function (and its inverse) is a simple modular exponentiation. The most confusing thing to remember about RSA is that \(e\) and \(d\) "live" in \(\mathbb{Z}_{\phi(N)}^{*}\), while \(x\) and \(y\) "live" in \(\mathbb{Z}_{N}\)

Let’s make sure the function we called the "inverse RSA function" is actually an inverse of the RSA function. Let’s start with an example:

Example

In Sage, we can sample a random prime between 1 and \(k\) by using random_prime \((k)\). We use it to sample the prime factors \(p\) and \(q\) :

Then we can compute the exponents \(e\) and . Recall that they must be multiplicative inverses \(\bmod \phi(N)\), so they cannot share any common factors with \(\phi(N)\). An easy way to ensure this is to choose e to be a prime:

We can now raise something to the e power and again to the \(d\) power:

As you can see, raising to the e power and thend power \((\bmod N)\) seems to bring us back to where we started \((x)\)

We can argue that raising-to-the-e-power and raising-to-the- \(d\)-power are inverses in general: Since \(e d \equiv \phi(N) 1\), we can write \(e d=t \phi(N)+1\) for some integer \(t\). Then:

\[\left(x^{e}\right)^{d}=x^{e d}=x^{t \phi(N)+1}=\left(x^{\phi(N)}\right)^{t} x \equiv_{N} \quad 1^{t} x=x\]

Note that we have used the fact that \(x^{\phi(N)} \equiv_{N} 1\) from Euler’s theorem. \({ }^{4}\)

How [Not] to Exponentiate Huge Numbers

When you see an expression like " \(x^{e} \% N "\), you might be tempted to implement it with the following algorithm:

While this algorithm would indeed give the correct answer, it is a really bad way of doing it. In practice, we use RSA with numbers that are thousands of bits long. Suppose we run the NAIVEEXPONENTIATE algorithm with arguments \(x, e\), and \(N\) which are around a thousand bits each (so the magnitude of these numbers is close to \(2^{1000}\) ):

The algorithm will spend approximately \(2^{1000}\) iterations in the for-loop!
The algorithm computes \(x^{e}\) as an integer first, and then reduces that integer mod \(N\). Observe that \(x^{2}\) is roughly 2000 bits long, \(x^{3}\) is roughly 3000 bits long, etc. So it would take about \(2^{1000} \cdot 1000\) bits just to write down the integer \(x^{e}\).

As you can see, there is neither enough time nor storage capacity in the universe to use this algorithm. So how can we actually compute values like \(x^{e} \% N\) on huge numbers?

Suppose you were given an integer \(x\) and were asked to compute \(x^{17}\). You can compute it as:\[x^{17}=\underbrace{x \cdot x \cdot x \cdots x}_{16 \text { multiplications }} .\] But a more clever way is to observe that: \[x^{17}=x^{16} \cdot x=\left(\left(\left(x^{2}\right)^{2}\right)^{2}\right)^{2} \cdot x\] This expression can be evaluated with only 5 multiplications (squaring is just muliplying a number by itself).
More generally, you can compute an expression like \(x^{e}\) by following the recurrence below. The method is called exponentiation by repeated squaring, for reasons that are hopefully clear:

Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)

BETTEREXP divides the \(e\) argument by two (more or less) each time it recurses, until reaching the base case. Hence, the number of recursive calls is \(O(\log e)\). In each recursive call there are only a constant number of multiplications (including squarings). So overall this algorithm requires only \(O\) ( \(\log e\) ) multiplications (compared to \(e-1\) multiplications by just multiplying \(m\) by itself \(e\) times). In the case where \(e \sim 2^{1000}\), this means only a few thousand multiplications.
We care about only \(x^{e} \% N\), not the intermediate integer value \(x^{e}\). One of the most fundamental features of modular arithmetic is that you can reduce any intermediate values \(\bmod N\) if you care about the final result only \(\bmod N\).
Revisiting our previous example: \[x^{17} \% N=x^{16} \cdot x \% N=\left(\left(\left(x^{2} \% N\right)^{2} \% N\right)^{2} \% N\right)^{2} \cdot x \% N\] More generally, we can reduce all intermediate value \(\bmod N\) :

Figure \(\PageIndex{1}\): Copy and Paste Caption here. (Copyright; author via source)

This algorithm avoids the problem of computing the astronomically huge integer \(x^{e}\). It never needs to store any value (much) larger than \(N\).

Warning: Even this MODEXP algorithm isn’t an ideal way to implement exponentiation for cryptographic purposes. Exercise 13.10 explores some unfortunate properties of this exponentiation algorithm.

Example

Most math libraries implement exponentiation using repeated squaring. For example, you can use Sage to easily calculate numbers with huge exponents:

However, this expression still tells Sage to compute \(427^{31415926}\) as an integer, before reducing it mod 100. As such, it takes some time to perform this computation.

If you try an expression like \(\mathrm{x}^{\wedge} \mathrm{e} \% \mathrm{~N}\) with a larger exponent, Sage will give a memory error. How can we tell Sage to perform modular reduction at every intermediate step during repeated squaring? The answer is to use Sage’s Mod objects, for example:

This expression performs repeated squaring on the object Mod \((427,100)\). Since a Mod-object’s operations are all overloaded (to give the answer only mod n), this has the result of doing a modular reduction after every squaring and multiplication. This expression runs instantaneously, even with very large numbers.

Security Properties & Discussion

RSA is what is called a trapdoor function.

One user generates the RSA parameters (primarily \(N, e\), and \(d\) ) and makes \(N\) and \(e\) public, while keeping \(d\) private.
Functionality properties: Given only the public information \(N\) and \(e\), it is easy to compute the RSA function \(\left(x \mapsto x^{e} \% N\right)\). Given the private information \((d)\) it clearly easy to compute the RSA inverse \(\left(y \mapsto y^{d} \% N\right)\).
Security property: Given only the public information, it should be hard to compute the RSA inverse \(\left(y \mapsto y^{d} \% N\right)\) on randomly chosen values. In other words, the only person who is able to compute the RSA inverse function is the person who generated the RSA parameters.

to-do: The security property is not natural to express in our language of security denitions (libraries).

Currently the best known attacks against RSA (i.e., ways to compute the inverse RSA function given only the public information) involve factoring the modulus. If we want to ensure that RSA is secure as a trapdoor function, we must understand the state of the art for factoring large numbers.

Before discussing the performance of factoring algorithms, remember that we measure performance as a function of the length of the input - how many bits it takes to write the input. In a factoring algorithm, the input is a large number \(N\), and it takes roughly \(n=\log _{2} N\) bits to write down that number. We will discuss the running time of algorithms as a function of \(n\), not \(N\). Just keep in mind the difference in cost between writing down a 1000 -bit number \((n=1000)\) vs counting up to a 1000 -bit number \(\left(N=2^{1000}\right)\)

Everyone knows the "trial division" method of factoring: given a number \(N\), check whether \(i\) divides \(N\), for every \(i \in\{2, \ldots \sqrt{N}\}\). This algorithm requires \(\sqrt{N}=2^{n / 2}\) divisions in the worst case. It is an exponential-time algorithm since we measure performance in terms of the bit-length \(n\).

If this were the best-known factoring algorithm, then we would need to make \(N\) only as large as \(2^{256}\) to make factoring require \(2^{128}\) effort. But there are much better factoring algorithms than trial division. The fastest factoring algorithm today is called the Generalized Number Field Sieve (GNFS), and its complexity is something like \(O\left(n^{\left(\frac{n}{\log n}\right)^{\frac{1}{3}}}\right)\). This is not a polynomial-time algorithm, but it’s much faster than trial division.

Example

Sage can easily factor reasonably large numbers. Factoring the following 200-bit RSA modulus on my modest computer takes about 10 seconds:

As of January 2020 , the largest RSA modulus that has been (publically) factored is a 795-bit modulus. \({ }^{5}\) Factoring this number required the equivalent of \(900 \mathrm{CPU}\)-core-years, or roughly \(2^{66}\) total clock cycles.

All of this is to say, the numbers involved in RSA need to be quite large to resist factoring attacks (i.e., require \(2^{128}\) effort for state-of-the-art factoring algorithms). Current best practices suggest to use 2048 - or 4096 -bit RSA moduli, meaning that \(p\) and \(q\) are each 1024 or 2048 bits.

to-do: "What about quantum computers?" is a common FAQ that I should address here.

\({ }^{4}\) However, see Exercise \(13.15 .\)

\({ }^{5}\) https://en.Wikipedia.org/wiki/RSA_numbers#RSA-240