Engineering LibreTexts

16.4: The Birthday Principle

    • Eric Lehman, F. Thomson Leighton, & Albert R. Meyer
    • Google and Massachusetts Institute of Technology via MIT OpenCourseWare

    There are 95 students in a class. What is the probability that some birthday is shared by two people? Comparing 95 students to the 365 possible birthdays, you might guess the probability lies somewhere around \(1/4\)—but you’d be wrong: the probability that there will be two people in the class with matching birthdays is actually more than 0.9999.

    To work this out, we’ll assume that the probability that a randomly chosen student has a given birthday is \(1/d\). We’ll also assume that a class is composed of \(n\) randomly and independently selected students. Of course \(d = 365\) and \(n = 95\) in this case, but we’re interested in working things out in general. These randomness assumptions are not really true, since more babies are born at certain times of year, and students’ class selections are typically not independent of each other, but simplifying in this way gives us a start on analyzing the problem. More importantly, these assumptions are justifiable in important computer science applications of birthday matching. For example, birthday matching is a good model for collisions between items randomly inserted into a hash table. So we won’t worry about things like spring procreation preferences that make January birthdays more common, or about twins’ preferences to take classes together (or not).

    Exact Formula for Match Probability

    There are \(d^n\) sequences of \(n\) birthdays, and under our assumptions, these are equally likely. There are \(d(d-1)(d-2) \cdots (d-(n-1))\) length \(n\) sequences of distinct birthdays. That means the probability that everyone has a different birthday is:\(^{3}\)

    \[\begin{align}
    \label{16.4.1} \nonumber &\dfrac{d(d-1)(d-2) \cdots(d-(n-1))}{d^{n}}\\
    &=\dfrac{d}{d} \cdot \dfrac{d-1}{d} \cdot \dfrac{d-2}{d} \cdots \dfrac{d-(n-1)}{d}\\
    \nonumber &=\left(1-\dfrac{0}{d}\right)\left(1-\dfrac{1}{d}\right)\left(1-\dfrac{2}{d}\right) \cdots\left(1-\dfrac{n-1}{d}\right)\\
    \nonumber &<e^{0} \cdot e^{-1 / d} \cdot e^{-2 / d} \cdots e^{-(n-1) / d} \quad \text{(since } 1+x<e^{x}\text{)}\\
    \nonumber &=e^{-\left(\sum_{i=1}^{n-1} i / d\right)}\\
    \label{16.4.2} &=e^{-(n(n-1) / 2 d)}.
    \end{align}\]
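The counting argument behind the first line of this derivation can be checked by brute force when \(d\) and \(n\) are small. The following is a minimal sketch, assuming an illustrative "year" of \(d = 6\) days and \(n = 3\) people; the function name `count_distinct_sequences` is hypothetical and not part of the original text.

```python
from itertools import product
from math import perm

def count_distinct_sequences(n: int, d: int) -> int:
    """Count length-n birthday sequences over d days with no repeated day."""
    return sum(
        1
        for seq in product(range(d), repeat=n)  # all d**n sequences
        if len(set(seq)) == n                   # keep those with n distinct days
    )

# Assumed toy values, small enough to enumerate every sequence.
d, n = 6, 3
total = d ** n
distinct = count_distinct_sequences(n, d)

# perm(d, n) is the falling product d(d-1)...(d-(n-1)).
print(distinct, perm(d, n), total)
print("P(all distinct) =", distinct / total)
```

Here 120 of the 216 sequences have distinct birthdays, matching \(6 \cdot 5 \cdot 4\).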

    For \(n = 95\) and \(d = 365\), the value of (\ref{16.4.2}) is less than \(1/200,000\), which means the probability of having some pair of matching birthdays actually is more than \(1 - 1/200,000 > 0.99999\). So it would be pretty astonishing if there were no pair of students in the class with matching birthdays.
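These figures can be reproduced numerically. The sketch below, with hypothetical function names `prob_all_distinct` and `upper_bound`, evaluates both the exact product and the exponential bound (\ref{16.4.2}) under the same uniformity and independence assumptions as the text.

```python
import math

def prob_all_distinct(n: int, d: int) -> float:
    """Exact probability that n uniform, independent birthdays over d days all differ."""
    if n > d:
        return 0.0  # pigeonhole: a match is certain
    p = 1.0
    for i in range(n):
        p *= (d - i) / d  # factor (1 - i/d) from the derivation
    return p

def upper_bound(n: int, d: int) -> float:
    """The bound e^{-n(n-1)/(2d)} on the no-match probability."""
    return math.exp(-n * (n - 1) / (2 * d))

n, d = 95, 365
exact = prob_all_distinct(n, d)
bound = upper_bound(n, d)

print("exact no-match probability:", exact)
print("upper bound:               ", bound)
print("bound < 1/200,000:         ", bound < 1 / 200_000)
print("match probability:         ", 1 - exact)
```

The exact no-match probability comes out smaller than the bound, as the strict inequality in the derivation requires.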

    For \(d \leq n^2 / 2\), the probability of no match turns out to be asymptotically equal to the upper bound (\ref{16.4.2}). For \(d = n^2 / 2\) in particular, the exponent \(n(n-1)/2d\) equals \((n-1)/n\), which tends to 1, so the probability of no match is asymptotically equal to \(1/e\). This leads to a rule of thumb which is useful in many contexts in computer science:

    The Birthday Principle

    If there are \(d\) days in a year and \(\sqrt{2d}\) people in a room, then the probability that two share a birthday is about \(1 - 1/e \approx 0.632\).

    For example, the Birthday Principle says that if you have \(\sqrt{2 \cdot 365} \approx 27\) people in a room, then the probability that two share a birthday is about 0.632. The actual probability is about 0.626, so the approximation is quite good.
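The quality of the approximation can be examined for other values of \(d\) as well. The following sketch compares the exact match probability for \(n\) equal to \(\sqrt{2d}\) rounded to the nearest integer against \(1 - 1/e\); the function name `match_probability` and the sample values of \(d\) other than 365 are assumptions chosen for illustration.

```python
import math

def match_probability(n: int, d: int) -> float:
    """Exact probability that at least two of n people share one of d equally likely birthdays."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= 1 - i / d
    return 1 - p_distinct

principle = 1 - 1 / math.e  # the Birthday Principle estimate, about 0.632

for d in (365, 10_000, 1_000_000):
    n = round(math.sqrt(2 * d))  # nearest integer to sqrt(2d)
    exact = match_probability(n, d)
    print(f"d={d:>9}  n={n:>5}  exact={exact:.4f}  estimate={principle:.4f}")
```

For \(d = 365\) this uses \(n = 27\), the case discussed above, and the gap between the exact value and the estimate shrinks as \(d\) grows.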

    Among other applications, it implies that to use a hash function that maps \(n\) items into a hash table of size \(d\), you can expect many collisions if \(n^2\) is more than a small fraction of \(d\). The Birthday Principle also famously comes into play as the basis of “birthday attacks” that crack certain cryptographic systems.
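The hash table claim can be illustrated with a small simulation. This is only a sketch under stated assumptions: hashing is modeled as choosing a uniformly random slot, the table size \(d = 1024\) and the trial count are arbitrary, and the names `has_collision` and `estimate_collision_rate` are hypothetical.

```python
import random

def has_collision(n: int, d: int, rng: random.Random) -> bool:
    """Insert n items into d slots uniformly at random; report whether any slot is hit twice."""
    seen = set()
    for _ in range(n):
        slot = rng.randrange(d)  # stand-in for a well-behaved hash function
        if slot in seen:
            return True
        seen.add(slot)
    return False

def estimate_collision_rate(n: int, d: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability of at least one collision."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = sum(has_collision(n, d, rng) for _ in range(trials))
    return hits / trials

d = 1024  # assumed table size; sqrt(2 * 1024) is about 45
for n in (8, 16, 32, 45, 64):
    print(f"n={n:>3}  n^2/d={n * n / d:5.2f}  collision rate={estimate_collision_rate(n, d):.3f}")
```

With these assumed values, a collision is already more likely than not at \(n = 45\), which is close to \(\sqrt{2d}\), even though the table is then under 5% full.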

    \(^{3}\) The fact that \(1 - x < e^{-x}\) for all \(x > 0\) follows by truncating the Taylor series \(e^{-x} = 1 - x + x^2/2! - x^3/3! + \cdots\). The approximation \(e^{-x} \approx 1 - x\) is pretty accurate when \(x\) is small.


    This page titled 16.4: The Birthday Principle is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Eric Lehman, F. Thomson Leighton, & Albert R. Meyer (MIT OpenCourseWare).
