Engineering LibreTexts

19.5: Confidence versus Probability

    • Eric Lehman, F. Thomson Leighton, & Albert R. Meyer
    • Google and Massachusetts Institute of Technology via MIT OpenCourseWare

    So Chebyshev’s Bound implies that sampling 3,125 voters will yield a fraction that, 95% of the time, is within 0.04 of the actual fraction of the voting population who prefer Brown.
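    The figure of 3,125 can be reproduced with a few lines of arithmetic. The sketch below assumes the bound from the preceding analysis takes the usual Chebyshev form, \(\Pr[|S_n/n - p| \geq \epsilon] \leq 1/(4n\epsilon^2)\), which relies on a single 0/1 response having variance at most \(1/4\). The function name `chebyshev_sample_size` and its parameters are illustrative and do not come from the text.

```python
import math
from fractions import Fraction


def chebyshev_sample_size(margin, failure_prob):
    """Smallest n with 1 / (4 * n * margin**2) <= failure_prob.

    Assumes the Chebyshev-style bound stated in the lead-in. Arguments are
    decimal strings so that the arithmetic is exact (no float rounding).
    """
    eps = Fraction(margin)
    delta = Fraction(failure_prob)
    return math.ceil(1 / (4 * eps**2 * delta))


if __name__ == "__main__":
    # Margin 0.04, allowed failure probability 1 - 0.95 = 0.05.
    print(chebyshev_sample_size("0.04", "0.05"))  # 3125
```

    Note that the population size is not an argument of the function: only the margin and the allowed failure probability enter the calculation.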

    Notice that the actual size of the voting population was never considered because it did not matter. People who have not studied probability theory often insist that the population size should influence the sample size. But our analysis shows that polling a little over 3,000 people is always sufficient, regardless of whether there are ten thousand, or a million, or a billion voters. You should think about an intuitive explanation that might persuade someone who thinks population size matters.
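    One way to build that intuition is to simulate the poll on populations of very different sizes. The following is a minimal sketch, not part of the original analysis: it assumes voters are drawn uniformly at random with replacement, picks a hypothetical true fraction of 0.4, and uses made-up names (`poll_coverage`, `supporters`). It reports how often the sample fraction lands within 0.04 of the true fraction.

```python
import random


def poll_coverage(population_size, p=0.4, sample_size=3125,
                  margin=0.04, trials=100, seed=0):
    """Fraction of simulated polls whose estimate is within `margin` of the truth.

    Voters are numbered 0..population_size-1, and the first `supporters`
    of them prefer Brown. Sampling is with replacement (an assumption).
    """
    rng = random.Random(seed)
    supporters = round(p * population_size)
    true_fraction = supporters / population_size
    hits = 0
    for _ in range(trials):
        # Draw sample_size random voters and count those who prefer Brown.
        count = sum(rng.randrange(population_size) < supporters
                    for _ in range(sample_size))
        if abs(count / sample_size - true_fraction) <= margin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for size in (10_000, 1_000_000, 1_000_000_000):
        print(size, poll_coverage(size))
```

    In this model each sampled voter prefers Brown with probability equal to the true fraction, whatever the population size, which is why the size never enters the bound.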

    Now suppose a pollster actually takes a sample of 3,125 random voters to estimate the fraction of voters who prefer Brown, and the pollster finds that 1250 of them prefer Brown. It’s tempting, but sloppy, to say that this means:

    False Claim. With probability 0.95, the fraction, \(p\), of voters who prefer Brown is \(1250/3125 \pm 0.04\). Since \(1250/3125 - 0.04 > 1/3\), there is a 95% chance that more than a third of the voters prefer Brown to all other candidates.

    What’s objectionable about this statement is that it talks about the probability or “chance” that a real world fact is true, namely that the actual fraction, \(p\), of voters favoring Brown is more than \(1/3\). But \(p\) is what it is, and it simply makes no sense to talk about the probability that it is something else. For example, suppose \(p\) is actually 0.3; then it’s nonsense to ask about the probability that it is within 0.04 of \(1250/3125\). It simply isn’t.

    This example of voter preference is typical: we want to estimate a fixed, unknown real-world quantity. But being unknown does not make this quantity a random variable, so it makes no sense to talk about the probability that it has some property.

    A more careful summary of what we have accomplished goes this way:

    We have described a probabilistic procedure for estimating the value of the actual fraction, \(p\). The probability that our estimation procedure will yield a value within 0.04 of \(p\) is 0.95.
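    The distinction can be made concrete with a short simulation. This is an illustrative sketch under stated assumptions: each sampled voter is modeled as an independent coin flip with bias \(p\), and the function names are hypothetical. The quantity that has a probability is the success rate of the procedure over many repetitions. Whether one particular estimate is within 0.04 of a fixed \(p\) is simply true or false.

```python
import random


def run_poll(p, sample_size, rng):
    """One run of the estimation procedure: returns the sample fraction."""
    return sum(rng.random() < p for _ in range(sample_size)) / sample_size


def procedure_success_rate(p, sample_size=3125, margin=0.04, runs=200, seed=1):
    """How often the procedure's estimate lands within `margin` of the fixed p."""
    rng = random.Random(seed)
    successes = sum(abs(run_poll(p, sample_size, rng) - p) <= margin
                    for _ in range(runs))
    return successes / runs


def claim_holds(p, estimate, margin=0.04):
    """For a fixed p and a given estimate, the claim is a plain yes/no fact."""
    return abs(estimate - p) <= margin


if __name__ == "__main__":
    # Randomness lives in the procedure, repeated many times.
    print(procedure_success_rate(0.3))
    # No randomness here: if p is 0.3, the estimate 1250/3125 misses it.
    print(claim_holds(0.3, 1250 / 3125))
```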

    This is a bit of a mouthful, so special phrasing closer to the sloppy language is commonly used. The pollster would describe his conclusion by saying that

    At the 95% confidence level, the fraction of voters who prefer Brown is \(1250/3125 \pm 0.04\).

    So confidence levels refer to the results of estimation procedures for real-world quantities. The phrase “confidence level” should be heard as a reminder that some statistical procedure was used to obtain an estimate, and in judging the credibility of the estimate, it may be important to learn just what this procedure was.
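    As a small illustration, the pollster's statement can be assembled from the raw counts. The helper below is a hypothetical sketch (the name `confidence_statement` and its signature are assumptions), using exact fractions so that the interval endpoints are not distorted by floating-point rounding.

```python
from fractions import Fraction


def confidence_statement(successes, sample_size, margin, level):
    """Return the interval endpoints and the conventional phrasing."""
    estimate = Fraction(successes, sample_size)
    low, high = estimate - margin, estimate + margin
    text = (f"At the {level}% confidence level, the fraction is "
            f"{float(estimate):.2f} ± {float(margin):.2f}")
    return low, high, text


if __name__ == "__main__":
    low, high, text = confidence_statement(1250, 3125, Fraction(1, 25), 95)
    print(text)
    print(low, high)               # 9/25 11/25
    print(low > Fraction(1, 3))    # True
```

    The statement records the output of the sampling procedure together with its margin and level; it says nothing about \(p\) being random.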


    This page titled 19.5: Confidence versus Probability is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Eric Lehman, F. Thomson Leighton, & Albert R. Meyer (MIT OpenCourseWare).
