# Chapter 2: Histograms, Statistical Measures, and Probability


You can find pre-recorded lecture videos that cover this topic here: https://youtube.com/playlist?list=PL...gEHsGBZJpQhGzn

__Sample, Population, and Distribution Error__

Before we start to analyze precision error we must understand two key concepts: i) the **distribution of error** and ii) the **population from which a sample is drawn**.

• **Distribution of error:** characterizes the **probability** that an error of a given size will occur.

• **Population from which a sample is drawn:** experimentally we have a **limited set of observations, our sample,** from which we will infer the **characteristics of the larger population.**

__Understanding Sample Vs. Population: Bag of Marbles__

In any bag of marbles there will be a distribution of diameters. To estimate the mean diameter we can take a handful of marbles (**sample**) drawn from the bag (**population**).

No two handfuls will yield exactly the same average value, but **each should approximate the average of the population** to some level of uncertainty, **provided the sample is large enough to represent the population.** The handful is a repeated sample, whereas picking one marble would have been a single sample.

You can visually see this in the Mathematica Notebook simulation below where we create a population of marble diameters and take different sample sizes.
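The Mathematica notebook itself is not reproduced here, but the same experiment can be sketched in Python. The population parameters below (mean 15 mm, spread 0.5 mm) are invented purely for illustration:

```python
import random
import statistics

# A stand-in for the Mathematica notebook: a "bag" of marble diameters
# (the population) and handfuls of various sizes (the samples).
# The population parameters (mean 15 mm, spread 0.5 mm) are invented.
random.seed(0)
population = [random.gauss(15.0, 0.5) for _ in range(10_000)]

for n in (5, 30, 500):
    handful = random.sample(population, n)
    print(f"sample of {n}: mean = {statistics.mean(handful):.3f}")

print(f"population mean = {statistics.mean(population):.3f}")
```

Re-running without the fixed seed shows the point of the marble analogy: small handfuls scatter noticeably around the population mean, while large ones land close to it.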

__Visualizing Data: Histograms__

One thing that you will immediately see here is that we are representing the data using a **histogram**. **Histograms** represent the frequency, or number of times, each value of a measurand is observed: the y-axis typically represents frequency and the measurand value is on the x-axis. We will talk about distributions later, but the lower the frequency of a measurand value, the rarer it is to encounter, and the higher the frequency, the more likely we are to encounter it.

Notice that the sample mean varies from one sample grabbed from the bag to the next, and that, as we grab larger and larger samples, we get, as you might anticipate, a closer approximation to the true population mean.

__Sampling:__

• A sample of size *n* is drawn from a finite population of size *p*. We assume additional data cannot be added to the population and that *n* ≪ *p*.

• A finite number of items, *n*, is randomly drawn from a population of indefinite size, and the properties of the population are inferred from the sample.

__Sample Vs. Population Parameters__

## Population Mean and Standard Deviation

Typically, the population mean, *µ*, is unknown, since the population is effectively infinite and as experimentalists we can never know the full population distribution. However, it would be calculated as follows

\begin{equation}

\mu = \frac{x_{1} + x_{2} + ... + x_{n}}{n} = \sum\limits_{i=1}^n \frac{x_{i}}{n}

\end{equation}

By averaging a **large sample **we are able to **estimate the true value of the population**. However, we will develop a much more systematic approach of estimating bounds on the true value of a population, coming soon....

## Standard Deviation

The deviation, *d*, is the amount by which a single measurement deviates from the mean value of the population

\begin{equation}

d = x - \mu

\end{equation}

The population standard deviation, \(\sigma\), is the square root of the mean squared deviation, \(\sigma^2\), and is approximated by averaging the squared deviations of a very large sample:

\begin{equation}

\sigma \approx \sqrt{\frac{d_{1}^{2} + d_{2}^{2} + ... + d_{n}^{2}}{n}}

\end{equation}

*σ* is the **standard deviation** of the population and characterizes the deviation from the mean value and the **width of the Gaussian**; again, more on this later.

## Sample Mean and Standard Deviation

As we have previously discussed **it is often impractical or at times impossible to work with an entire population**, instead as experimentalists we work with samples from a population and **we use average values from the sample to estimate the mean or standard deviation of the population.**

The **sample mean** is defined as \(\overline{x}\)

\begin{equation}

\overline{x} = \sum\limits_{i=1}^n \frac{x_{i}}{n} = \frac{x_{1} + x_{2} + ... + x_{n}}{n}

\end{equation}

and, as we have previously discussed, the **sample mean**, \(\overline{x}\), can be used to approximate the **population mean**, *µ*, for large sample sizes, i.e. *n* ≥ 30. Similarly we can calculate the **sample standard deviation**, \(S_x\),

\begin{equation}

S_{x} = \sqrt{\frac{(x_{1} - \overline{x})^{2} + (x_{2} - \overline{x})^{2} + ... + (x_{n} - \overline{x})^{2}}{n-1}}

\end{equation}

which can be used to approximate the **population standard deviation**, *\(\sigma\)*. Just a quick reminder that *n *is the number of data points/measurements in the sample.
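As a quick sanity check, both formulas can be computed directly in Python; note that `statistics.stdev` uses the same *n* − 1 denominator as \(S_x\) above. The measurement values below are hypothetical:

```python
import statistics

x = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]  # hypothetical measurements
n = len(x)

xbar = sum(x) / n                                            # sample mean
s_x = (sum((xi - xbar) ** 2 for xi in x) / (n - 1)) ** 0.5   # note n - 1

# statistics.mean and statistics.stdev implement the same formulas
print(round(xbar, 4), round(s_x, 4))
```

Using the *n* − 1 denominator (rather than *n*) corrects for the fact that the deviations are measured from \(\overline{x}\) rather than from the unknown *µ*.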

But let’s go back to this idea of a histogram and the frequency of encountering measurements, because this is reminiscent of probabilities. Let’s refresh our understanding of this very fundamental and important concept.

__Number of Possibilities__

Before looking at probabilities it is often very important to determine the number of possibilities in a given scenario. If we have sets \(A_1, A_2, ..., A_k\) which contain, respectively, \(n_1, n_2, ..., n_k\) elements, then there are \(n_1 \cdot n_2 \cdots n_k\) ways of choosing first an element of \(A_1\), then \(A_2\), and finally \(A_k\).

Let’s do an example to put this into practice.

Consider the following scenario where I am trying to maximize the yield of a field where I am planting lettuce. There are two options of fertilizer, F1 and F2, 4 blocks of land to plant (B1-B4), and 3 types of seeding density (S1-S3). How many planting combinations can we observe?

Well, from the given example there are *n* = 2 options for F, 4 for B, and 3 for S, so applying this rule we have

\begin{equation}

2\cdot 4 \cdot 3 = 24

\end{equation}

How about another quick problem: I make a true-false exam (I will never do that, by the way) with 15 questions. How many combinations of answers can we observe?

\begin{equation}

2^{15}=32768

\end{equation}

Lots of grading options for me.
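Both counts can be verified by brute-force enumeration of the product sets, e.g. in Python:

```python
from itertools import product

# Product rule: 2 fertilizers x 4 blocks x 3 seeding densities
plantings = list(product(["F1", "F2"],
                         ["B1", "B2", "B3", "B4"],
                         ["S1", "S2", "S3"]))
print(len(plantings))  # 24

# True-false exam: 2 possible answers for each of 15 questions
answer_keys = list(product([True, False], repeat=15))
print(len(answer_keys))  # 32768
```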

This concept is closely related to the number of ways to arrange objects distinctly, which you may remember was called.....

__Permutations__

If *r* objects are chosen from a set of *n* **distinct** objects, any particular arrangement or order of these objects is a **permutation**. In such a scenario the number of permutations, \(_nP_r\), is defined as

\begin{equation}

_n P_r = \frac{n!}{(n-r)!}

\end{equation}

Let’s consider an example where we have 15 candidates: how many ways can we choose a president, vice president, chief of staff, and treasurer?

Well, what is the number of objects chosen, *r*? Here we see that *r* = 4 and the set of distinct objects has *n* = 15.

\begin{equation}

\frac{15!}{11!}=32760

\end{equation}
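In Python, `math.perm` computes exactly this quantity, and we can confirm it against the factorial formula:

```python
from math import factorial, perm

n, r = 15, 4  # 15 candidates, 4 distinct (ordered) offices
print(factorial(n) // factorial(n - r))  # 32760
print(perm(n, r))                        # same value via math.perm
```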

__Combinations__

If we care only about which *r* objects are selected from *n* distinct objects, not their order, the number of **combinations** of *n* objects taken *r* at a time is denoted by

\begin{equation}

_n C_r = \frac{n!}{r!(n-r)!}

\end{equation}

Let’s say I have a class of 23 students: how many ways can we choose teams of 3?

Here we have that *n *= 23 and *r *= 3

\begin{equation}

\frac{23!}{3!20!} = 1771

\end{equation}

How would you expect the number of combinations to change if we increase the team size to 8?

\begin{equation}

\frac{23!}{8!15!}=490314

\end{equation}
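Python’s `math.comb` evaluates both of these directly:

```python
from math import comb

print(comb(23, 3))  # teams of 3 from 23 students: 1771
print(comb(23, 8))  # teams of 8: 490314
```

The jump from 1,771 to 490,314 shows how fast the binomial coefficient grows as *r* approaches *n*/2.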

__Probability__

Now that we have the number of possibilities and permutations defined, we can start to define **probability** (*P*), which is simply defined as

\begin{equation}

P=\frac{s}{m}

\end{equation}

where there are *m* **equally likely possibilities** of which *s* are **successful or favorable outcomes**; the probability of a success is therefore defined as above.

**Let’s consider a single, completely randomly shuffled deck of 52 cards. What is the probability of pulling a red queen? Well, here m = 52 and there are only two red queens in the deck, so s = 2**

\begin{equation}

\frac{2}{52} = \frac{1}{26}

\end{equation}
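Working with exact fractions avoids any rounding; a one-line check in Python:

```python
from fractions import Fraction

m = 52  # equally likely cards
s = 2   # red queens (hearts and diamonds)
p_red_queen = Fraction(s, m)
print(p_red_queen)  # 1/26
```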

__Probabilities: Mutually Exclusive Events__

There are some scenarios where we may have probabilities of multiple events occurring. For example, if I have events \(A_1, A_2, ..., A_n\) which are all **mutually exclusive events** in a sample space *S*, then

\begin{equation}

P(A_1 \cup A_2 \cup ...\cup A_n) = P(A_1) + P(A_2) +....+P(A_n)

\end{equation}

**Let’s see how this can be applied to an example. Continuing with our card example, what is the probability of pulling a red card, a club, or the 2 of spades?**

Well, these are all **mutually exclusive events**; let’s make sure that is indeed the case. No single card can be both red and a club, red and the 2 of spades, or a club and the 2 of spades, so we simply add each event’s probability:

\begin{equation}

\frac{1}{2}+\frac{1}{4}+\frac{1}{52} =0.7692308

\end{equation}
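We can verify this sum by enumerating the whole deck in Python (the rank and suit labels below are my own):

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

red = sum(1 for _, s in deck if s in ("hearts", "diamonds"))
club = sum(1 for _, s in deck if s == "clubs")
two_of_spades = sum(1 for r, s in deck if (r, s) == ("2", "spades"))

# No card belongs to more than one event, so the counts simply add
p = Fraction(red + club + two_of_spades, len(deck))
print(p, float(p))  # 10/13, about 0.769
```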

__Probabilities: Not Mutually Exclusive__

Now if *A *and *B *are any events in the sample space *S*, **not mutually exclusive **then

\begin{equation}

P(A \cup B) = P(A) + P(B) - P(A \cap B)

\end{equation}

Applying the previous framework here would give an erroneous result; we must subtract the overlapping probability so that it is not counted twice.

**Let’s consider this example, what is the probability of pulling a card that is red or an ace?**

So the probability of pulling a red card is \(\frac{1}{2}\), the probability of pulling an ace is \(\frac{1}{13}\), and the probability that we pull a red ace is \(\frac{1}{26}\) so we find that

\begin{equation}

\frac{1}{2}+\frac{1}{13}-\frac{1}{26} = 0.5384615

\end{equation}
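Enumerating the deck again confirms the inclusion-exclusion result (rank and suit labels are my own):

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

# Counting red-or-ace directly counts the two red aces only once
red_or_ace = sum(1 for r, s in deck
                 if s in ("hearts", "diamonds") or r == "A")
p = Fraction(red_or_ace, len(deck))
print(p)  # 7/13, matching 1/2 + 1/13 - 1/26
```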

__Probabilities: Conditional Probabilities__

There are also conditional probabilities. For example, if *A* and *B* are any events in a sample space *S* and \(P(B) \neq 0\), then the conditional probability of *A* given *B* is

\begin{equation}

P (A | B) = \frac{ P(A \cap B)}{P(B)}

\end{equation}

Let’s illustrate this with an example and keep going with our cards as long as we can, **so what is the conditional probability that we pull an ace given that it is also red?**

So the way that we can read this is: given that the card is red, what is the probability that it is an ace? With \(P(A \cap B) = \frac{1}{26}\) and \(P(B) = \frac{1}{2}\), we find

\begin{equation}

\frac{\frac{1}{26}}{\frac{1}{2}} = 0.07692308

\end{equation}
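The same arithmetic with exact fractions:

```python
from fractions import Fraction

p_red = Fraction(1, 2)       # P(B): the card is red
p_red_ace = Fraction(1, 26)  # P(A ∩ B): the card is a red ace

# P(A | B) = P(A ∩ B) / P(B)
p_ace_given_red = p_red_ace / p_red
print(p_ace_given_red, float(p_ace_given_red))  # 1/13, about 0.0769
```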

If we rearrange this conditional probability we can show that

\begin{eqnarray}

P(A \cap B) = P(A) \cdot P(B|A)\\

P(A \cap B) = P(B) \cdot P(A|B) \\

\end{eqnarray}

this is assuming that \(P(A) \neq 0\) and \(P(B) \neq 0\). Then it follows that two events *A* and *B* will be independent events if and only if

\begin{equation}

P(A \cap B) = P(A) \cdot P (B)

\end{equation}

Let’s see if our card example was truly independent. We said that the probability of pulling a red ace was \(P(A \cap B) = \frac{1}{26}\), the probability of a card being red was \(P(A) = \frac{1}{2}\), and the probability of pulling an ace was \(P(B) = \frac{1}{13}\), so we find that

\begin{equation}

\frac{1}{26}=\frac{1}{2} \cdot \frac{1}{13}=\frac{1}{26}

\end{equation}

**So they are independent!! **Now consider how this may change if the card is not replaced, a possible Pset question, hmmmmm???
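The independence check as code:

```python
from fractions import Fraction

p_red = Fraction(1, 2)
p_ace = Fraction(1, 13)
p_red_ace = Fraction(1, 26)

# Independent if and only if P(A ∩ B) == P(A) * P(B)
print(p_red_ace == p_red * p_ace)  # True
```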

__Bayes’ Theorem__

These multiplication rules for probabilities are extremely useful, but we can also imagine several steps in calculating a probability. Imagine \(B_1, B_2, ..., B_n\) are **mutually exclusive events** of which one must occur; then we have the general expression, sometimes referred to as the **rule of elimination** or the **rule of total probability**,

\begin{equation}

P(A) = \sum^n_{i=1} P(B_i) \cdot P(A |B_i)

\end{equation}

We can further generalize this problem by invoking the general **Bayes’ theorem**: if \(B_1, B_2, ..., B_n\) are mutually exclusive events of which one must occur, then

\begin{equation}

P(B_r|A) = \frac{P(B_r)\cdot P(A|B_r)}{\sum^n_{i=1} P(B_i) \cdot P(A|B_i)}

\end{equation}

for *r *= 1*,*2*,...,n*.

Let’s see if we pull one more card out of our sleeve, excuse the horrible pun, to create one more example...

A card is lost from a pack of 52 cards. From the remaining cards, two cards are randomly drawn and found to be hearts. What is the probability that the lost card is also a heart?

We can define the following variables: the lost card can be a heart, club, spade, or diamond, so we can set up events *B*_{1}, *B*_{2}, *B*_{3}, and *B*_{4} of losing a heart, club, spade, or diamond respectively. Here we can also state that *A* is the event of drawing two hearts after a card is lost.

We can then determine that \(P(B_1) = P(B_2) =P(B_3) =P(B_4) = \frac{1}{4}\).

We can also state that the probability of drawing two hearts given that the lost card is a heart \(P(A|B_1) = \frac{12}{51} \cdot \frac{11}{50}\).

The probability of drawing two hearts given the lost card is a club is \(P(A|B_2) = \frac{13}{51} \cdot \frac{12}{50}\). And we can also see that \(P(A|B_2) = P(A|B_3) = P(A|B_4)\).

Now we can plug into our equation and we find that

\begin{equation}

P(B_1 |A) = \frac{P(B_1) \cdot P(A|B_1)}{P(B_1) \cdot P(A|B_1)+P(B_2) \cdot P(A|B_2)+P(B_3) \cdot P(A|B_3)+P(B_4) \cdot P(A|B_4)}

\end{equation}

and, plugging in the numbers above, we find the probability is \(\frac{132}{600} = 0.22\)!!!
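This whole Bayes calculation fits in a few lines with exact fractions (the variable names are mine):

```python
from fractions import Fraction

prior = Fraction(1, 4)  # each suit is equally likely to be the lost card

# Likelihood of then drawing two hearts from the remaining 51 cards
like_heart = Fraction(12, 51) * Fraction(11, 50)  # lost card was a heart
like_other = Fraction(13, 51) * Fraction(12, 50)  # lost card was another suit

# Rule of total probability over the four suits, then Bayes' theorem
total = prior * like_heart + 3 * prior * like_other
posterior = prior * like_heart / total
print(posterior, float(posterior))  # 11/50 = 0.22
```

Note the answer is slightly below the naive 13/52 = 0.25 prior: seeing two hearts drawn makes it a little less likely that the lost card was also a heart.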

Look forward to some more of these problems in your Pset.