# Chapter 5: Hypothesis Testing and P-Values


Here you can find a series of pre-recorded lecture videos that cover this content: https://youtube.com/playlist?list=PL...I4vAR1y-lP1Ees

__Hypothesis Testing for a Single Mean for a Small Sample Size__

The reason we are using these statistical tools is to make certain decisions regarding a measurand. One of the most common methods for doing this is **hypothesis testing**.

Typically we deal with two hypotheses:

### • Null Hypothesis

– First step in hypothesis testing

– \(H_{0}: \mu = \mu_{0}\), where \(\mu_{0}\) is some specific constant value

### • Alternative Hypothesis

– Second step

– The choice should reflect what we are attempting to show

∗ Two-tailed test: concerned with whether a population mean \(\mu\) is different from a specific value \(\mu_{0}\), i.e. \(H_{a}: \mu \neq \mu_{0}\)

∗ Left-tailed test: concerned with whether a population mean is less than a specific value, \(H_{a}: \mu < \mu_{0}\)

∗ Right-tailed test: concerned with whether a population mean is greater than a specific value, \(H_{a}: \mu > \mu_{0}\)
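For reference, the corresponding rejection rules, stated in terms of the significance level \(\alpha\) (where the confidence level is \(1 - \alpha\)) and the experimental t statistic used throughout this chapter, are:

\begin{eqnarray}
\text{Two-tailed: reject } H_{0} \text{ if } |t_{exp}| > t_{\frac{\alpha}{2}, \nu} \\
\text{Left-tailed: reject } H_{0} \text{ if } t_{exp} < -t_{\alpha, \nu} \\
\text{Right-tailed: reject } H_{0} \text{ if } t_{exp} > t_{\alpha, \nu}
\end{eqnarray}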

### Procedure for Hypothesis Testing

(1) Define the null hypothesis, \(H_{0}\)

(2) Define the alternative hypothesis, \(H_{a}\)

(3) Define the c% confidence interval

(4) Calculate the value of \(t_{exp}\) from the data

(5) Determine the proper value of \(t_{\alpha, \nu}\) or \(t_{\frac{\alpha}{2}, \nu}\) using the degrees of freedom \(\nu\)

(6) If \(t_{exp}\) falls in the reject \(H_{0}\) region, we reject \(H_{0}\) and accept the alternative hypothesis \(H_{a}\)

(7) If \(t_{exp}\) falls in the do-not-reject \(H_{0}\) region, we conclude that we do not have sufficient data to reject \(H_{0}\) at the level of confidence specified
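The steps above can be sketched in a few lines of Python. The summary statistics here are hypothetical placeholders (the raw PCM data are not reproduced in these notes), and the critical value is read from a t-table:

```python
import math

# Hypothetical summary statistics for a two-tailed test of H0: mu = mu0
# (placeholders -- the raw PCM data are not reproduced in these notes).
xbar, s_x, n = 2.05, 0.214, 18
mu0 = 2.00
t_crit = 2.11                 # t_{alpha/2, nu} from a t-table, alpha = 0.05, nu = 17

# Step (4): experimental t statistic
t_exp = (xbar - mu0) / (s_x / math.sqrt(n))

# Step (5): degrees of freedom
nu = n - 1

# Steps (6)-(7): two-tailed decision rule
decision = "reject H0" if abs(t_exp) > t_crit else "do not reject H0"
```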

Let’s start with a two-tailed test and ask ourselves the following question: does the PCM data from the example in the last lecture come from a population with a true mean of 2 mg, assuming a confidence level of 95%?

\begin{eqnarray}
H_{0}: \mu = 2.00 \text{ mg} \\
H_{a}: \mu \neq 2.00 \text{ mg} \\
t_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}} = 0.99011 \\
t_{\frac{\alpha}{2}, \nu} = t_{0.025,17} = 2.11
\end{eqnarray}

We can look at our figure and see that our t-value falls within the do-not-reject \(H_{0}\) region. Does this match the result from R?

In the R output we don’t see this comparison directly, only some mysterious thing called the P-value. But no worries, we actually already intuitively know what this mysterious value is.

__P-Values__

Often, if you have read scientific articles or taken other statistics courses, you may have heard the term **P-Value**. This term is ubiquitous and is used much more often than z- or t-values. Simply put, the **P-value is the probability of getting a result more extreme than the value that is actually observed**. Let’s see how it is used in the context of our previous hypothesis tests, starting with the two-tailed test.

Our P-value effectively gives us the probability of measuring a value more extreme than the one observed, i.e. the tail ends. Here is how we interpret it for our two-tailed test at 95% confidence (\(\alpha = 0.05\)): R reports the two-sided P-value, so if the P-value is greater than \(\alpha = 0.05\) we fall in the do-not-reject \(H_{0}\) regime, and if it is less than that value we fall in the reject \(H_{0}\) regime. (Equivalently, the probability in a single tail is compared against \(\frac{\alpha}{2} = 0.025\).)

So we previously found that we fall in the do-not-reject region, and our calculated P-value of 0.336 is greater than 0.05, which also puts us in the do-not-reject region and matches our result!
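If you would rather not rely on R, the tail probabilities quoted in this chapter can be reproduced in Python. The sketch below uses the classical finite series for the Student-t CDF at integer degrees of freedom (Abramowitz and Stegun, Eqs. 26.7.3–26.7.4); the function name is a placeholder of my own, and the t-values are the ones from the worked examples in this chapter:

```python
import math

def t_tail_prob(t, nu):
    """One-tail probability P(T >= t) for Student's t with integer
    degrees of freedom nu, via the exact finite series for
    A(t|nu) = P(|T| <= t) (Abramowitz & Stegun 26.7.3-26.7.4)."""
    sign = 1.0 if t >= 0 else -1.0
    t = abs(t)
    theta = math.atan(t / math.sqrt(nu))
    c, s = math.cos(theta), math.sin(theta)
    if nu % 2 == 1:                       # odd nu
        term, total = c, c
        for j in range(2, nu - 1, 2):     # ratios 2/3, 4/5, ... times cos^2
            term *= j / (j + 1) * c * c
            total += term
        a = 2 / math.pi * (theta + s * total) if nu > 1 else 2 * theta / math.pi
    else:                                 # even nu
        term, total = 1.0, 1.0
        for j in range(1, nu - 1, 2):     # ratios 1/2, 3/4, ... times cos^2
            term *= j / (j + 1) * c * c
            total += term
        a = s * total
    upper = (1 - a) / 2                   # P(T >= |t|)
    return upper if sign > 0 else 1 - upper

# Two-tailed PCM test: t_exp = 0.99011, nu = 17
p_two = 2 * t_tail_prob(0.99011, 17)       # ~0.336
# Right-tailed test: t_exp = 2.025
p_right = t_tail_prob(2.025, 17)           # ~0.029
# Left-tailed test: p = P(T <= t_exp) with t_exp = -0.0448035
p_left = 1 - t_tail_prob(-0.0448035, 17)   # ~0.48
```

The series is exact for integer degrees of freedom, so these values should agree with R's `t.test` output to the precision quoted.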

### Another PCM Application

Using the data from the previous example, does the sample come from a population whose true mean weight is greater than 1.99 mg, assuming a confidence level of 99%?

\begin{eqnarray}
H_{0}: \mu = 1.99 \text{ mg} \\
H_{a}: \mu > 1.99 \text{ mg} \\
t_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}} = 2.025 \\
t_{\alpha, \nu} = t_{0.01,17} = 2.567
\end{eqnarray}

Since this is a right-tailed test and \(t_{exp} = 2.025\) is less than the critical value 2.567, we fall in the do-not-reject \(H_{0}\) region. So we conclude with 99% confidence that the population mean was not significantly greater than 1.99 mg. Note the subtle distinction of the phrases "was not significantly greater than 1.99 mg" or "we do not have sufficient data to reject \(H_{0}\) with 99% confidence." We are **not** saying that the population mean is 1.99 mg.

If we examine this from a P-value perspective, we find that our P-value is 0.02943, which is greater than \(\alpha = 0.01\), so it also falls in the do-not-reject region!

### Yet Another PCM Application

Using the data from the previous example, does the sample come from a population whose true mean weight is less than 2.01 mg, assuming a confidence level of 90%?

\begin{eqnarray}
H_{0}: \mu = 2.01 \text{ mg} \\
H_{a}: \mu < 2.01 \text{ mg} \\
t_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}} = -0.0448035 \\
t_{\alpha, \nu} = t_{0.10,17} = -1.333
\end{eqnarray}

Since this is a left-tailed test and \(t_{exp} = -0.0448\) is well above the critical value, we fall in the do-not-reject \(H_{0}\) region. So we conclude with 90% confidence that the population mean was not significantly less than 2.01 mg.

If we examine this from a P-value perspective, we find that our P-value is 0.4824, which is greater than \(\alpha = 0.10\), so it also falls in the do-not-reject region!

__Hypothesis Testing for a Single Mean for a Large Sample Size__

For larger sample sizes we follow the exact same procedure, but we replace \(t_{\alpha, \nu}\) by \(z_{\alpha}\) and \(t_{exp}\) by:

\begin{equation}
z_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}}
\end{equation}

However, we can simply use the t-table with the degrees of freedom corresponding to the \(n > 30\) scenario, i.e. \(\nu \approx \infty\).

### Looking Back at Rolling Velocity

Does the sample of our rolling magnetic beads come from a population with a velocity less than 3.1 \(\frac{\mu m}{s}\) at a confidence level of 99%?

\begin{eqnarray}
H_{0}: \mu = 3.1 \frac{\mu m}{s} \\
H_{a}: \mu < 3.1 \frac{\mu m}{s} \\
z_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}} = -1.3383 \\
z_{\alpha} = t_{0.01,\infty} = -2.326
\end{eqnarray}

We can see that we are in the do-not-reject \(H_{0}\) region, and the P-value of 0.09151 is greater than \(\alpha = 0.01\), which confirms that we do not reject \(H_{0}\).
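Since \(\nu \approx \infty\) here, the critical value and P-value can be checked with the standard normal distribution alone, which Python's standard library provides via `statistics.NormalDist` (the experimental statistic below is taken from the worked example above; the normal approximation gives a P-value of about 0.090, slightly below the quoted 0.09151, which R computed from the t distribution at the actual sample size):

```python
from statistics import NormalDist

z = NormalDist()              # standard normal, mean 0, sd 1
z_exp = -1.3383               # from the rolling-velocity example above

# Left-tailed critical value at alpha = 0.01 (i.e. t_{0.01, inf})
z_crit = z.inv_cdf(0.01)      # about -2.326

# P-value for the left-tailed test: P(Z <= z_exp)
p_left = z.cdf(z_exp)         # about 0.090, so we do not reject H0
```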

**t-Test Comparison of Sample Means**

To compare two samples solely based on their means:

\begin{equation}

t = \frac{\overline{x}_{1} - \overline{x}_{2}}{\sqrt{\bigg(\frac{S_{1}^{2}}{n_{1}}\bigg) + \bigg(\frac{S_{2}^{2}}{n_{2}}\bigg)}}

\end{equation}

where \(\overline{x}_{1}\), \(S_{1}\), and \(n_{1}\) and \(\overline{x}_{2}\), \(S_{2}\), and \(n_{2}\) are the means, standard deviations, and sizes of the two samples, and the degrees of freedom can be approximated by:

\begin{equation}

\nu = \frac{[(\frac{S_{1}^{2}}{n_{1}}) + (\frac{S_{2}^{2}}{n_{2}}) ] ^{2}}{\frac{(\frac{S_{1}^{2}}{n_{1}})^{2}}{n_{1} - 1} + \frac{(\frac{S_{2}^{2}}{n_{2}})^{2}}{n_{2}-1}}

\end{equation}

and \(\nu\) is rounded to the nearest integer. If the value \(t\) falls inside the interval \(\pm t_{\frac{\alpha}{2}, \nu}\), then the two means are not significantly different at the chosen confidence level. One of the great things about this technique is that the equation is applicable for any combination of large and small sample sizes.

### Are these Materials Significantly Stiffer?

Consider Material A, with an average Young's modulus of 302.6 GPa (measured 12 times) and a sample standard deviation of 1.27 GPa, and Material B, with an average Young's modulus of 302.3 GPa (measured 15 times) and a sample standard deviation of 1.7 GPa.

\begin{eqnarray}
H_{0}: \mu_{A} = \mu_{B} \\
H_{a}: \mu_{A} \neq \mu_{B} \\
\nu \approx 25 \\
t_{exp} = 0.547 \\
t_{0.025,25} = 2.060
\end{eqnarray}

The value falls in the do-not-reject region, so there is not a significant difference in the stiffness of Materials A and B.
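The Welch statistic and degrees of freedom can be checked directly from the quoted summary statistics. Note that with the rounded values given above this gives \(t \approx 0.52\) rather than the notes' 0.547, presumably a rounding difference in the underlying data; the degrees of freedom do round to 25 as stated:

```python
import math

# Summary statistics as quoted for Materials A and B
xbar_a, s_a, n_a = 302.6, 1.27, 12
xbar_b, s_b, n_b = 302.3, 1.7, 15

va = s_a**2 / n_a             # S_1^2 / n_1 term
vb = s_b**2 / n_b             # S_2^2 / n_2 term

# Welch t statistic for comparing the two sample means
t_exp = (xbar_a - xbar_b) / math.sqrt(va + vb)

# Welch-Satterthwaite degrees of freedom, rounded to the nearest integer
nu = round((va + vb) ** 2 / (va**2 / (n_a - 1) + vb**2 / (n_b - 1)))
```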

We can also apply a P-value analysis to a t-Test comparison of different samples; you will see that the R code can be simply adapted to examine different samples. Simulating the example here, we find a P-value of 0.02194, so for this simulated example that would be a reject finding. The difference is due to the simulated number values, as we can confirm that the means and standard deviations are slightly different. In fact, if we re-run the analysis by hand, we can achieve the same result. Let’s try that really quickly.