# Chapter 5: Hypothesis Testing and P-Values


Here you can find a series of pre-recorded lecture videos that cover this content: https://youtube.com/playlist?list=PL...I4vAR1y-lP1Ees

__Hypothesis Testing for a Single Mean for a Small Sample Size__

The reason we are using these statistical tools is to make certain decisions regarding a measurand. One of the most common methods for doing this is **hypothesis testing**.

Typically we deal with two hypotheses:

### • Null Hypothesis

– First step in hypothesis testing

– \(H_{0}: \mu = \mu_{0}\), where \(\mu_{0}\) is some specific constant value

### • Alternative Hypothesis

– Second step

– The choice should reflect what we are attempting to show

∗ Two-tailed test: concerned with whether a population mean \(\mu\) is different from a specific value \(\mu_{0}\), i.e. \(H_{a}: \mu \neq \mu_{0}\)

∗ Left-tailed test: concerned with whether a population mean is less than a specific value, \(H_{a}: \mu < \mu_{0}\)

∗ Right-tailed test: concerned with whether a population mean is greater than a specific value, \(H_{a}: \mu > \mu_{0}\)
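For reference, the corresponding rejection rules, stated in terms of the significance level \(\alpha\) (where the confidence level is \(1 - \alpha\)) and the experimental t statistic used throughout this chapter, are:

\begin{eqnarray}
\text{Two-tailed: reject } H_{0} \text{ if } |t_{exp}| > t_{\frac{\alpha}{2}, \nu} \\
\text{Left-tailed: reject } H_{0} \text{ if } t_{exp} < -t_{\alpha, \nu} \\
\text{Right-tailed: reject } H_{0} \text{ if } t_{exp} > t_{\alpha, \nu}
\end{eqnarray}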

### Procedure for Hypothesis Testing

(1) Define the null hypothesis, \(H_{0}\)

(2) Define the alternative hypothesis, \(H_{a}\)

(3) Define the c% confidence interval

(4) Calculate the value of \(t_{exp}\) from the data

(5) Determine the proper value of \(t_{\alpha, \nu}\) or \(t_{\frac{\alpha}{2}, \nu}\) using the degrees of freedom \(\nu\)

(6) If \(t_{exp}\) falls in the reject \(H_{0}\) region, we reject \(H_{0}\) and accept the alternative hypothesis \(H_{a}\)

(7) If \(t_{exp}\) falls in the do-not-reject \(H_{0}\) region, we conclude that we do not have sufficient data to reject \(H_{0}\) at the level of confidence specified
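The steps above can be sketched in a few lines of Python. The summary statistics here are hypothetical placeholders (the raw PCM data are not reproduced in these notes), and the critical value is read from a t-table:

```python
import math

# Hypothetical summary statistics for a two-tailed test of H0: mu = mu0
# (placeholders -- the raw PCM data are not reproduced in these notes).
xbar, s_x, n = 2.05, 0.214, 18
mu0 = 2.00
t_crit = 2.11                 # t_{alpha/2, nu} from a t-table, alpha = 0.05, nu = 17

# Step (4): experimental t statistic
t_exp = (xbar - mu0) / (s_x / math.sqrt(n))

# Step (5): degrees of freedom
nu = n - 1

# Steps (6)-(7): two-tailed decision rule
decision = "reject H0" if abs(t_exp) > t_crit else "do not reject H0"
```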

Let’s start with a two-tailed test and ask ourselves the following question: does the PCM data from the example in the last lecture come from a population with a true mean of 2 mg, assuming a confidence level of 95%?

\begin{eqnarray}
H_{0}: \mu = 2.00 \text{ mg} \\
H_{a}: \mu \neq 2.00 \text{ mg} \\
t_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}} = 0.99011 \\
t_{\frac{\alpha}{2}, \nu} = t_{0.025,17} = 2.11
\end{eqnarray}

We can look at our figure and see that our t-value falls within the do-not-reject \(H_{0}\) region. Does this match the result from R?

In the R output we don’t see this comparison directly, only some mysterious thing called the P-value. But no worries, we actually already intuitively know what this mysterious value is.

__P-Values__

Often, if you have read scientific articles or taken other statistics courses, you may have heard the term **P-Value**. This term is ubiquitous and is used much more often than z- or t-values. Simply put, the **P-value is the probability of getting a result more extreme than the value that is actually observed**. Let’s see how it is used in the context of our previous hypothesis tests, starting with the two-tailed test.

Our P-value effectively gives us the probability of measuring a value more extreme than the one observed, i.e. the tail ends. Here is how we interpret it for our two-tailed test at 95% confidence (\(\alpha = 0.05\)): R reports the two-sided P-value, so if the P-value is greater than \(\alpha = 0.05\) we fall in the do-not-reject \(H_{0}\) regime, and if it is less than that value we fall in the reject \(H_{0}\) regime. (Equivalently, the probability in a single tail is compared against \(\frac{\alpha}{2} = 0.025\).)

So we previously found that we fall in the do-not-reject region, and our calculated P-value of 0.336 is greater than 0.05, which also puts us in the do-not-reject region and matches our result!
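If you would rather not rely on R, the tail probabilities quoted in this chapter can be reproduced in Python. The sketch below uses the classical finite series for the Student-t CDF at integer degrees of freedom (Abramowitz and Stegun, Eqs. 26.7.3–26.7.4); the function name is a placeholder of my own, and the t-values are the ones from the worked examples in this chapter:

```python
import math

def t_tail_prob(t, nu):
    """One-tail probability P(T >= t) for Student's t with integer
    degrees of freedom nu, via the exact finite series for
    A(t|nu) = P(|T| <= t) (Abramowitz & Stegun 26.7.3-26.7.4)."""
    sign = 1.0 if t >= 0 else -1.0
    t = abs(t)
    theta = math.atan(t / math.sqrt(nu))
    c, s = math.cos(theta), math.sin(theta)
    if nu % 2 == 1:                       # odd nu
        term, total = c, c
        for j in range(2, nu - 1, 2):     # ratios 2/3, 4/5, ... times cos^2
            term *= j / (j + 1) * c * c
            total += term
        a = 2 / math.pi * (theta + s * total) if nu > 1 else 2 * theta / math.pi
    else:                                 # even nu
        term, total = 1.0, 1.0
        for j in range(1, nu - 1, 2):     # ratios 1/2, 3/4, ... times cos^2
            term *= j / (j + 1) * c * c
            total += term
        a = s * total
    upper = (1 - a) / 2                   # P(T >= |t|)
    return upper if sign > 0 else 1 - upper

# Two-tailed PCM test: t_exp = 0.99011, nu = 17
p_two = 2 * t_tail_prob(0.99011, 17)       # ~0.336
# Right-tailed test: t_exp = 2.025
p_right = t_tail_prob(2.025, 17)           # ~0.029
# Left-tailed test: p = P(T <= t_exp) with t_exp = -0.0448035
p_left = 1 - t_tail_prob(-0.0448035, 17)   # ~0.48
```

The series is exact for integer degrees of freedom, so these values should agree with R's `t.test` output to the precision quoted.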

### Another PCM Application

Using the data from the previous example, does the sample come from a population whose true mean weight is greater than 1.99 mg, assuming a confidence level of 99%?

\begin{eqnarray}
H_{0}: \mu = 1.99 \text{ mg} \\
H_{a}: \mu > 1.99 \text{ mg} \\
t_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}} = 2.025 \\
t_{\alpha, \nu} = t_{0.01,17} = 2.567
\end{eqnarray}

Since this is a right-tailed test and \(t_{exp} = 2.025\) is less than the critical value 2.567, we fall in the do-not-reject \(H_{0}\) region. So we conclude with 99% confidence that the population mean was not significantly greater than 1.99 mg. Note the subtle distinction of the phrases "was not significantly greater than 1.99 mg" or "we do not have sufficient data to reject \(H_{0}\) with 99% confidence." We are **not** saying that the population mean is 1.99 mg.

If we examine this from a P-value perspective, we find that our P-value is 0.02943, which is greater than \(\alpha = 0.01\), so it also falls in the do-not-reject region!

### Yet Another PCM Application

Using the data from the previous example, does the sample come from a population whose true mean weight is less than 2.01 mg, assuming a confidence level of 90%?

\begin{eqnarray}
H_{0}: \mu = 2.01 \text{ mg} \\
H_{a}: \mu < 2.01 \text{ mg} \\
t_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}} = -0.0448035 \\
t_{\alpha, \nu} = t_{0.10,17} = -1.333
\end{eqnarray}

Since this is a left-tailed test and \(t_{exp} = -0.0448\) is well above the critical value, we fall in the do-not-reject \(H_{0}\) region. So we conclude with 90% confidence that the population mean was not significantly less than 2.01 mg.

If we examine this from a P-value perspective, we find that our P-value is 0.4824, which is greater than \(\alpha = 0.10\), so it also falls in the do-not-reject region!

__Hypothesis Testing for a Single Mean for a Large Sample Size__

For larger sample sizes we follow the exact same procedure, but we replace \(t_{\alpha, \nu}\) by \(z_{\alpha}\) and \(t_{exp}\) by:

\begin{equation}
z_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}}
\end{equation}

However, we can simply use the t-table with the degrees of freedom corresponding to the \(n > 30\) scenario, i.e. \(\nu \approx \infty\).

### Looking Back at Rolling Velocity

Does the sample of our rolling magnetic beads come from a population with a velocity less than 3.1 \(\frac{\mu m}{s}\) at a confidence level of 99%?

\begin{eqnarray}
H_{0}: \mu = 3.1 \frac{\mu m}{s} \\
H_{a}: \mu < 3.1 \frac{\mu m}{s} \\
z_{exp} = \frac{\overline{x} - \mu_{0}}{\frac{S_{x}}{\sqrt{n}}} = -1.3383 \\
z_{\alpha} = t_{0.01,\infty} = -2.326
\end{eqnarray}

We can see that we are in the do-not-reject \(H_{0}\) region, and the P-value of 0.09151 is greater than \(\alpha = 0.01\), which confirms that we do not reject \(H_{0}\).
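Since \(\nu \approx \infty\) here, the critical value and P-value can be checked with the standard normal distribution alone, which Python's standard library provides via `statistics.NormalDist` (the experimental statistic below is taken from the worked example above; the normal approximation gives a P-value of about 0.090, slightly below the quoted 0.09151, which R computed from the t distribution at the actual sample size):

```python
from statistics import NormalDist

z = NormalDist()              # standard normal, mean 0, sd 1
z_exp = -1.3383               # from the rolling-velocity example above

# Left-tailed critical value at alpha = 0.01 (i.e. t_{0.01, inf})
z_crit = z.inv_cdf(0.01)      # about -2.326

# P-value for the left-tailed test: P(Z <= z_exp)
p_left = z.cdf(z_exp)         # about 0.090, so we do not reject H0
```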

**t-Test Comparison of Sample Means**

To compare two samples solely based on their means:

\begin{equation}

t = \frac{\overline{x}_{1} - \overline{x}_{2}}{\sqrt{\bigg(\frac{S_{1}^{2}}{n_{1}}\bigg) + \bigg(\frac{S_{2}^{2}}{n_{2}}\bigg)}}

\end{equation}

where \(\overline{x}_{1}\), \(S_{1}\), and \(n_{1}\) and \(\overline{x}_{2}\), \(S_{2}\), and \(n_{2}\) are the means, standard deviations, and sizes of the two samples, and the degrees of freedom can be approximated by:

\begin{equation}

\nu = \frac{[(\frac{S_{1}^{2}}{n_{1}}) + (\frac{S_{2}^{2}}{n_{2}}) ] ^{2}}{\frac{(\frac{S_{1}^{2}}{n_{1}})^{2}}{n_{1} - 1} + \frac{(\frac{S_{2}^{2}}{n_{2}})^{2}}{n_{2}-1}}

\end{equation}

and \(\nu\) is rounded to the nearest integer. If the value \(t\) falls inside the interval \(\pm t_{\frac{\alpha}{2}, \nu}\), then the two means are not significantly different at the chosen confidence level. One of the great things about this technique is that the equation is applicable for any combination of large and small sample sizes.

### Are these Materials Significantly Stiffer?

Consider Material A, with an average Young's modulus of 302.6 GPa (measured 12 times) and a sample standard deviation of 1.27 GPa, and Material B, with an average Young's modulus of 302.3 GPa (measured 15 times) and a sample standard deviation of 1.7 GPa.

\begin{eqnarray}
H_{0}: \mu_{A} = \mu_{B} \\
H_{a}: \mu_{A} \neq \mu_{B} \\
\nu \approx 25 \\
t_{exp} = 0.547 \\
t_{0.025,25} = 2.060
\end{eqnarray}

The value falls in the do-not-reject region, so there is not a significant difference in the stiffness of Materials A and B.
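The Welch statistic and degrees of freedom can be checked directly from the quoted summary statistics. Note that with the rounded values given above this gives \(t \approx 0.52\) rather than the notes' 0.547, presumably a rounding difference in the underlying data; the degrees of freedom do round to 25 as stated:

```python
import math

# Summary statistics as quoted for Materials A and B
xbar_a, s_a, n_a = 302.6, 1.27, 12
xbar_b, s_b, n_b = 302.3, 1.7, 15

va = s_a**2 / n_a             # S_1^2 / n_1 term
vb = s_b**2 / n_b             # S_2^2 / n_2 term

# Welch t statistic for comparing the two sample means
t_exp = (xbar_a - xbar_b) / math.sqrt(va + vb)

# Welch-Satterthwaite degrees of freedom, rounded to the nearest integer
nu = round((va + vb) ** 2 / (va**2 / (n_a - 1) + vb**2 / (n_b - 1)))
```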

We can also apply a P-value analysis to a t-Test comparison of different samples; you will see that the R code can be simply adapted to examine different samples. Simulating the example here, we find a P-value of 0.02194, so for this simulated example that would be a reject finding. The difference is due to the simulated number values, as we can confirm that the means and standard deviations are slightly different. In fact, if we re-run the analysis by hand, we can achieve the same result. Let’s try that really quickly.