10 Hypothesis Tests

Deciding between two hypotheses is a core activity in scientific discovery. Statistical hypothesis testing is the formal inferential framework around choosing between hypotheses.

10.1 Hypothesis Testing

Hypothesis testing is concerned with making decisions using data.

The null hypothesis represents the status quo and is usually labeled \(H_0\).
The null hypothesis is assumed true, and statistical evidence is required to reject it in favor of a research or alternative hypothesis.

10.2 Example

A respiratory disturbance index (RDI) of more than 30 events per hour is considered evidence of severe sleep disordered breathing (SDB).
Suppose that in a sample of 100 overweight subjects with other risk factors for sleep disordered breathing at a sleep clinic, the mean RDI was 32 events / hour with a standard deviation of 10 events / hour.
We might want to test the hypothesis that:
- \(H_0:\mu=30\)
- \(H_a:\mu>30\)
- where \(\mu\) is the population mean RDI.
Alternative hypotheses are usually of the form \(<\), \(>\), or \(\ne\).
Note that there are four possible outcomes of our statistical decision process.

- \(H_0\) is true and we decide \(H_0\): we correctly accept the null.
- \(H_a\) is true and we decide \(H_a\): we correctly reject the null.
- \(H_0\) is true and we decide \(H_a\): Type I error.
- \(H_a\) is true and we decide \(H_0\): Type II error.

10.3 Discussion

Consider a court of law, where the null hypothesis is that the defendant is innocent.
We require a standard on the available evidence to reject the null hypothesis (convict).
If we set a low standard, then we would increase the percentage of innocent people convicted (type 1 errors); however, we would also increase the percentage of guilty people convicted (correctly rejecting the null).
If we set a high standard, then we would increase the percentage of innocent people let free (correctly accepting the null), while we would also increase the percentage of guilty people let free (type 2 errors).
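
The same trade-off can be seen numerically with the RDI example. The sketch below is illustrative only: it treats the sample mean as normal with standard error \(10/\sqrt{100} = 1\) and assumes, purely for the sake of the example, that the true mean under the alternative is 32 (a hypothetical value, not one stated in the text). A stricter standard (smaller type 1 error rate) pushes the cutoff up and raises the type 2 error rate.

```r
# Trade-off between type 1 and type 2 errors for a one-sided Z test.
# ASSUMPTION: the true mean under H_a is 32 (hypothetical, for illustration).
mu0  <- 30               # mean under H_0
mu_a <- 32               # hypothetical mean under H_a
se   <- 10 / sqrt(100)   # standard error of the sample mean

alpha  <- c(0.10, 0.05, 0.01)            # type 1 error rates, from lax to strict
cutoff <- mu0 + qnorm(1 - alpha) * se    # reject H_0 when the sample mean exceeds this
beta   <- pnorm(cutoff, mean = mu_a, sd = se)  # P(fail to reject) when the mean is mu_a

tradeoff <- data.frame(alpha, cutoff, beta)
print(tradeoff)
```

Reading down the table, `alpha` falls while `beta` rises, which is the courtroom trade-off in numbers.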

10.4 Our Last Example

A reasonable strategy would reject the null hypothesis if \(\bar{X}\) was larger than some constant, \(C\).
Typically, \(C\) is chosen so that the probability of a type 1 error, \(\alpha\), is 0.05 (or some other relevant constant).
The standard error of the mean is \(\frac{10}{\sqrt{100}}=1\).
Under the null hypothesis, \(\bar{X}\sim N(30,1)\).
We want to choose \(C\) so that \(P(\bar{X} > C;~H_0)\) is 5%.
The 95th percentile of a normal distribution is 1.645 standard deviations above the mean.
So if we set \(C = 30+ 1(1.645) = 31.645\), we are left with a cut point such that the probability that a randomly drawn sample mean exceeds it is 5% under the null.
The rule “Reject \(H_0\) when \(\bar{X} \ge 31.645\)” has the property that the probability of rejection is 5% when \(H_0\) is true (for the \(\mu_0, \sigma\) and \(n\) given).
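
The cut point can be computed directly with `qnorm`, and the 5% rejection rate can be checked both exactly and by simulation. This is a minimal sketch; the variable names, the seed, and the number of simulated means are arbitrary choices made for illustration.

```r
# Cut point C for a one-sided test of H_0: mu = 30 at level alpha = 0.05.
mu0   <- 30
sigma <- 10
n     <- 100
alpha <- 0.05

se <- sigma / sqrt(n)               # standard error of the mean (here 1)
C  <- mu0 + qnorm(1 - alpha) * se   # 95th percentile of N(mu0, se^2)

# Exact probability of exceeding C when H_0 is true
p_exact <- pnorm(C, mean = mu0, sd = se, lower.tail = FALSE)

# Simulated check: draw many sample means under H_0 and count rejections
set.seed(1)                          # arbitrary seed, for reproducibility only
xbar  <- rnorm(1e5, mean = mu0, sd = se)
p_sim <- mean(xbar > C)

c(C = C, p_exact = p_exact, p_sim = p_sim)
```

The simulated rejection rate should land close to 0.05, with the gap shrinking as more means are drawn.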

In general, we don’t convert \(C\) back to the original scale. Instead, we compute the Z-score, which is how many standard errors the sample mean is above the hypothesized mean:

\[\frac{32-30}{\frac{10}{\sqrt{100}}}=2\]

This is greater than 1.645, so we reject \(H_0\). Equivalently, we reject whenever \(\frac{\sqrt{n}(\bar{X}-\mu_0)}{s} > Z_{1-\alpha}\).
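
The standardized version of the rule is short enough to wrap in a helper. The function name `z_test_reject` below is hypothetical (it is not part of base R or any package); it simply encodes the inequality above.

```r
# Hypothetical helper: one-sided Z test of H_0: mu = mu0 against H_a: mu > mu0.
# Returns the Z-score, the critical value, and the decision.
z_test_reject <- function(xbar, mu0, s, n, alpha = 0.05) {
  z      <- sqrt(n) * (xbar - mu0) / s   # standard errors above mu0
  z_crit <- qnorm(1 - alpha)             # Z_{1 - alpha}
  list(z = z, z_crit = z_crit, reject = z > z_crit)
}

# RDI example: sample mean 32, sd 10, n = 100
res <- z_test_reject(xbar = 32, mu0 = 30, s = 10, n = 100)
res
```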

10.5 T-Tests

10.5.1 Example Reconsidered

Consider the previous example again, but this time suppose that \(n=16\) (rather than 100).
\(H_0: \mu=30~~~H_a:\mu>30\)
The following statistic follows a \(T\) distribution with 15 df under \(H_0\).

\[\frac{\bar{X}-30}{\frac{s}{\sqrt{16}}}\]

Under \(H_0\), the probability that it is larger than the 95th percentile of the \(T\) distribution is 5%.
The 95th percentile of the \(T\) distribution with 15 df is 1.7531 (obtained via qt(.95, 15)).
Our test statistic is now \(\frac{\sqrt{16}(32-30)}{10}=0.8\).
As 0.8 is not greater than qt(.95, 15), we fail to reject the null hypothesis.
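
The calculation for the smaller sample can be reproduced in a few lines. This is a sketch using the summary numbers from the example; the variable names are illustrative.

```r
# One-sided T test of H_0: mu = 30 against H_a: mu > 30 with n = 16.
xbar  <- 32
mu0   <- 30
s     <- 10
n     <- 16
alpha <- 0.05

t_stat <- sqrt(n) * (xbar - mu0) / s    # test statistic
t_crit <- qt(1 - alpha, df = n - 1)     # 95th percentile of T with 15 df
reject <- t_stat > t_crit               # decision at the 5% level

c(t_stat = t_stat, t_crit = t_crit, reject = reject)
```

With the same sample mean and standard deviation, the smaller sample gives a larger standard error and a larger critical value, so the evidence is no longer strong enough to reject \(H_0\).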

10.5.2 Two Sided Tests

Suppose that we would reject the null hypothesis if in fact the mean was too large or too small.
That is, we want to test the alternative \(H_a : \mu \ne 30\).
We will reject if the test statistic, 0.8, is either too large or too small.
Then we want the probability of rejecting under the null to be 5%, split equally as 2.5% in the upper tail and 2.5% in the lower tail.
Thus we reject if our test statistic is larger than qt(.975, 15) or smaller than qt(.025, 15)
- This is the same as saying reject, if the absolute value of our statistic is larger than qt(.975, 15)=2.1314.
- So in this case, we also fail to reject the null hypothesis under the two sided test.
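
The two ways of stating the two sided rule can be checked against each other. A minimal sketch, reusing the statistic 0.8 and 15 df from the example:

```r
# Two-sided T test at level alpha, written two equivalent ways.
t_stat <- 0.8
df     <- 15
alpha  <- 0.05

lower <- qt(alpha / 2, df)        # 2.5th percentile
upper <- qt(1 - alpha / 2, df)    # 97.5th percentile

reject_tails <- (t_stat < lower) || (t_stat > upper)   # either tail
reject_abs   <- abs(t_stat) > upper                    # absolute-value form

c(lower = lower, upper = upper,
  reject_tails = reject_tails, reject_abs = reject_abs)
```

The two forms agree because the \(T\) distribution is symmetric about zero, so `lower` equals `-upper`.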

10.5.3 T Test in R

library(UsingR); data(father.son)

t.test(father.son$sheight - father.son$fheight)

## 
##  One Sample t-test
## 
## data:  father.son$sheight - father.son$fheight
## t = 11.789, df = 1077, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.8310296 1.1629160
## sample estimates:
## mean of x 
## 0.9969728

10.5.4 Connections with Confidence Intervals

Consider testing \(H_0:\mu = \mu_0\) versus \(H_a:\mu\ne\mu_0\)
Take the set of all possible values of \(\mu_0\) for which you fail to reject \(H_0\); this set is a \((1-\alpha)100\%\) confidence interval for \(\mu\).
The same works in reverse; if a \((1-\alpha)100\%\) confidence interval contains \(\mu_0\), then we fail to reject \(H_0\)
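
This duality can be illustrated numerically with the \(n=16\) RDI summary numbers. The sketch below runs the two sided test over a grid of candidate values for \(\mu_0\) and compares the values that are not rejected with the usual T interval. The grid range and spacing are arbitrary illustrative choices, so the match is only exact up to the grid resolution.

```r
# Confidence interval as the set of mu0 values that a two-sided test fails to reject.
xbar  <- 32
s     <- 10
n     <- 16
alpha <- 0.05

se     <- s / sqrt(n)
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Usual 95% T confidence interval
ci <- xbar + c(-1, 1) * t_crit * se

# Run the two-sided test for every candidate mu0 on a grid
candidates   <- seq(20, 45, by = 0.01)
not_rejected <- candidates[abs((xbar - candidates) / se) <= t_crit]

rbind(interval = ci, grid = range(not_rejected))
```

Since 30 lies inside the interval, the two sided test of \(H_0:\mu=30\) fails to reject, in agreement with the earlier calculation.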

10.5.5 Two Group Intervals

First, you already know how to do two group T tests, since we have already covered independent group T intervals.
Rejection rules are exactly the same
Test \(H_0:\mu_1=\mu_2\)

10.5.6 Example

This example will use the chick weight data

# First, we need to reformat the data using reshape2

# Load packages and dataset
library(datasets); data(ChickWeight); library(reshape2)

wideCW <- dcast(ChickWeight, Diet + Chick ~ Time, value.var = "weight")

names(wideCW)[-(1:2)] <- paste("time", names(wideCW)[-(1:2)], sep = "")

library(dplyr)
wideCW <- mutate(wideCW, 
                 gain = time21 - time0
                 )

# Now we can perform an equal variance (pooled) T test comparing diets 1 and 4
wideCW14 <- subset(wideCW, Diet %in% c(1,4))
t.test(gain ~ Diet, paired = FALSE, 
       var.equal = TRUE, data = wideCW14)

## 
##  Two Sample t-test
## 
## data:  gain by Diet
## t = -2.7252, df = 23, p-value = 0.01207
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -108.14679  -14.81154
## sample estimates:
## mean in group 1 mean in group 4 
##        136.1875        197.6667
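
To see where the reported statistic comes from, the pooled variance T statistic can be computed by hand and compared with `t.test`. The sketch below uses base R only (`merge` instead of `reshape2` and `dplyr`); the intermediate names such as `both`, `g1`, and `g4` are illustrative, and chicks without a day 21 measurement drop out of the merge.

```r
# Pooled (equal variance) two-sample T statistic by hand, diets 1 and 4.
data(ChickWeight)
cw <- as.data.frame(ChickWeight)

# Weight at day 0 and day 21 for each chick; chicks lacking day 21 are dropped
start <- cw[cw$Time == 0,  c("Chick", "Diet", "weight")]
end   <- cw[cw$Time == 21, c("Chick", "weight")]
both  <- merge(start, end, by = "Chick", suffixes = c("_0", "_21"))
both$gain <- both$weight_21 - both$weight_0

g1 <- both$gain[both$Diet == 1]
g4 <- both$gain[both$Diet == 4]
n1 <- length(g1)
n4 <- length(g4)

# Pooled standard deviation and T statistic
sp       <- sqrt(((n1 - 1) * var(g1) + (n4 - 1) * var(g4)) / (n1 + n4 - 2))
t_manual <- (mean(g1) - mean(g4)) / (sp * sqrt(1 / n1 + 1 / n4))
df       <- n1 + n4 - 2

# Compare with the built-in test
t_builtin <- unname(t.test(g1, g4, var.equal = TRUE)$statistic)
c(t_manual = t_manual, t_builtin = t_builtin, df = df)
```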