8 T Confidence Intervals
When we estimate something using statistics, that estimate usually comes with uncertainty. Take election polling, for example. A poll reports the percentage of voters who favor a candidate, but it can only sample a small subset of voters. Therefore, the estimate has uncertainty associated with it.
Confidence intervals are a convenient way to communicate that uncertainty in estimates.
In the previous section, we discussed creating an interval using the CLT.
- Those intervals took the form: \(Est \pm ZQ (SE_{Est})\)
- Here we will be looking at T confidence intervals, so the only difference is that we’ll use T quantiles instead of the Z quantiles from before: \(Est \pm TQ (SE_{Est})\)
- The T quantiles come from a distribution with heavier tails than the normal, so the interval will be slightly wider. These are some of the most useful intervals in all of statistics.
- Whenever you have a choice between a T interval and a Z interval, go with the T interval. As you collect more and more data the T interval tends to the Z interval, whereas for a small sample size \(n\) the Z interval does not account for the extra uncertainty that comes from estimating the standard error.
\[Est \pm TQ (SE_{Est})\]
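To get a feel for how much wider the T interval is, the following sketch compares the two quantiles at the 95% level. The sample size of 10 is an assumption chosen only for illustration.

```r
# Hypothetical sample size and error rate, chosen only for illustration
n <- 10
alpha <- 0.05

zq <- qnorm(1 - alpha / 2)           # standard normal quantile (ZQ)
tq <- qt(1 - alpha / 2, df = n - 1)  # T quantile (TQ) with n - 1 degrees of freedom

# For the same estimate and standard error, the ratio of the quantiles
# is also the ratio of the interval widths
c(ZQ = zq, TQ = tq, ratio = tq / zq)
```

With these settings the T quantile is about 2.26 against 1.96 for the normal, so the T interval is roughly 15% wider.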
8.1 Gosset’s \(t\) distribution
- Invented by William Gosset in 1908.
- Has thicker tails than the normal.
- This distribution is indexed by degrees of freedom; it gets more and more like the standard normal as the degrees of freedom get larger.
- This distribution assumes that the underlying data are iid Gaussian, with the result that:
\[\frac{\bar{X}-\mu}{S / \sqrt{n}}\] follows Gosset’s \(t\) distribution with \(n-1\) degrees of freedom. Our interval ends up becoming:
\[\bar{X} \pm t_{n-1} \frac{S}{\sqrt{n}}\]
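The formula above can be sketched as a small R function. The helper name t_interval, the simulated data, and the seed are all assumptions made for illustration; the check against t.test() at the end confirms that the two agree.

```r
# Hypothetical helper: T confidence interval for the mean of a numeric vector
t_interval <- function(x, level = 0.95) {
  n <- length(x)
  tq <- qt(1 - (1 - level) / 2, df = n - 1)  # t quantile with n - 1 df
  mean(x) + c(-1, 1) * tq * sd(x) / sqrt(n)  # X-bar +/- t * S / sqrt(n)
}

set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(20, mean = 5, sd = 2)  # simulated iid Gaussian data

ci <- t_interval(x)
ci

# The built-in t.test() reports the same interval
all.equal(ci, as.numeric(t.test(x)$conf.int))
```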
8.2 Notes about the \(t\) interval
- The \(t\) interval assumes that the data are iid normal, though it is fairly robust to departures from this assumption
- This interval will work fairly well whenever the data are symmetric and mound-shaped
- Paired observations are often analyzed using the \(t\) interval by taking differences
- For large degrees of freedom, \(t\) quantiles become the same as standard normal quantiles; therefore this interval converges to the same interval as the CLT yielded
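The convergence in the last note can be checked numerically. This is a minimal sketch; the particular degrees of freedom listed are arbitrary choices for illustration.

```r
# Arbitrary degrees of freedom, chosen only to show the trend
dfs <- c(2, 5, 10, 30, 100, 1000)

quantiles <- data.frame(
  df = dfs,
  t_quantile = qt(0.975, df = dfs),  # 97.5th percentile of the t distribution
  z_quantile = qnorm(0.975)          # 97.5th percentile of the standard normal
)
# How far each t quantile sits above the normal quantile
quantiles$gap <- quantiles$t_quantile - quantiles$z_quantile
quantiles
```

The gap column shrinks toward zero as the degrees of freedom grow, which is why the two intervals agree for large samples.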
8.3 Sleep Data
The data(sleep) command loads the original sleep data used in Gosset’s Biometrika paper, which shows the increase in hours of sleep for 10 patients on two soporific drugs. R treats the data as two groups rather than paired.
Below we will calculate the T interval for this data in a few different ways:
head(sleep)
## extra group ID
## 1 0.7 1 1
## 2 -1.6 1 2
## 3 -0.2 1 3
## 4 -1.2 1 4
## 5 -0.1 1 5
## 6 3.4 1 6
## View results
# Grab the first observations for patients 1-10, then the second observations for the same patients
g1 <- sleep$extra[1:10]
g2 <- sleep$extra[11:20]
# Take their difference and find the mean, sd
difference <- g2 - g1
mn <- mean(difference)
s <- sd(difference)
n <- length(difference)  # 10 patients
## Carry out the T test
# Manually
mn + c(-1, 1) * qt(.975, n-1) * s / sqrt(n)
## [1] 0.7001142 2.4598858
# Or pass the vector of differences to t.test
t.test(difference)
##
## One Sample t-test
##
## data: difference
## t = 4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.7001142 2.4598858
## sample estimates:
## mean of x
## 1.58
# Or using the individual vectors + passing the argument paired = TRUE
t.test(g2, g1, paired = TRUE)
##
## Paired t-test
##
## data: g2 and g1
## t = 4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.7001142 2.4598858
## sample estimates:
## mean of the differences
## 1.58
# Or use formula notation: model extra as a function of the releveled group variable and pass paired = TRUE
# Note: newer R releases (4.4.0 and later) no longer accept paired = TRUE in the formula method; the vector call above still works there
t.test(extra ~ I(relevel(group, 2)), paired = TRUE, data = sleep)
##
## Paired t-test
##
## data: extra by I(relevel(group, 2))
## t = 4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.7001142 2.4598858
## sample estimates:
## mean of the differences
## 1.58