9 Independent Group T-Intervals

Suppose that we want to compare the mean blood pressure between two groups in a randomized trial: those who received the treatment and those who received a placebo.
We cannot use the paired t test because the groups are independent and may have different sample sizes.
Here, we will look at ways that we can compare across independent groups.

9.1 Confidence Interval

A \((1 - \alpha) \times 100\%\) confidence interval for \(\mu_y - \mu_x\), assuming both groups share a common variance, is:

\[\bar{Y} - \bar{X} \pm t_{n_x + n_y - 2, 1-\alpha/2} ~~S_p(\frac{1}{n_x} + \frac{1}{n_y})^{\frac{1}{2}}\]

And the pooled variance is:

\[S_p^2 = \frac{(n_x-1)S_x^2 + (n_y-1)S_y^2}{n_x+n_y-2}\]

Where:

\(\bar{Y} - \bar{X}\) is the average in one group minus the average in another group
The degrees of freedom are \(n_x+n_y -2\)
Standard error of the difference is \(S_p(\frac{1}{n_x} + \frac{1}{n_y})^{\frac{1}{2}}\)

If the variance is the same in both groups (a reasonable assumption when the groups were properly randomized), the pooled estimate \(S_p^2\) is a weighted average of the two sample variances, with each group weighted by its degrees of freedom. When the sample sizes are equal, it is the simple average of the two.
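The "average" reading of the pooled variance can be checked numerically. The sketch below uses made-up summary statistics (the values of `n_x`, `s_x`, `n_y`, `s_y` and `n_eq` are assumptions chosen only for illustration) and shows that the pooled formula matches a weighted average of the two sample variances, which reduces to the plain average when the groups are the same size.

```r
# Summary statistics below are made up purely for illustration
n_x <- 10; s_x <- 4
n_y <- 30; s_y <- 8

# Pooled variance from the formula
sp2 <- ((n_x - 1) * s_x^2 + (n_y - 1) * s_y^2) / (n_x + n_y - 2)

# Same quantity written as a weighted average of the two sample variances
w_x <- (n_x - 1) / (n_x + n_y - 2)
sp2_weighted <- w_x * s_x^2 + (1 - w_x) * s_y^2

# With equal group sizes both weights are 1/2, giving the simple average
n_eq <- 20
sp2_equal_n <- ((n_eq - 1) * s_x^2 + (n_eq - 1) * s_y^2) / (2 * n_eq - 2)

c(sp2 = sp2, sp2_weighted = sp2_weighted, sp2_equal_n = sp2_equal_n)
```

Because the larger group gets more weight, the pooled value sits closer to that group's variance whenever the sample sizes differ.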

Remember, this interval assumes constant variance across both groups.
If there is some doubt, assume a different variance per group (we’ll get to that with a different interval later).

9.2 Example

Based on an example from Fundamentals of Biostatistics (a really good reference book).

Comparing systolic blood pressure for 8 oral contraceptive users vs 21 controls
\(\bar{X}_{OC} = 132.86\) with \(S_{OC} = 15.34\)
\(\bar{X}_{C} = 127.44\) with \(S_{C} = 18.23\)

The pooled variance estimate for this would be:

\[S_p^2 = \frac{(n_x-1)S_x^2 + (n_y-1)S_y^2}{n_x+n_y-2}\]

And the interval is:

\[\bar{Y} - \bar{X} \pm t_{n_x + n_y - 2, 1-\alpha/2} ~~S_p(\frac{1}{n_x} + \frac{1}{n_y})^{\frac{1}{2}}\]

# Calculate the pooled variance estimate 
# Then take the square root to get the estimate for the standard deviation 
sp <- sqrt((7 * 15.34^2 + 20*18.23^2)/(8+21-2))

# Calculate the independent t interval using the pooled variance estimate
# Using the c(-1,1) interval we can both add and subtract the quantiles
int <- 132.86 - 127.44 + (c(-1,1)*qt(.975, 27))*sp*(1/8 + 1/21)^.5

sp

## [1] 17.52656

int

## [1] -9.521097 20.361097

9.3 Example Chick Weight

The formula notation t.test(gain ~ Diet, ...) puts the outcome variable on the left of the tilde and the explanatory (grouping) variable on the right. For this to work, the explanatory variable must have exactly two levels. As there are four diets in this data set, we’ll use wideCW14 <- subset(wideCW, Diet %in% c(1, 4)) to keep only the 1st and 4th diets for this t test.

# Load data and packages
library(datasets); data(ChickWeight); library(reshape2); library(dplyr)

# Define weight gain or loss
wideCW <- dcast(ChickWeight, Diet + Chick ~ Time, value.var = "weight")

names(wideCW)[-(1:2)] <- paste("time", names(wideCW)[-(1:2)], sep = "")

wideCW <- mutate(wideCW, gain = time21 - time0)

wideCW14 <- subset(wideCW, Diet %in% c(1, 4))

# Compare the two t tests, with one falsely assuming equal variance
rbind(
  t.test(gain ~ Diet, paired = FALSE, var.equal = TRUE, data = wideCW14)$conf,
  t.test(gain ~ Diet, paired = FALSE, var.equal = FALSE, data = wideCW14)$conf
)

##           [,1]      [,2]
## [1,] -108.1468 -14.81154
## [2,] -104.6590 -18.29932
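One way to see why the two rows differ is to look at the group sizes and the spread of the weight gain within each diet. The sketch below rebuilds the gain from day 0 to day 21 using base R only (an alternative to the `dcast`/`mutate` steps, assumed to give the same per-chick gains); the names `cw`, `w0`, `w21`, `gains`, `gains14`, `group_n` and `group_sd` are introduced here for illustration.

```r
library(datasets)

# Plain data frame copy; ChickWeight carries extra grouped-data classes
cw <- as.data.frame(ChickWeight)

# Weight at day 0 and day 21 per chick; merge() keeps only chicks measured at both
w0  <- subset(cw, Time == 0,  select = c(Chick, Diet, weight))
w21 <- subset(cw, Time == 21, select = c(Chick, weight))
gains <- merge(w0, w21, by = "Chick", suffixes = c("_0", "_21"))
gains$gain <- gains$weight_21 - gains$weight_0

# Keep diets 1 and 4 and drop the unused factor levels
gains14 <- droplevels(subset(gains, Diet %in% c(1, 4)))

# Group size and sample standard deviation of the gain for each diet
group_n  <- tapply(gains14$gain, gains14$Diet, length)
group_sd <- tapply(gains14$gain, gains14$Diet, sd)
rbind(n = group_n, sd = group_sd)
```

When the standard deviations and group sizes differ noticeably, the pooled and unequal-variance intervals will not agree, and the unequal-variance interval is the safer one to report.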

9.4 Dealing with Unequal Variances

The following is the interval used when the variances are unequal:

\[\bar{Y} - \bar{X} \pm t_{df} ~~(\frac{S_x^2}{n_x} + \frac{S_y^2}{n_y})^{\frac{1}{2}}\]

The following is the formula for the degrees of freedom of the interval above:

\[df = \frac{(\frac{S_x^2}{n_x} + \frac{S_y^2}{n_y})^2}{\frac{(\frac{S_x^2}{n_x})^2}{n_x-1} + \frac{(\frac{S_y^2}{n_y})^2}{n_y-1}}\]

This is interesting because the degrees of freedom for this interval depend on the estimated variance of each group.
If you’re not sure whether your data have equal variances, just use the interval for unequal variances; it is a good approximation even if the data do in fact have equal variances.
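The degrees-of-freedom formula is easier to get a feel for in code. The following is a minimal sketch that works from summary statistics only; `welch_t_interval` is a hypothetical helper written for this illustration (it is not a function from base R or any package), and the inputs in the two calls are made-up numbers.

```r
# Hypothetical helper: unequal-variance t interval for mu_y - mu_x
# from group means, standard deviations, and sample sizes
welch_t_interval <- function(mean_x, s_x, n_x, mean_y, s_y, n_y,
                             conf_level = 0.95) {
  v_x <- s_x^2 / n_x  # estimated variance of the mean of group x
  v_y <- s_y^2 / n_y  # estimated variance of the mean of group y
  # Degrees of freedom from the formula above
  df <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))
  t_quant <- qt(1 - (1 - conf_level) / 2, df)
  list(df = df,
       interval = (mean_y - mean_x) + c(-1, 1) * t_quant * sqrt(v_x + v_y))
}

# Made-up inputs: equal spreads and equal sizes recover n_x + n_y - 2
equal_case <- welch_t_interval(100, 10, 12, 105, 10, 12)

# Made-up inputs: very different spreads give fewer degrees of freedom
unequal_case <- welch_t_interval(100, 3, 12, 105, 20, 12)

c(equal = equal_case$df, unequal = unequal_case$df)
```

With identical standard deviations and sample sizes the degrees of freedom equal those of the pooled interval, which is one way to see why the unequal-variance interval loses little when the variances really are equal.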

9.5 Example

Let’s redo the oral contraceptive example from earlier, this time assuming unequal variances:

Comparing systolic blood pressure for 8 oral contraceptive users vs 21 controls
\(\bar{X}_{OC} = 132.86\) with \(S_{OC} = 15.34\)
\(\bar{X}_{C} = 127.44\) with \(S_{C} = 18.23\)
\(df = 15.04, ~~~t_{(15.04, ~.975)}=2.13\)

\[\bar{Y} - \bar{X} \pm t_{df} ~~(\frac{S_x^2}{n_x} + \frac{S_y^2}{n_y})^{\frac{1}{2}}\]

\[132.86-127.44 \pm 2.13 (\frac{15.34^2}{8} + \frac{18.23^2}{21})^{\frac{1}{2}} = [-8.91, 19.75]\]

Or alternatively in R: t.test(..., var.equal = FALSE)
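The `...` in that call stands for the raw measurements, which are not given here; only the summary statistics are. As a rough sketch, one can build synthetic samples whose mean and standard deviation match the summaries exactly and pass those to `t.test`. The helper `match_summary`, the seed, and the generated values are all assumptions for illustration and are not the study data.

```r
set.seed(1)  # arbitrary seed; the rescaling below makes the summaries exact

# Hypothetical helper: rescale random draws so that the sample mean and
# sample SD equal the requested values
match_summary <- function(n, target_mean, target_sd) {
  z <- as.numeric(scale(rnorm(n)))  # sample mean 0, sample SD 1
  target_mean + target_sd * z
}

oc      <- match_summary(8, 132.86, 15.34)   # synthetic OC users
control <- match_summary(21, 127.44, 18.23)  # synthetic controls

pooled <- t.test(oc, control, var.equal = TRUE)   # equal-variance interval
welch  <- t.test(oc, control, var.equal = FALSE)  # unequal-variance interval

welch$parameter  # estimated degrees of freedom
rbind(pooled = pooled$conf.int, welch = welch$conf.int)
```

Since a two-sample t interval depends on the data only through each group's mean, standard deviation and size, these synthetic samples should reproduce both hand-calculated intervals up to rounding.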

9.6 Comparing Other Kinds of Data

For binomial data, there are lots of ways to compare two groups:

Relative risk, risk difference, odds ratio
Chi-squared tests, normal approximations, exact tests

For count data, there are also Chi-squared tests and exact tests.
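To make the binomial options concrete, here is a small sketch on a hypothetical 2x2 table. The counts and the group labels are invented for illustration; the tests are the standard ones in R's stats package (`chisq.test`, `prop.test`, `fisher.test`).

```r
# Hypothetical 2x2 table: rows are groups, columns are event / no event
counts <- matrix(c(15, 85,
                   30, 70),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("treated", "control"),
                                 outcome = c("event", "no_event")))

# Effect measures computed directly from the table
risk <- counts[, "event"] / rowSums(counts)
risk_difference <- unname(risk["treated"] - risk["control"])
relative_risk   <- unname(risk["treated"] / risk["control"])
odds            <- risk / (1 - risk)
odds_ratio      <- unname(odds["treated"] / odds["control"])

# Tests of whether the two groups differ
chisq_res  <- chisq.test(counts)   # chi-squared test
prop_res   <- prop.test(counts)    # normal approximation for two proportions
fisher_res <- fisher.test(counts)  # exact test

c(risk_difference = risk_difference,
  relative_risk = relative_risk,
  odds_ratio = odds_ratio)
```

The effect measures describe how large the difference is, while the tests address whether it could plausibly be due to chance; in practice both are usually reported together.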