52 Bootstrap Aggregating (Bagging)
The basic idea is that when you fit complicated models, averaging those models together sometimes gives a smoother fit, with a better balance between bias and variance.
Basic Idea:
1. Resample cases and calculate predictions
2. Average or majority vote

Notes:
* Similar bias
* Reduced variance
* More useful for non-linear functions
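The variance claim can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original notes: the data are simulated (`y = x + noise`), the base learner is a hand-written one-split regression "stump", and the helper names `fit_stump`, `predict_stump`, and `bagged_predict` are hypothetical. It compares the prediction at a single point `x0` from one stump against the average of 25 bootstrap stumps, over 100 simulated training sets.

```r
set.seed(1)

# Fit a one-split regression tree (a "stump"): choose the cut point that
# minimises the residual sum of squares. Assumes at least two distinct x values.
fit_stump <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2  # midpoints between neighbours
  sse <- sapply(cuts, function(cut) {
    left <- y[x <= cut]
    right <- y[x > cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  best <- cuts[which.min(sse)]
  list(cut = best, left = mean(y[x <= best]), right = mean(y[x > best]))
}

predict_stump <- function(stump, newx) {
  ifelse(newx <= stump$cut, stump$left, stump$right)
}

# Step 1: resample cases and predict. Step 2: average the predictions.
# newx is assumed to be a single point.
bagged_predict <- function(x, y, newx, B = 25) {
  preds <- vapply(seq_len(B), function(b) {
    idx <- sample(length(x), replace = TRUE)  # bootstrap sample of cases
    predict_stump(fit_stump(x[idx], y[idx]), newx)
  }, numeric(1))
  mean(preds)
}

# Repeat over many simulated training sets and record both predictions at x0.
x0 <- 0.5
sim <- replicate(100, {
  x <- runif(50)
  y <- x + rnorm(50, sd = 0.3)
  c(single = predict_stump(fit_stump(x, y), x0),
    bagged = bagged_predict(x, y, x0))
})

var_single <- var(sim["single", ])
var_bagged <- var(sim["bagged", ])
print(rowMeans(sim))                               # average prediction of each method
print(c(single = var_single, bagged = var_bagged)) # spread of each method
```

A stump is used because its prediction jumps when the cut point moves, which is the kind of non-linear, unstable fit where averaging should help most. The two printed rows let you compare the average prediction (bias) and the spread (variance) of the single and bagged fits.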
52.1 Example using Bagged Loess
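The following code chunk creates a matrix with 10 rows and 155 columns to hold predictions. For each row, it draws a bootstrap sample of the data (rows sampled with replacement), builds a new data frame from that sample, reorders it by ozone, fits a loess curve of temperature on ozone, and stores the predictions for ozone values 1 to 155. The first rows of the ozone data are shown below.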
## ozone radiation temperature wind
## 1 41 190 67 7.4
## 2 36 118 72 8.0
## 3 12 149 74 12.6
## 4 18 313 62 11.5
## 5 23 299 65 8.6
## 6 19 99 59 13.8
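# Create a 10x155 matrix: one row per bootstrap fit, one column per ozone value
ll <- matrix(NA, nrow = 10, ncol = 155)
for (i in 1:10) {
  # Draw a bootstrap sample of row indices (same size as the data, with replacement)
  ss <- sample(1:nrow(ozone), replace = TRUE)
  ozone0 <- ozone[ss, ]
  ozone0 <- ozone0[order(ozone0$ozone), ]
  # Fit a loess curve of temperature on ozone to the resampled data
  loess0 <- loess(temperature ~ ozone, data = ozone0, span = 0.2)
  # Predict temperature on a common grid of ozone values
  ll[i, ] <- predict(loess0, newdata = data.frame(ozone = 1:155))
}
# Plot the data, the 10 bootstrap fits (grey), and their average (red)
plot(ozone$ozone, ozone$temperature, pch = 19, cex = 0.5)
for (i in 1:10) {
  lines(1:155, ll[i, ], col = "grey", lwd = 2)
}
lines(1:155, apply(ll, 2, mean), col = "red", lwd = 2)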
Even though bagging reduces the variance across the individual model fits, it does not reduce the bias. A few examples of other subsamples are shown below.
52.2 Bagging in Caret
Some models perform bagging for you. In the train() function, consider the following method options:
* bagEarth
* treebag
* bagFDA

Alternatively, you can bag any model you choose using the bag function.
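To bag an arbitrary model you need three pieces: a function that fits the model, a function that predicts from it, and a function that aggregates predictions across resamples. caret's bagControl() accepts fit, predict, and aggregate functions for this purpose. The sketch below does not call caret. It mimics that three-part structure in base R with majority voting, using the built-in iris data and a nearest-centroid classifier as stand-ins. All helper names (`fit_centroid`, `predict_centroid`, `vote`, `bag_model`) are hypothetical.

```r
set.seed(1)

# Fit: one row of feature means (a centroid) per class.
# Assumes x has at least two feature columns.
fit_centroid <- function(x, y) {
  t(sapply(split(as.data.frame(x), y, drop = TRUE), colMeans))
}

# Predict: assign each row of newx to the class of the nearest centroid.
predict_centroid <- function(centroids, newx) {
  newx <- as.matrix(newx)
  d <- sapply(rownames(centroids), function(k) {
    rowSums(sweep(newx, 2, centroids[k, ])^2)  # squared Euclidean distance
  })
  d <- matrix(d, nrow = nrow(newx),
              dimnames = list(NULL, rownames(centroids)))
  colnames(d)[max.col(-d, ties.method = "first")]
}

# Aggregate: majority vote across models for each case.
vote <- function(pred_list) {
  votes <- do.call(cbind, pred_list)  # rows: cases, columns: models
  apply(votes, 1, function(v) names(which.max(table(v))))
}

# Bagging loop: resample cases, fit, predict, then aggregate.
bag_model <- function(x, y, newx, B = 10, fit_fun, predict_fun, aggregate_fun) {
  preds <- lapply(seq_len(B), function(b) {
    idx <- sample(nrow(x), replace = TRUE)  # bootstrap sample of rows
    predict_fun(fit_fun(x[idx, , drop = FALSE], y[idx]), newx)
  })
  aggregate_fun(preds)
}

x <- iris[, 1:4]
y <- iris$Species
bagged <- bag_model(x, y, newx = x, B = 10,
                    fit_fun = fit_centroid,
                    predict_fun = predict_centroid,
                    aggregate_fun = vote)
accuracy <- mean(bagged == y)  # accuracy on the training data
print(accuracy)
```

Swapping in a different learner only requires replacing the fit and predict functions. For a numeric outcome, the aggregate step would average the predictions instead of voting.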