54 Boosting
54.1 Basic Idea
- Take lots of potentially weak predictors.
- Weight them and add them up.
- Get a stronger predictor.
- Start with a set of classifiers \(h_1, ..., h_k\)
- Examples: All possible trees, all possible regression models, all possible cut-offs.
- Create a classifier that combines the classification functions:
\[f(x) = \operatorname{sgn}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)\]
- The goal is to minimise error on the training set
- The algorithm is iterative: select one \(h\) at each step
- Calculate weights based on errors
- Upweight misclassified observations and select the next \(h\)
- See AdaBoost for the canonical example of this approach; a minimal sketch follows this list
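To make the iteration concrete, below is a minimal AdaBoost sketch in base R using decision stumps as the weak classifiers. The simulated data, the number of rounds, and the helper predict_boost() are illustrative assumptions, not the implementation of any particular package.

# Minimal AdaBoost sketch with decision stumps (illustrative only)
set.seed(123)
n <- 200
x <- matrix(rnorm(n * 2), ncol = 2)
y <- ifelse(x[, 1] + x[, 2] > 0, 1, -1)      # labels in {-1, +1}

w        <- rep(1 / n, n)                    # start with uniform observation weights
n_rounds <- 25
stumps   <- vector("list", n_rounds)
alpha    <- numeric(n_rounds)

for (t in 1:n_rounds) {
  # Fit the best single-split stump under the current weights
  best <- list(err = Inf)
  for (j in 1:2) {
    for (s in unique(x[, j])) {
      for (sgn in c(1, -1)) {
        pred <- ifelse(sgn * (x[, j] - s) > 0, 1, -1)
        err  <- sum(w * (pred != y))
        if (err < best$err) best <- list(j = j, s = s, sgn = sgn, err = err)
      }
    }
  }
  stumps[[t]] <- best
  # Classifier weight alpha_t is larger when the weighted error is small
  alpha[t] <- 0.5 * log((1 - best$err) / best$err)
  # Upweight the observations this stump missed, then renormalise
  pred <- ifelse(best$sgn * (x[, best$j] - best$s) > 0, 1, -1)
  w    <- w * exp(-alpha[t] * y * pred)
  w    <- w / sum(w)
}

# Final classifier: the sign of the weighted vote over all stumps
predict_boost <- function(xnew) {
  votes <- sapply(1:n_rounds, function(t) {
    b <- stumps[[t]]
    alpha[t] * ifelse(b$sgn * (xnew[, b$j] - b$s) > 0, 1, -1)
  })
  sign(rowSums(votes))
}
mean(predict_boost(x) == y)                  # training accuracy of the boosted vote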
54.2 Boosting in R
- Boosting can be done with any subset of classifiers
- One large subclass is gradient boosting
- R has multiple boosting libraries. Differences include the choice of basic classification functions and combination rules.
- gbm - boosting with trees
- mboost - model-based boosting
- ada - statistical boosting based on additive logistic regression
- gamBoost - boosting generalised additive models
- Most of these are available through the caret package, as in the sketch after this list
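As a rough sketch of how these back-ends are swapped through caret, the calls below change only the method argument of train(). The method strings are the identifiers caret commonly registers for these packages (check names(getModelInfo()) on your installation), and the simulated data frame is an assumption for illustration.

library(caret)
set.seed(42)
df   <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
df$y <- df$x1 + 0.5 * df$x2 + rnorm(500, sd = 0.5)

fit_gbm      <- train(y ~ ., data = df, method = "gbm", verbose = FALSE)  # gbm package
fit_glmboost <- train(y ~ ., data = df, method = "glmboost")              # mboost package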
54.3 Wage Example
library(ISLR)     # Wage data
library(caret)    # createDataPartition(), train()
library(ggplot2)  # qplot()
library(gbm)      # boosting with trees (method = "gbm")
# Load data and drop logwage, a direct transform of the outcome
data(Wage)
Wage <- subset(Wage, select = -c(logwage))
# Subset and build training and testing sets
inTrain <- createDataPartition(y = Wage$wage, p = 0.7, list = FALSE)
training <- Wage[inTrain, ]
testing  <- Wage[-inTrain, ]
# Model wage as a combination of all remaining variables,
# boosting trees via gbm (verbose = FALSE suppresses the fitting log)
modFit <- train(wage ~ ., method = "gbm",
                data = training, verbose = FALSE)
print(modFit)
## Stochastic Gradient Boosting
##
## 2102 samples
## 9 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 2102, 2102, 2102, 2102, 2102, 2102, ...
## Resampling results across tuning parameters:
##
## interaction.depth n.trees RMSE Rsquared MAE
## 1 50 34.56805 0.3186570 23.32001
## 1 100 34.03865 0.3286803 22.96212
## 1 150 33.98890 0.3297850 22.98726
## 2 50 33.98977 0.3315167 22.92748
## 2 100 33.99139 0.3304338 23.01636
## 2 150 34.09998 0.3274460 23.14241
## 3 50 33.96126 0.3314366 22.95055
## 3 100 34.15682 0.3258006 23.19512
## 3 150 34.36910 0.3199571 23.40490
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 50, interaction.depth =
## 3, shrinkage = 0.1 and n.minobsinnode = 10.
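Not part of the original output: the resampling profile behind this table can also be plotted straight from the caret train object, which makes the effect of n.trees and interaction.depth easier to see.

# Visualise RMSE across the tuning grid
plot(modFit)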
# Plot predicted versus observed wage on the test set; a good fit keeps
# the points close to the red identity line
qplot(predict(modFit, testing), wage, data = testing) +
  geom_abline(intercept = 0, slope = 1, col = "red")
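A natural follow-up, not in the original notes, is to summarise the test-set error numerically rather than only visually.

# Test-set RMSE of the boosted model
predWage <- predict(modFit, testing)
sqrt(mean((predWage - testing$wage)^2))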