54 Boosting

54.1 Basic Idea

  1. Take lots of potentially weak predictors.
  2. Weigh them and add them up.
  3. Get a stronger predictor.
  • Start with a set of classifiers \(h_1, ..., h_k\)
    • Examples: All possible trees, all possible regression models, all possible cut-offs.
  • Create a classifier that combines the classification functions \[f(x) = \operatorname{sgn}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)\]
    • Goal is to minimise error on the training set
    • Iterative: select one \(h\) at each step
    • Calculate weights based on errors
    • Upweight misclassified observations and select the next \(h\)
    • See AdaBoost, the best-known example of this procedure (a minimal sketch follows this list)
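
A minimal AdaBoost sketch on simulated data, with decision stumps (cut-offs on a single numeric predictor) as the weak classifiers; the data and variable names here are purely illustrative:

set.seed(1)
n <- 100
x <- runif(n)
y <- ifelse(x + rnorm(n, sd = 0.2) > 0.5, 1, -1)      # labels coded as -1 / +1

stump <- function(x, cutoff, sgn) ifelse(x > cutoff, sgn, -sgn)   # weak classifier h_t(x)

T_steps <- 10
w       <- rep(1 / n, n)        # observation weights, start uniform
alpha   <- numeric(T_steps)     # classifier weights alpha_t
cutoffs <- numeric(T_steps)
signs   <- numeric(T_steps)

for (t in seq_len(T_steps)) {
  # select the stump with the smallest weighted training error
  best <- list(err = Inf)
  for (co in unique(x)) for (sg in c(-1, 1)) {
    e <- sum(w[stump(x, co, sg) != y])
    if (e < best$err) best <- list(err = e, cutoff = co, sgn = sg)
  }
  eps        <- min(max(best$err, 1e-10), 1 - 1e-10)  # keep the log finite
  alpha[t]   <- 0.5 * log((1 - eps) / eps)            # weight of this stump
  cutoffs[t] <- best$cutoff
  signs[t]   <- best$sgn
  # upweight the observations this stump missed, then renormalise
  w <- w * exp(-alpha[t] * y * stump(x, cutoffs[t], signs[t]))
  w <- w / sum(w)
}

# combined classifier f(x) = sgn( sum_t alpha_t h_t(x) )
f <- function(xnew) {
  scores <- sapply(seq_len(T_steps),
                   function(t) alpha[t] * stump(xnew, cutoffs[t], signs[t]))
  sign(rowSums(scores))
}
mean(f(x) == y)   # training accuracy of the combined classifier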

54.2 Boosting in R

  • Boosting can be done with any subset of classifiers
  • One large subclass is gradient boosting
  • R has multiple boosting libraries. Differences include the choice of basic classification functions and combination rules.
    • gbm - boosting with trees
    • mboost - model-based boosting (a short sketch follows this list)
    • ada - statistical boosting based on additive logistic regression
    • gamBoost - boosting generalised additive models
  • Most of these are available in the caret package
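
As one illustration of using these packages directly rather than through caret, a minimal sketch of model-based boosting with mboost's glmboost() on the built-in mtcars data (chosen only for illustration):

library(mboost)
# glmboost() boosts a linear model, updating one coefficient at a time,
# rather than fitting trees
fit <- glmboost(mpg ~ ., data = mtcars)
coef(fit)   # coefficients of the boosted linear model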

54.3 Wage Example
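
The summary below comes from fitting boosted regression trees with caret. The exact call and data split are not shown in these notes; a sketch along the following lines, assuming the Wage data from the ISLR package and a 70/30 training split, produces output of this form:

library(ISLR); library(caret)
data(Wage)
Wage <- subset(Wage, select = -c(logwage))        # drop the log-wage column

inTrain  <- createDataPartition(y = Wage$wage, p = 0.7, list = FALSE)
training <- Wage[inTrain, ]
testing  <- Wage[-inTrain, ]

# method = "gbm": stochastic gradient boosting with trees; by default caret
# tunes n.trees and interaction.depth over 25 bootstrap resamples
modFit <- train(wage ~ ., method = "gbm", data = training, verbose = FALSE)
print(modFit)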

## Stochastic Gradient Boosting 
## 
## 2102 samples
##    9 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 2102, 2102, 2102, 2102, 2102, 2102, ... 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  RMSE      Rsquared   MAE     
##   1                   50      34.56805  0.3186570  23.32001
##   1                  100      34.03865  0.3286803  22.96212
##   1                  150      33.98890  0.3297850  22.98726
##   2                   50      33.98977  0.3315167  22.92748
##   2                  100      33.99139  0.3304338  23.01636
##   2                  150      34.09998  0.3274460  23.14241
##   3                   50      33.96126  0.3314366  22.95055
##   3                  100      34.15682  0.3258006  23.19512
##   3                  150      34.36910  0.3199571  23.40490
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 50, interaction.depth =
##  3, shrinkage = 0.1 and n.minobsinnode = 10.
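
The fitted object can then be used like any other caret model, for example to check performance on the held-out set (object names as in the sketch above):

pred <- predict(modFit, newdata = testing)
sqrt(mean((pred - testing$wage)^2))                 # test-set RMSE
plot(pred, testing$wage, xlab = "predicted wage", ylab = "observed wage")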