40 ROC Curves
40.1 Why a Curve?
In binary classification, you're predicting one of two categories, such as alive or dead, but your predictions are often quantitative:
- Probability of being alive
- Prediction scale from 1 to 10
- The cutoff you choose gives different results. (For example, if we classify everyone with a probability of being alive above 40% as alive, more people will be classified as alive than if the cutoff is 80%.)
The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the cutoff varies. It shows how good the classifier is; the standard summary benchmark is the area under the curve (AUC).
- AUC of 0.5 is effectively random guessing for a binary classifier. Lower is worse.
- AUC = 1 is a perfect classifier
- In general, an AUC above 0.8 is considered “good”, but make sure you have an estimate of the Bayes optimal error beforehand.
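As a concrete sketch, an ROC curve and its AUC can be computed with the pROC package (pROC and the simulated scores below are illustrative assumptions, not part of the notes above):

library(pROC)

# Simulate a binary outcome and a quantitative prediction score (toy data)
set.seed(123)
n <- 500
score <- rnorm(n)                          # quantitative prediction, e.g. a risk score
outcome <- rbinom(n, 1, plogis(2 * score)) # binary outcome related to the score

# Build the ROC object and compute the area under the curve
roc_obj <- roc(response = outcome, predictor = score)
auc(roc_obj)

# Plot sensitivity vs 1 - specificity
plot(roc_obj, legacy.axes = TRUE)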
40.2 Cross Validation
40.2.1 Key Idea
- Accuracy on the training data (resubstitution accuracy) is optimistic.
- A better estimate comes from an independent set (test set accuracy).
- But we can't use the test set when building the model, or it becomes part of the training set.
- So we estimate the test set accuracy with the training set.
Approach:
- Use the training set
- Split it into training/test sets
- Build a model on the training set
- Evaluate on the test set
- Repeat and average the estimated errors
Used for:
- Picking variables to include in the model
- Picking the type of prediction function to use
- Picking the parameters in the prediction function
- Comparing different predictors
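To make the split/build/evaluate/repeat/average steps concrete before the caret example below, here is a minimal base-R sketch on the swiss data (the 80/20 split, 25 repeats, and RMSE as the error measure are illustrative choices, not prescribed above):

# Repeat a random 80/20 split: fit on the training part, evaluate on the held-out part
set.seed(123)
rmse_per_split <- replicate(25, {
  idx <- sample(nrow(swiss), size = round(0.8 * nrow(swiss)))
  fit <- lm(Fertility ~ ., data = swiss[idx, ])
  pred <- predict(fit, newdata = swiss[-idx, ])
  sqrt(mean((swiss[-idx, "Fertility"] - pred)^2))  # RMSE on the held-out rows
})
# Average the estimated errors across the repeated splits
mean(rmse_per_split)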
40.3 Example
library(caret)
library(tidyverse)
# Load data
data("swiss")
# Split the data into training and test set
set.seed(123)
training.samples <- swiss$Fertility %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- swiss[training.samples, ]
test.data <- swiss[-training.samples, ]
# Build the model
model <- lm(Fertility ~., data = train.data)
# Make predictions and compute the R2, RMSE and MAE
predictions <- model %>% predict(test.data)
data.frame(R2 = R2(predictions, test.data$Fertility),
           RMSE = RMSE(predictions, test.data$Fertility),
           MAE = MAE(predictions, test.data$Fertility))
## R2 RMSE MAE
## 1 0.5946201 6.410914 5.651552
# Prediction error rate (RMSE divided by the mean outcome)
RMSE(predictions, test.data$Fertility)/mean(test.data$Fertility)
## [1] 0.08800157
### 1. Try 'Leave One Out Cross Validation' ###
# Define training control using 'Leave One Out Cross Validation'
train.control <- trainControl(method = "LOOCV")
# Train the model
model_cross_val <- train(Fertility ~., data = swiss, method = "lm",
                         trControl = train.control)
# Summarize the results
print(model_cross_val)
## Linear Regression
##
## 47 samples
## 5 predictor
##
## No pre-processing
## Resampling: Leave-One-Out Cross-Validation
## Summary of sample sizes: 46, 46, 46, 46, 46, 46, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 7.738618 0.6128307 6.116021
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
### 2. Try 'K-Folds Cross Validation' ###
# Define training control
train.control <- trainControl(method = "cv", number = 10)
# Train the model
model_k_fold <- train(Fertility ~., data = swiss, method = "lm",
                      trControl = train.control)
# Summarize the results
print(model_k_fold)
## Linear Regression
##
## 47 samples
## 5 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 42, 44, 42, 43, 41, 42, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 7.126707 0.6863589 6.046966
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
### 3. Try 'Repeated K-Folds Cross Validation' ###
# Define training control
train.control <- trainControl(method = "repeatedcv",
number = 10, repeats = 3)
# Train the model
model_k_fold_rep <- train(Fertility ~., data = swiss, method = "lm",
                          trControl = train.control)
# Summarize the results
print(model_k_fold_rep)
## Linear Regression
##
## 47 samples
## 5 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 43, 42, 42, 43, 41, 42, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 7.304991 0.7211256 6.030067
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
40.4 Considerations
- For time series data, data must be used in contiguous chunks (see the time-slice sketch after this list).
- For K-Fold cross validation
- Larger K = less bias, more variance
- Smaller K = more bias, less variance
- Random sampling must be done without replacement
- Random sampling with replacement is the bootstrap
- Underestimates the error
- Can be corrected but is complicated (0.632 Bootstrap)
- If you cross-validate to pick predictors, you must estimate errors on an independent dataset.
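For the time-series point above, one option is caret's rolling-origin resampling via trainControl(method = "timeslice"). The sketch below uses toy trend data and illustrative window sizes; both are assumptions for demonstration only:

library(caret)

# Toy time-ordered data: a linear trend plus noise (purely illustrative)
set.seed(123)
ts_df <- data.frame(t = 1:100)
ts_df$y <- 0.5 * ts_df$t + rnorm(100, sd = 5)

# Rolling-origin resampling: train on a chunk of consecutive points,
# then evaluate on the points immediately after that chunk
train.control <- trainControl(method = "timeslice",
                              initialWindow = 60,  # size of each training chunk
                              horizon = 10,        # size of each test chunk
                              fixedWindow = TRUE)
model_ts <- train(y ~ t, data = ts_df, method = "lm",
                  trControl = train.control)
print(model_ts)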