43 Data Slicing

When using the createDataPartition() function, the first argument, y, is the outcome you want the split to be stratified on. Here we set y to the spam$type factor, so the training and test sets keep roughly the same proportion of spam and non-spam messages. The p argument sets the fraction of the data that goes into the training set. Here it is 0.75, giving a \(75\% ~:~25\%\) split. The resulting training set has 3451 rows and the test set has 1150 rows, each with 58 columns:

## [1] 3451   58
## [1] 1150   58
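The split above can be sketched as follows. This assumes the caret and kernlab packages are installed (kernlab provides the spam data). The seed value is arbitrary and only makes the run reproducible. Because the split is stratified, the row counts do not depend on the seed: each class contributes the ceiling of 75% of its rows.

```r
# Sketch: stratified 75/25 train/test split of the spam data.
# Assumes the caret and kernlab packages are installed.
library(caret)
library(kernlab)  # provides the spam data set
data(spam)

set.seed(32343)  # arbitrary seed, used only for reproducibility

# Stratify on the outcome so both sets keep the spam/nonspam ratio;
# list = FALSE returns a one-column matrix of row indices
inTrain <- createDataPartition(y = spam$type, p = 0.75, list = FALSE)

training <- spam[inTrain, ]   # rows selected for training
testing  <- spam[-inTrain, ]  # all remaining rows
dim(training)
dim(testing)
```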

43.2 Spam Example: Resampling

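If you want to resample by bootstrapping instead of doing k-fold cross-validation, you can use the createResample() function. Each bootstrap resample is drawn with replacement and has the same size as the original data, so some rows appear more than once and others not at all. The times argument sets how many resamples to create, and the list argument controls whether the result is returned as a list or as a matrix. In the output below, each of the 10 resamples contains 4601 indices, and the first ten indices of the first resample include repeats (1, 1 and 5, 5):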

## Resample01 Resample02 Resample03 Resample04 Resample05 Resample06 Resample07 
##       4601       4601       4601       4601       4601       4601       4601 
## Resample08 Resample09 Resample10 
##       4601       4601       4601
##  [1]  1  1  2  5  5  9 13 16 17 18
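A sketch of the resampling step, assuming the caret and kernlab packages are installed. The seed is arbitrary, so the particular indices printed for the first resample will generally differ from those shown above. The resample sizes and the presence of repeated indices do not depend on the seed.

```r
# Sketch: 10 bootstrap resamples of the spam data.
# Assumes the caret and kernlab packages are installed.
library(caret)
library(kernlab)  # provides the spam data set
data(spam)

set.seed(32323)  # arbitrary seed, used only for reproducibility

# Each list element is a vector of row indices drawn with replacement,
# the same length as the data
folds <- createResample(y = spam$type, times = 10, list = TRUE)

sapply(folds, length)  # size of each resample
folds[[1]][1:10]       # first ten indices of the first resample
```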

43.3 Spam Example: Time Slices

Time slices are useful for time-series data, where training and test sets should be contiguous blocks of time rather than random samples, and the createTimeSlices() function creates them. The initialWindow argument sets how many consecutive samples make up each training window, and the horizon argument sets how many of the samples immediately following that window go into the corresponding test set. With initialWindow = 20 and horizon = 10, the first training slice is samples 1 to 20 and the first test slice is samples 21 to 30:

## [1] "train" "test"
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
##  [1] 21 22 23 24 25 26 27 28 29 30
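The time-slice output above can be reproduced with a sketch like this one. It assumes the caret package is installed and uses a hypothetical time vector of 1000 equally spaced points (`tme <- 1:1000`), because the original vector is not shown.

```r
# Sketch: rolling training/test windows over a time index.
# Assumes caret is installed; tme is a hypothetical time vector.
library(caret)

tme <- 1:1000  # hypothetical time index, assumed for illustration

# 20 consecutive samples for training, then the next 10 for testing
folds <- createTimeSlices(y = tme, initialWindow = 20, horizon = 10)

names(folds)      # "train" and "test"
folds$train[[1]]  # samples 1 to 20
folds$test[[1]]   # samples 21 to 30
```

With the default fixedWindow = TRUE, each later training window keeps its length of 20 and slides forward one step, so the second slice trains on samples 2 to 21. Setting fixedWindow = FALSE instead makes every training window start at the first sample and grow over time.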