59 Unsupervised Prediction

59.1 Key Ideas

  • Sometimes you don’t know the labels for prediction
  • To build a predictor, you need to:
    • Create clusters (Not a perfectly noiseless process)
    • Name clusters (With EDA this can be challenging)
    • Build predictor for clusters
  • In a new data set
    • Predict clusters

59.2 Iris Example Ignoring Species Labels

## [1] 105   5
## [1] 45  5

##    
##     setosa versicolor virginica
##   1      0         32         9
##   2      0          3        26
##   3     35          0         0
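The output above can be reproduced in spirit with a short sketch like the following. It assumes a stratified 70/30 split (35 training rows per species, which matches the 105 and 45 row counts) and base R `kmeans` with three centers. The seed, `nstart`, and the names `trainIdx`, `training`, `testing`, and `kMeans1` are illustrative assumptions, so the cluster numbering and the exact counts will differ from the table shown.

```r
set.seed(123)  # illustrative seed; results change with the seed
data(iris)

# Stratified 70/30 split: sample 35 of the 50 rows of each species for training.
trainIdx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                          function(rows) sample(rows, 35)))
training <- iris[trainIdx, ]
testing  <- iris[-trainIdx, ]
dim(training); dim(testing)

# Cluster on the four measurements only; Species is deliberately left out.
kMeans1 <- kmeans(training[, 1:4], centers = 3, nstart = 10)
training$clusters <- as.factor(kMeans1$cluster)

# Cross-tabulate clusters against the held-back labels to "name" each cluster.
table(kMeans1$cluster, training$Species)
```

Cluster numbers are arbitrary, so the table is read by finding the dominant species in each row rather than by matching row numbers to species.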

59.2.1 Build a Predictor

This model relates the clusters created above to the variables of the training set using a tree algorithm. Error and variation enter at two stages: when the clusters are built, and again when the predictor for those clusters is fitted.

##    
##     setosa versicolor virginica
##   1      0         35        10
##   2      0          0        25
##   3     35          0         0
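A minimal sketch of this step, under stated assumptions: the text says only "a tree algorithm", so calling `rpart` directly is an assumption here, as are the seed, the split, and the names `features`, `modFit`, and `trainClusterPred`. The tree is trained to predict the cluster label, not the species, and is then compared against the species only as a check.

```r
library(rpart)  # assumption: rpart stands in for "a tree algorithm"

set.seed(123)   # illustrative seed
data(iris)
features <- names(iris)[1:4]

# Stratified 70/30 split: 35 training rows per species.
trainIdx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                          function(rows) sample(rows, 35)))
training <- iris[trainIdx, ]

# Step 1: create clusters from the measurements only.
kMeans1 <- kmeans(training[, features], centers = 3, nstart = 10)
training$clusters <- as.factor(kMeans1$cluster)

# Step 2: fit a classification tree that predicts the cluster from the measurements.
modFit <- rpart(clusters ~ ., data = training[, c(features, "clusters")],
                method = "class")

# Predicted cluster for each training row, tabulated against the true species.
trainClusterPred <- predict(modFit, training, type = "class")
table(trainClusterPred, training$Species)

# Share of training rows where the tree reproduces the k-means cluster.
mean(trainClusterPred == training$clusters)
```

The last line isolates the second source of error: any value below 1 means the tree does not perfectly reproduce the clusters it was trained on.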

59.2.2 Apply on the Test Data Set

##                
## testClusterPred setosa versicolor virginica
##               1      0         15         6
##               2      0          0         9
##               3     15          0         0
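Applying the fitted predictor to held-out data could look like the sketch below. The name `testClusterPred` comes from the output above; everything else (the seed, the split, `rpart` as the tree algorithm, and the other variable names) is an assumption made for illustration, so the counts will not match the table exactly.

```r
library(rpart)  # assumption: rpart stands in for the tree algorithm

set.seed(123)   # illustrative seed
data(iris)
features <- names(iris)[1:4]

# Stratified 70/30 split into training and testing sets.
trainIdx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                          function(rows) sample(rows, 35)))
training <- iris[trainIdx, ]
testing  <- iris[-trainIdx, ]

# Build clusters and the cluster predictor on the training set only.
kMeans1 <- kmeans(training[, features], centers = 3, nstart = 10)
training$clusters <- as.factor(kMeans1$cluster)
modFit <- rpart(clusters ~ ., data = training[, c(features, "clusters")],
                method = "class")

# Predict a cluster for each unseen row, then compare with the held-back labels.
testClusterPred <- predict(modFit, testing, type = "class")
table(testClusterPred, testing$Species)
```

In a real unsupervised problem the second argument to `table` would not exist, which is why the cluster names chosen during exploration carry so much weight.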

59.3 Notes

  • The cl_predict() function in the clue package provides similar functionality
  • Beware of over-interpretation of clusters
  • This is one basic approach to recommendation engines
  • Both The Elements of Statistical Learning and An Introduction to Statistical Learning cover this topic
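As an alternative to fitting a separate tree, new observations can be assigned to clusters directly. The following is a base R sketch of that idea, assigning each test row to the nearest k-means center by squared Euclidean distance. It is not the clue implementation of `cl_predict()`; the seed, the split, and the names `d2` and `nearest` are illustrative assumptions.

```r
set.seed(123)  # illustrative seed
data(iris)

# Stratified 70/30 split into training and testing sets.
trainIdx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                          function(rows) sample(rows, 35)))
training <- iris[trainIdx, ]
testing  <- iris[-trainIdx, ]

# Clusters are learned from the training measurements only.
kMeans1 <- kmeans(training[, 1:4], centers = 3, nstart = 10)
centers <- kMeans1$centers
newX    <- as.matrix(testing[, 1:4])

# Squared Euclidean distance from every test row to every cluster center:
# one column per center, one row per test observation.
d2 <- sapply(seq_len(nrow(centers)),
             function(k) rowSums(sweep(newX, 2, centers[k, ])^2))

# Index of the closest center for each test row.
nearest <- max.col(-d2, ties.method = "first")
table(nearest, testing$Species)
```

Because the assignment uses the cluster centers themselves, it avoids the extra error introduced by a second model, at the cost of being tied to distance-based clusterings.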