55 Model Based Prediction

55.1 Basic Idea

  1. Assume the data follow a probabilistic model
  2. Use Bayes’ theorem to identify optimal classifiers

Pros:

  • Can take advantage of structure of the data
  • May be computationally convenient
  • Are reasonably accurate on real problems

Cons:

  • Make additional assumptions about the data
  • When the model is incorrect, you may get reduced accuracy

55.2 Model Based Approach

  1. Our goal is to build a parametric model for the conditional distribution \(Pr(Y = k|X = x)\)
  2. A typical approach is to apply Bayes' theorem:

\[Pr(Y = k|X = x) = \frac{Pr(X = x|Y = k)~Pr(Y = k)}{\sum_{l=1}^K ~ Pr(X = x|Y = l) ~ Pr(Y = l)}\] \[Pr(Y = k|X = x) = \frac{f_k(x)\pi_k}{\sum_{l=1}^K f_l(x)\pi_l}\] Where:

  • \(Pr(Y = k|X = x)\) is the probability that the outcome \(Y\) belongs to class \(k\), given that the predictor variables take the value \(X = x\)
  • \(f_k(x)\) is the model for the distribution of the predictors \(x\) within class \(k\)
  • \(\pi_k\) is the prior probability of class \(k\)
  1. Typically prior probabilities \(\pi_k\) are set in advance.
  2. A common choice for \(f_k(x)\) is a Gaussian distribution: \[f_k(x) = \frac{1}{\sigma_k \sqrt{2\pi}}e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}}\]
  3. Estimate the parameters \((\mu_k, \sigma_k^2)\) from the data.
  4. Once we have estimated these parameters, we assign an observation to the class with the highest value of \(Pr(Y = k|X = x)\), as sketched in the example below.
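
To make these steps concrete, here is a minimal sketch in Python (not part of the original notes; the toy data, function names, and single-predictor setup are all illustrative). It estimates \((\mu_k, \sigma_k^2)\) and \(\pi_k\) from labelled training data, then applies Bayes' theorem to classify a new value:

```python
import numpy as np

def fit_gaussian_classes(x, y):
    """Estimate mu_k, sigma_k and the prior pi_k for every class k."""
    params = {}
    for k in np.unique(y):
        xk = x[y == k]
        params[k] = {
            "mu": xk.mean(),
            "sigma": xk.std(ddof=1),
            "pi": len(xk) / len(x),   # prior: observed class frequency
        }
    return params

def gaussian_density(x, mu, sigma):
    """f_k(x): one-dimensional Gaussian density."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def posterior(x_new, params):
    """Pr(Y = k | X = x) via Bayes' theorem."""
    unnormalised = {k: p["pi"] * gaussian_density(x_new, p["mu"], p["sigma"])
                    for k, p in params.items()}
    total = sum(unnormalised.values())
    return {k: v / total for k, v in unnormalised.items()}

# Toy data: class 0 centred near 0, class 1 centred near 3
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
y = np.array([0] * 100 + [1] * 100)

params = fit_gaussian_classes(x, y)
post = posterior(1.2, params)
print(post, max(post, key=post.get))   # classify as the class with highest posterior
```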

55.3 Classifying Using the Model

A range of models use this approach:

  • Linear discriminant analysis assumes \(f_k(x)\) is a multivariate Gaussian with the same covariance matrix for every class.
  • Quadratic discriminant analysis assumes \(f_k(x)\) is a multivariate Gaussian with a different covariance matrix for each class.
  • Model based prediction assumes more complicated forms for the covariance matrix.
  • Naive Bayes assumes independence between the features when building the model.
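
As a brief illustration (assuming scikit-learn is available; the iris data and five-fold cross-validation are arbitrary choices, not from the original notes), the three approaches can be fit and compared on the same data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

models = {
    "LDA (shared covariance)": LinearDiscriminantAnalysis(),
    "QDA (class-specific covariance)": QuadraticDiscriminantAnalysis(),
    "Naive Bayes (independent features)": GaussianNB(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```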

55.4 Why Linear Discriminant Analysis?

For two classes \(k\) and \(j\) whose Gaussian densities share a common covariance matrix \(\Sigma\), the log ratio of the posterior probabilities is linear in \(x\), which is why the decision boundaries are linear:

\[\log \frac{Pr(Y = k|X = x)}{Pr(Y = j|X = x)} = \log \frac{f_k(x)}{f_j(x)} + \log \frac{\pi_k}{\pi_j}\] \[= \log \frac{\pi_k}{\pi_j} - \frac{1}{2}(\mu_k + \mu_j)^T \Sigma^{-1}(\mu_k - \mu_j) + x^T \Sigma^{-1}(\mu_k - \mu_j)\]
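
A quick numerical check of this identity, using made-up parameters (a sketch, not from the original notes):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up parameters: shared covariance, two class means, two priors
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
mu_k, mu_j = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
pi_k, pi_j = 0.6, 0.4
Sigma_inv = np.linalg.inv(Sigma)

x = np.array([0.7, -0.2])

# Left-hand side: log f_k(x)/f_j(x) + log pi_k/pi_j
lhs = (multivariate_normal.logpdf(x, mean=mu_k, cov=Sigma)
       - multivariate_normal.logpdf(x, mean=mu_j, cov=Sigma)
       + np.log(pi_k / pi_j))

# Right-hand side: the closed form from the derivation above
rhs = (np.log(pi_k / pi_j)
       - 0.5 * (mu_k + mu_j) @ Sigma_inv @ (mu_k - mu_j)
       + x @ Sigma_inv @ (mu_k - mu_j))

print(lhs, rhs)   # the two values agree; rhs is linear in x
```

The quadratic term in \(x\) cancels only because the covariance matrix is shared; with class-specific covariances (QDA) it remains, giving quadratic boundaries.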

55.5 Decision Boundaries

In the figure below, each set of rings represents the contours of a Gaussian distribution corresponding to a different class. The decision boundaries pass through the points where the prior-weighted class densities intersect, i.e. where two classes have equal posterior probability.
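
As a one-dimensional sketch of this idea (all parameter values are made up, not from the original notes), the boundary between two classes can be found numerically as the point where the prior-weighted densities are equal:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Made-up one-dimensional example: two Gaussian classes with a common sigma
mu0, mu1, sigma = 0.0, 3.0, 1.0
pi0, pi1 = 0.6, 0.4

# The decision boundary is where the prior-weighted densities intersect
def density_gap(x):
    return pi0 * norm.pdf(x, mu0, sigma) - pi1 * norm.pdf(x, mu1, sigma)

boundary = brentq(density_gap, mu0, mu1)   # root lies between the two means
print(f"Decision boundary at x = {boundary:.3f}")
```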

55.6 Discriminant Function

\[\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log(\pi_k)\] Where:

  • \(\mu_k\) is the mean of the class \(k\) for all features

  • \(\Sigma^{-1}\) is the inverse of the common covariance matrix

  • \(\pi_k\) is the prior probability of class \(k\)

  • An observation is assigned to the class that maximises \(\delta_k(x)\)

  • Decide on the class based on \(\hat{Y}(x) = \arg\max_k ~~\delta_k(x)\)

  • The parameters are usually estimated by maximum likelihood; a minimal sketch of the whole procedure follows below.
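
Here is that sketch in Python, assuming a shared covariance matrix estimated by pooling the per-class covariances (the function names and toy data are illustrative, not from the original notes):

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means, priors and the pooled (shared) covariance matrix."""
    classes = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in classes}
    priors = {k: np.mean(y == k) for k in classes}
    # Pooled within-class covariance (maximum-likelihood style estimate)
    pooled = sum(np.cov(X[y == k].T, bias=True) * np.sum(y == k) for k in classes) / len(y)
    return means, priors, np.linalg.inv(pooled), classes

def discriminant(x, mu, Sigma_inv, prior):
    """delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k + log(pi_k)."""
    return x @ Sigma_inv @ mu - 0.5 * mu @ Sigma_inv @ mu + np.log(prior)

def lda_predict(x, means, priors, Sigma_inv, classes):
    scores = {k: discriminant(x, means[k], Sigma_inv, priors[k]) for k in classes}
    return max(scores, key=scores.get)   # argmax_k delta_k(x)

# Toy two-class data in two dimensions
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([2, 2], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

means, priors, Sigma_inv, classes = lda_fit(X, y)
print(lda_predict(np.array([1.8, 1.5]), means, priors, Sigma_inv, classes))
```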

55.7 Naive Bayes

Suppose we have many predictors; we would want to model \(Pr(Y = k|X_1, ..., X_m)\).

We could use Bayes' theorem to get:

\[Pr(Y = k|X_1, ..., X_m) = \frac{\pi_k Pr(X_1, ..., X_m|Y = k)}{\sum_{l=1}^K Pr(X_1, ..., X_m|Y = l) \pi_l}\] \[\propto~~\pi_k Pr(X_1, ..., X_m|Y = k)\] Expanding the numerator with the chain rule gives:

\[\pi_k Pr(X_1, ..., X_m|Y = k) = \pi_k Pr(X_1|Y = k) Pr(X_2, ..., X_m|X_1, Y = k)\] \[= \pi_k Pr(X_1|Y = k) Pr(X_2|X_1, Y = k) Pr(X_3, ..., X_m|X_1, X_2, Y = k)\] \[= \pi_k Pr(X_1|Y = k) Pr(X_2|X_1, Y = k) ... Pr(X_m|X_1, ..., X_{m-1}, Y = k)\] If we assume the features are independent within each class (the "naive" assumption), this simplifies to: \[\approx \pi_k Pr(X_1|Y = k) Pr(X_2|Y = k) ... Pr(X_m|Y = k)\]
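
A minimal Gaussian naive Bayes sketch built directly on this factorisation, where each feature gets its own one-dimensional Gaussian per class (names and data are illustrative, not from the original notes):

```python
import numpy as np
from scipy.stats import norm

def naive_bayes_fit(X, y):
    """Per class: prior pi_k plus a mean and sd for each feature (independence assumption)."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = {
            "pi": len(Xk) / len(X),
            "mu": Xk.mean(axis=0),
            "sigma": Xk.std(axis=0, ddof=1),
        }
    return params

def naive_bayes_predict(x, params):
    """Score each class by log(pi_k) + sum_j log Pr(X_j = x_j | Y = k)."""
    scores = {}
    for k, p in params.items():
        scores[k] = np.log(p["pi"]) + np.sum(norm.logpdf(x, p["mu"], p["sigma"]))
    return max(scores, key=scores.get)

# Toy data with two features per class
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0, 0], 1, (60, 2)), rng.normal([3, 1], 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

params = naive_bayes_fit(X, y)
print(naive_bayes_predict(np.array([2.5, 0.8]), params))   # expected: class 1
```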