30 Generalised Linear Models
30.1 Linear Models
- Transformations are often hard to interpret
- There’s value in modeling the data on the scale on which it was collected
- Particularly interpretable transformations, natural logarithms in particular, aren’t applicable for negative or zero values (illustrated below)
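As a quick illustration of that last point (a minimal sketch in base R), the natural log returns \(-\infty\) at zero and is undefined for negative values:

```r
# The natural log breaks down for zero or negative values
log(c(10, 1, 0, -2))
#> [1] 2.302585 0.000000     -Inf      NaN   (with a "NaNs produced" warning)
```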
30.2 Generalised Linear Models
- An exponential family for the response
- This is a large family of distributions that includes the normal, binomial and Poisson distributions. This is the random component.
- A systematic component via a linear predictor
- This is the part that we’re modeling
- A link function that connects the means of the response to the linear predictor
- This connects the mean of the exponential family distribution to the linear predictor
30.3 Example: Linear Models
Assume that \(Y_i \sim N(\mu_i, \sigma^2)\). (The Gaussian distribution is an exponential family distribution.)
Define the linear predictor to be \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\). This is to say that we define the eta value to be the sum of the covariates \(X\) times their coefficients.
In this case, our link function is going to be the identity link function. The link function \(g\) is defined so that \(g(\mu) = \eta\).
- For linear models \(g(\mu) = \mu\) so that \(\mu_i = \eta_i\).
- All this means is that the value of \(\mu_i\) is exactly the sum of the covariates times their coefficients
- Here, instead of stating that our errors are normally distributed, we’ve stated that our \(Y_i\)s are normally distributed (which amounts to the same thing), stated our linear predictor separately, and then connected the mean of the normal distribution to the linear predictor, as sketched below.
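A minimal sketch in R, with simulated data (all names below are illustrative): fitting the same model with lm and with glm using the Gaussian family and identity link gives the same coefficient estimates.

```r
# Linear model as a GLM: Gaussian response, identity link
set.seed(42)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)             # normally distributed response around the linear predictor

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

coef(fit_lm)
coef(fit_glm)                            # identical estimates to lm()
```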
30.4 Example: Logistic Regression
Assume that \(Y_i \sim Bernoulli(\mu_i)\) so that \(E[Y_i] = \mu_i\) where \(0 \le \mu_i \le 1\). (Like you would use for modelling a coin flip.)
Linear predictor is the same as before: \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)
The link function \(g(\mu)\) in this case is the logit link function: \[g(\mu) = \eta = \log\left(\frac{\mu}{1-\mu}\right)\] So how we get from the probability of a head to our linear predictor is with \(g\), the natural log of the odds, referred to as the logit. This means that we’re going to transform our mean (our probability of getting a head), and it is on that transformed scale that the covariates and their coefficients (the standard part of our linear model) appear.
So we can write our likelihood as the Bernoulli (binomial) likelihood like this, and we maximise that likelihood to obtain our parameter estimates.
\[\prod_{i=1}^n \mu_i^{y_i}(1-\mu_i)^{1-y_i} = \exp\left(\sum_{i=1}^n y_i \eta_i\right)\prod_{i=1}^n(1+e^{\eta_i})^{-1}\]
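A minimal sketch in R with simulated 0/1 data (names are illustrative): glm with family = binomial uses the logit link by default and maximises this likelihood numerically.

```r
# Logistic regression: Bernoulli response, logit link
set.seed(42)
x   <- rnorm(500)
eta <- -1 + 1.5 * x                      # linear predictor
mu  <- 1 / (1 + exp(-eta))               # inverse logit: probability of a "head"
y   <- rbinom(500, size = 1, prob = mu)

fit <- glm(y ~ x, family = binomial)     # logit link is the default
coef(fit)                                # coefficients are on the log-odds (logit) scale
```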
30.5 Example: Poisson Regression
Assume that \(Y_i \sim Poisson(\mu_i)\) so that \(E[Y_i] = \mu_i\) where \(0 \le \mu_i\).
Linear predictor is the same as before: \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)
The link function in this case is the log link, \(g(\mu) = \eta = \log(\mu)\), which is the most common link function for Poisson models. This states that we go from the mean (\(\mu\)) to the linear predictor (\(\eta\)) by taking the log of \(\mu\).
We are not logging the data; we are logging the mean of the distribution that the data is assumed to have come from.
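A minimal sketch in R with simulated count data (names are illustrative): glm with family = poisson uses the log link by default, so the coefficients describe effects on the log of the mean, not on the data themselves.

```r
# Poisson regression: count response, log link
set.seed(42)
x  <- rnorm(300)
mu <- exp(0.5 + 0.8 * x)                 # mean on the original (count) scale
y  <- rpois(300, lambda = mu)

fit <- glm(y ~ x, family = poisson)      # log link is the default
coef(fit)                                # effects on log(mu), i.e. multiplicative effects on the mean
```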
In each case, the only way in which the likelihood depends on the data is through: \[\sum_{i=1}^n y_i\eta_i =\sum_{i=1}^n y_i \sum_{k=1}^p X_{ik} \beta_k = \sum_{k=1}^p\beta_k \sum_{i=1}^n X_{ik} y_i\] Maximising the likelihood leads to normal equations of the form \[0=\sum_{i=1}^n \frac{(Y_i-\mu_i)}{Var(Y_i)}W_i\] This is similar to the least squares case; however, there is now a variance term and a set of associated weights.
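For the canonical links used above (identity, logit, log), the variance and weight terms cancel, so the fitted model satisfies \(\sum_{i=1}^n X_{ik}(y_i - \hat{\mu}_i) = 0\) for every covariate \(k\). A quick check, assuming the Poisson fit, x, and y from the sketch above:

```r
# Normal (score) equations at the fitted values: X'(y - mu_hat) is numerically zero
X <- model.matrix(fit)                   # design matrix, including the intercept column
round(t(X) %*% (y - fitted(fit)), 8)     # ~0 for the intercept and for x
```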
30.6 About Variances
- For the linear model we had the assumption of constant variance: \(Var(Y_i) = \sigma^2\)
- For Bernoulli case \(Var(Y_i) = \mu_i(1-\mu_i)\)
- For the Poisson case \(Var(Y_i) = \mu_i\)
Because of this, generalised linear models impose a restriction on the relationship between the mean and the variance. This can be problematic when that relationship does not hold for your particular data set.
This is why, in R, there are quasi families such as quasi-binomial and quasi-Poisson (quasibinomial and quasipoisson) that are slightly more flexible, in case your data does not adhere to the GLM variance structure.
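For example, a sketch reusing the simulated Poisson data above: the quasi-Poisson family keeps the same mean model but estimates a dispersion parameter \(\phi\), so the assumed variance is \(\phi\,\mu_i\) rather than exactly \(\mu_i\).

```r
# Quasi-Poisson: same coefficient estimates as the Poisson fit, but a freely estimated dispersion
fit_quasi <- glm(y ~ x, family = quasipoisson)
coef(fit_quasi)                          # point estimates match the plain Poisson fit
summary(fit_quasi)$dispersion            # estimated phi; 1 corresponds to the strict Poisson variance
```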
30.7 Details
These normal equations have to be solved iteratively. This results in \(\hat{\beta_k}\) and, if included, \(\hat{\phi}\) (the dispersion parameter).
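A minimal sketch of that iteration, iteratively reweighted least squares (IRLS), for the Poisson/log-link case with simulated data; this is only an illustration of the kind of algorithm glm runs internally, not its exact implementation.

```r
# IRLS sketch for Poisson regression with the canonical log link
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))
X <- cbind(1, x)                         # design matrix with an intercept column

beta <- rep(0, ncol(X))                  # starting values
for (iter in 1:50) {
  eta <- drop(X %*% beta)                # current linear predictor
  mu  <- exp(eta)                        # inverse link
  W   <- mu                              # working weights: 1 / (Var(Y_i) * g'(mu_i)^2) = mu_i here
  z   <- eta + (y - mu) / mu             # working response
  beta_new <- drop(solve(t(X) %*% (W * X), t(X) %*% (W * z)))
  if (max(abs(beta_new - beta)) < 1e-10) { beta <- beta_new; break }
  beta <- beta_new
}
beta                                     # iterative solution
coef(glm(y ~ x, family = poisson))       # agrees with glm()
```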
The linear predictor for each observation can be obtained as: \[\hat{\eta}_i =\sum_{k=1}^p X_{ik} \hat{\beta_k}\] This gives results on the link (linear predictor) scale rather than the scale of the original data. Applying the inverse of the link function to \(\hat{\eta}_i\) then gives fitted means on the same scale as your original data.
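In R, predict does both steps for you: type = "link" returns the fitted linear predictor \(\hat{\eta}_i\), while type = "response" applies the inverse link and returns fitted means on the scale of the original data. A sketch, assuming a Poisson/log-link fit like the one above:

```r
# Fitted values on the link scale vs. the original data scale
fit     <- glm(y ~ x, family = poisson)
eta_hat <- predict(fit, type = "link")       # linear predictor: sum of X_ik * beta_hat_k
mu_hat  <- predict(fit, type = "response")   # inverse link applied: exp(eta_hat) for the log link
all.equal(exp(eta_hat), mu_hat)              # TRUE
```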