32 Poisson GLMs

  • Many data take the form of counts
    • Calls to a call centre
    • Number of flu cases in an area
    • Number of cars that cross a bridge
  • Data may also be in the form of rates
    • Percent of children passing a test
    • Percent of hits to a website from a country
  • Linear regression with a transformation is an option here

32.1 Poisson Distribution

  • The Poisson distribution is a useful model for counts and rates
  • Here a rate is a count per unit of monitoring time
  • Some example uses of the Poisson distribution
    • Modeling web traffic hits
    • Incidence rates
    • Approximating binomial probabilities with small \(p\) and large \(n\)
    • Analyzing contingency table data
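
The binomial approximation listed above can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of n and p are hypothetical, chosen only to give a large number of trials with a small success probability.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, mean):
    """P(X = x) for X ~ Poisson(mean)."""
    return mean**x * math.exp(-mean) / math.factorial(x)

# Hypothetical values: many trials, small success probability.
n, p = 1000, 0.002

# Largest pointwise gap between the two mass functions over x = 0..20,
# using a Poisson with mean n * p.
max_gap = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, n * p)) for x in range(21)
)
print(max_gap)
```

The gap shrinks further as n grows with n * p held fixed, which is why the Poisson is a convenient stand-in for rare-event binomials.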

32.2 Poisson Mass Function

  • \(X \sim Poisson(t\lambda)\) if \[ P(X = x) = \frac{(t\lambda)^x e^{-t\lambda}}{x!} \] for \(x = 0, 1, \ldots\).
  • The mean of the Poisson is \(E[X] = t\lambda\), thus \(E[X / t] = \lambda\)
  • The variance of the Poisson is \(Var(X) = t\lambda\).
  • The Poisson tends to a normal as \(t\lambda\) gets large.
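
The mean and variance statements above can be verified by summing directly over the mass function. This is a minimal sketch; the values t = 1 and lambda = 3 are hypothetical, and the support is truncated at 200 on the assumption that the remaining mass is negligible for a mean this small.

```python
import math

def poisson_pmf(x, mean):
    """P(X = x) for X ~ Poisson(mean), computed on the log scale
    so that large x does not overflow."""
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))

t, lam = 1.0, 3.0          # hypothetical monitoring time and rate
mean = t * lam
support = range(200)       # truncated support (assumption: tail mass is negligible)

total = sum(poisson_pmf(x, mean) for x in support)
mu = sum(x * poisson_pmf(x, mean) for x in support)
sigmasq = sum((x - mu) ** 2 * poisson_pmf(x, mean) for x in support)

# total should be close to 1; mu and sigmasq should both be close to t * lam
print(total, mu, sigmasq)
```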


32.3 Linear regression

\[ NH_i = b_0 + b_1 JD_i + e_i \]

\(NH_i\) - number of hits to the website

\(JD_i\) - day number (Julian day, counted from 1970-01-01)

\(b_0\) - number of hits on Julian day 0 (1970-01-01)

\(b_1\) - increase in number of hits per unit day

\(e_i\) - variation due to everything we didn’t measure

  • Taking the natural log of the outcome has a specific interpretation.
  • Consider the model

\[ \log(NH_i) = b_0 + b_1 JD_i + e_i \]

\(NH_i\) - number of hits to the website

\(JD_i\) - day number (Julian day, counted from 1970-01-01)

\(b_0\) - log number of hits on Julian day 0 (1970-01-01)

\(b_1\) - increase in log number of hits per unit day

\(e_i\) - variation due to everything we didn’t measure
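
To make the log-outcome model concrete, here is a minimal sketch that fits it by ordinary least squares. The helper `ols_fit` and the data are hypothetical: the "hits" are synthetic, noise-free values generated from known coefficients (not real web traffic), so the fit should recover those coefficients.

```python
import math

def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Synthetic illustration only: log(hits) = 1.0 + 0.1 * day, with no noise.
days = list(range(10))
hits = [math.exp(1.0 + 0.1 * d) for d in days]

# Regress the natural log of the outcome on the day.
b0, b1 = ols_fit(days, [math.log(h) for h in hits])
print(b0, b1)
```

With real counts the outcome would need to be strictly positive before taking logs.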

32.4 Exponentiating Coefficients

  • \(e^{E[\log(Y)]}\) is the geometric mean of \(Y\).
    • With no covariates, this is estimated by: \[e^{\frac{1}{n}\sum_{i=1}^n \log(y_i)} = \left(\prod_{i=1}^n y_i\right)^{1/n}\]
  • When you take the natural log of outcomes and fit a regression model, your exponentiated coefficients estimate things about geometric means.
  • \(e^{b_0}\) is the estimated geometric mean of hits on day 0
  • \(e^{b_1}\) is the estimated relative increase or decrease in geometric mean hits per day
  • The log is undefined when you have zero counts; adding a constant to the counts before taking the log works around this
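
The geometric mean identity and the zero-count problem can both be seen in a few lines. This is a minimal sketch with made-up counts; adding 1 is just one possible choice of constant.

```python
import math

def geometric_mean(values):
    """exp of the average natural log; needs strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

counts = [2, 8, 4]  # illustrative counts, not real data

# Both sides of the identity: exp(mean(log y)) and (prod y)^(1/n).
gm_logs = geometric_mean(counts)
gm_product = math.prod(counts) ** (1 / len(counts))
print(gm_logs, gm_product)  # both approximately 4

# A single zero breaks the log: math.log(0) raises ValueError.
counts_with_zero = [0, 3, 8]
try:
    geometric_mean(counts_with_zero)
    zero_failed = False
except ValueError:
    zero_failed = True

# Workaround from the text: add a constant (here 1) before taking logs.
shifted = geometric_mean([c + 1 for c in counts_with_zero])
print(zero_failed, shifted)
```

Note that the shifted value is the geometric mean of the counts plus one, so the interpretation of the coefficients changes slightly.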

32.5 Linear vs Poisson

Linear

\[ NH_i = b_0 + b_1 JD_i + e_i \]

or

\[ E[NH_i | JD_i, b_0, b_1] = b_0 + b_1 JD_i\]

Poisson/log-linear

\[ \log\left(E[NH_i | JD_i, b_0, b_1]\right) = b_0 + b_1 JD_i \]

or

\[ E[NH_i | JD_i, b_0, b_1] = \exp\left(b_0 + b_1 JD_i\right) \]
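
To show what fitting the Poisson/log-linear model involves, the following is a minimal sketch of Newton-Raphson maximum likelihood for a single covariate. The function `fit_poisson` and the small data set are hypothetical illustrations, not the method or data of any particular library; in practice a GLM routine would be used.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Newton-Raphson fit of log(E[y]) = b0 + b1 * x under a Poisson model."""
    # Start from the intercept-only fit: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information: a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: (information)^{-1} times score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up daily hit counts for days 0..9 (illustration only).
days = list(range(10))
hits = [2, 2, 3, 3, 4, 4, 5, 6, 7, 8]

b0, b1 = fit_poisson(days, hits)
fitted = [math.exp(b0 + b1 * d) for d in days]
print(b0, b1)
```

Unlike the linear model, the fitted means here are always positive, and with an intercept in the model they sum to the observed total count.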

32.5.1 Multiplicative Differences



\[ E[NH_i | JD_i, b_0, b_1] = \exp\left(b_0 + b_1 JD_i\right) \]



\[ E[NH_i | JD_i, b_0, b_1] = \exp\left(b_0 \right)\exp\left(b_1 JD_i\right) \]
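
The factorization implies that a one-day increase in \(JD_i\) multiplies the expected count by \(\exp(b_1)\), whatever the starting day. A minimal numeric check, with hypothetical coefficient values:

```python
import math

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 0.7, 0.15

def expected_hits(jd):
    """E[NH | JD] = exp(b0 + b1 * JD) under the log-linear model."""
    return math.exp(b0 + b1 * jd)

# Ratio of expected hits on consecutive days, from several starting days.
ratios = [expected_hits(jd + 1) / expected_hits(jd) for jd in (0, 10, 100)]

# Every ratio equals exp(b1); expressed as a percent change per day:
percent_change = 100 * (math.exp(b1) - 1)
print(ratios, percent_change)
```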



32.6 Rates



\[ E[NHSS_i | JD_i, b_0, b_1]/NH_i = \exp\left(b_0 + b_1 JD_i\right) \]



\[ \log\left(E[NHSS_i | JD_i, b_0, b_1]\right) - \log(NH_i) = b_0 + b_1 JD_i \]



\[ \log\left(E[NHSS_i | JD_i, b_0, b_1]\right) = \log(NH_i) + b_0 + b_1 JD_i \]
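
The last line shows that a rate model is a Poisson model for the count with \(\log(NH_i)\) added to the linear predictor with its coefficient fixed at 1 (commonly called an offset). The sketch below illustrates this; \(NHSS_i\) is not defined in this section, so the code assumes it counts a subset of the \(NH_i\) total hits, and the coefficients are hypothetical.

```python
import math

# Hypothetical coefficients for the rate model (illustration only).
b0, b1 = -3.0, 0.002

def expected_rate(jd):
    """E[NHSS | JD] / NH = exp(b0 + b1 * JD)."""
    return math.exp(b0 + b1 * jd)

def expected_subset_hits(total_hits, jd):
    """E[NHSS | JD], with log(total_hits) entering as an offset
    (a term whose coefficient is fixed at 1)."""
    return math.exp(math.log(total_hits) + b0 + b1 * jd)

# Dividing the expected count by the total recovers the rate.
count = expected_subset_hits(200, 50)
rate = expected_rate(50)
print(count / 200, rate)

# Doubling the total hits doubles the expected count but leaves the rate unchanged.
doubled = expected_subset_hits(400, 50)
print(doubled / count)
```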