5 Distributions

5.1 Bernoulli Distribution (Binomial)

  • The Bernoulli distribution arises as the result of a binary outcome.
  • Bernoulli random variables take only the values 1 and 0, with probabilities of \(p\) and \(1-p\) respectively.
  • The mean of the Bernoulli random variable is \(p\)
  • The variance is \(p(1-p)\)

\[P(X=x) = p^x(1-p)^{1-x}\]
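As a quick check of the formula, mean, and variance above, here is a minimal sketch in base R. It assumes the usual identity that a Bernoulli(\(p\)) variable is a binomial with a single trial; the value of `p` is purely illustrative and not taken from the notes.

```r
# Bernoulli(p) is the binomial with a single trial (size = 1) in base R.
p <- 0.3                                   # illustrative value, an assumption
x <- c(0, 1)
pmf    <- dbinom(x, size = 1, prob = p)    # P(X = 0), P(X = 1)
manual <- p^x * (1 - p)^(1 - x)            # the mass function written out by hand
bern_mean <- sum(x * pmf)                  # should equal p
bern_var  <- sum((x - bern_mean)^2 * pmf)  # should equal p * (1 - p)
print(c(mean = bern_mean, var = bern_var))
```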

5.1.1 Binomial Trials

Specifically, let \(X_1, \ldots, X_n\) be iid Bernoulli(\(p\)); then \(X=\sum_{i=1}^n X_i\) is a binomial random variable.

\[P(X=x) = \binom{n}{x}p^x(1-p)^{n-x}\]

where the binomial coefficient, read "\(n\) choose \(x\)", is

\[\binom{n}{x} = \frac{n!}{x!(n-x)!}\]

with the boundary cases

\[\binom{n}{0} = \binom{n}{n} = 1\]
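The binomial mass function can be checked against R's built-in `dbinom`. This is a sketch under assumed, illustrative values of `n` and `p`; it uses the standard form of the mass function, in which \((1-p)\) is raised to the power \(n-x\).

```r
n <- 8; p <- 0.5                 # illustrative values, an assumption
x <- 0:n
coef    <- choose(n, x)          # n! / (x! (n - x)!)
manual  <- coef * p^x * (1 - p)^(n - x)
builtin <- dbinom(x, size = n, prob = p)
print(all.equal(manual, builtin))  # the two agree
print(choose(n, c(0, n)))          # both boundary coefficients are 1
print(sum(builtin))                # probabilities sum to 1
```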

5.1.2 Example

Suppose that a friend has 8 children, 7 of whom are girls. If each birth is independently a girl with probability 50%, what's the probability of getting 7 or more girls out of 8 births?

\[\binom{8}{7}0.5^7(1-0.5)^1 + \binom{8}{8}0.5^8(1-0.5)^0 \approx 0.035\]
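The R calls that produced the output below are not shown in these notes. One plausible reconstruction (an assumption, not the original code) computes the sum directly and then again as an upper tail of the binomial:

```r
# Direct sum of the two terms P(X = 7) + P(X = 8)
direct <- choose(8, 7) * 0.5^7 * 0.5^1 + choose(8, 8) * 0.5^8 * 0.5^0
# Same probability as an upper tail: P(X > 6) = P(X >= 7)
upper_tail <- pbinom(6, size = 8, prob = 0.5, lower.tail = FALSE)
print(direct)
print(upper_tail)
```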

## [1] 0.03515625
## [1] 0.03515625

5.2 Normal Distribution

  • Gaussian distribution with mean \(\mu\) and variance \(\sigma^2\)
  • Approximately 68%, 95%, and 99.7% of the normal density lies within 1, 2, and 3 standard deviations of the mean, respectively.
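The coverage within 1, 2, and 3 standard deviations can be computed directly from the standard normal CDF. A minimal sketch using base R's `pnorm`:

```r
k <- 1:3                           # number of standard deviations
coverage <- pnorm(k) - pnorm(-k)   # P(-k < Z < k) for Z ~ N(0, 1)
print(round(coverage, 4))          # 0.6827 0.9545 0.9973
```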

The density is

\[f(x) = (2\pi\sigma^2)^{-\frac{1}{2}} e^{\frac{-(x-\mu)^2}{2\sigma^2}}\]

\[E[X] = \mu \qquad Var(X) = \sigma^2\]

If \(X \sim N(\mu, \sigma^2)\), then we can convert \(X\) into units of standard deviations from the mean by

\[Z = \frac{X-\mu}{\sigma} \sim N(0,1)\]

Conversely, multiplying \(Z\) (the number of standard deviations from the mean) by the standard deviation and adding the mean recovers the normal random variable \(X\).

\[X = \mu + \sigma Z \sim N(\mu, \sigma^2)\]
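The two transformations can be illustrated numerically. In this sketch the values of `mu`, `sigma`, and `x` are hypothetical, chosen only for the example; it shows that a probability computed on the original scale matches the one computed on the standardized scale.

```r
mu <- 10; sigma <- 2; x <- 13      # hypothetical values for illustration
z <- (x - mu) / sigma              # standard deviations from the mean (1.5)
p_direct <- pnorm(x, mean = mu, sd = sigma)  # P(X <= x) on the original scale
p_std    <- pnorm(z)                         # P(Z <= z) on the standard scale
x_back   <- mu + sigma * z                   # inverse transformation recovers x
print(c(z = z, p_direct = p_direct, p_std = p_std, x_back = x_back))
```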

5.2.1 Examples

What is the 95th percentile of a \(N(\mu, \sigma^2)\) distribution?

  • qnorm(.95, mean = mu, sd = sd)
  • \(\mu + 1.645\sigma\)

What is the probability that a \(N(\mu, \sigma^2)\) random variable is larger than \(x\)?

  • pnorm(x, mean = mu, sd = sd, lower.tail = FALSE)
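Both questions can be tried out with concrete numbers. The values of `mu`, `sigma`, and `x` below are hypothetical. Note that `pnorm` returns the lower tail \(P(X \le x)\) by default, so the probability of exceeding \(x\) needs `lower.tail = FALSE` (or one minus the default result).

```r
mu <- 100; sigma <- 15; x <- 120   # hypothetical values for illustration

# 95th percentile, directly and via the standard normal quantile (about 1.645)
q95        <- qnorm(0.95, mean = mu, sd = sigma)
q95_manual <- mu + qnorm(0.95) * sigma

# P(X > x): upper tail, two equivalent ways
upper     <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)
upper_alt <- 1 - pnorm(x, mean = mu, sd = sigma)

print(c(q95 = q95, q95_manual = q95_manual))
print(c(upper = upper, upper_alt = upper_alt))
```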

Assume that the number of daily ad clicks for a company is (approximately) normally distributed with a mean of 1020 and a standard deviation of 50. What’s the probability of getting more than 1,160 clicks in a day?

  • pnorm(1160, 1020, 50, lower.tail = FALSE)
  • \(\mu = 1020 ~~~ \sigma = 50 ~~~ x=1160\)
    • \(Z = \frac{x-\mu}{\sigma}= \frac{1160-1020}{50}=2.8\)
    • pnorm(2.8, lower.tail = FALSE)
    • \(\sim0.0025\) so not super likely…

Assume that the number of daily ad clicks for a company is (approximately) normally distributed with a mean of 1020 and a standard deviation of 50. What number of daily ad clicks is such that 75% of days have fewer clicks? (Assume days are independent and identically distributed.)

  • qnorm(0.75, mean = 1020, sd = 50)
  • \(\mu = 1020 ~~~ \sigma = 50 ~~~ p=0.75\)
  • \(\approx 1054\)
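The two ad-click answers can be cross-checked against each other. A short sketch using only the numbers given in the examples (mean 1020, standard deviation 50):

```r
mu <- 1020; sigma <- 50

# P(more than 1160 clicks): upper tail
p_more <- pnorm(1160, mean = mu, sd = sigma, lower.tail = FALSE)

# 75th percentile, directly and via the standard normal quantile
q75        <- qnorm(0.75, mean = mu, sd = sigma)
q75_manual <- mu + qnorm(0.75) * sigma

print(p_more)                              # roughly 0.0026
print(q75)                                 # roughly 1053.7
print(pnorm(q75, mean = mu, sd = sigma))   # round trip back to 0.75
```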

5.3 Poisson Distribution

Used to model counts.

\[P(X=x; ~\lambda) = \frac{\lambda^x~e^{-\lambda}}{x!}\]

Where:

  • \(x\) is a non-negative integer
  • \(\lambda\) is both the variance and the mean of this distribution
  • Both the variance and the mean HAVE TO BE THE SAME

5.3.1 Uses of the Poisson Distribution

  • Modeling count data
  • Modeling event-time or survival data
  • Modeling contingency tables
  • Approximating binomials when \(n\) is large and \(p\) is small
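The binomial approximation can be seen numerically by setting \(\lambda = np\). The values of `n` and `p` below are hypothetical, picked so that \(n\) is large and \(p\) is small.

```r
n <- 500; p <- 0.01                # hypothetical: large n, small p
lambda <- n * p                    # matching Poisson mean (5)
p_binom <- pbinom(2, size = n, prob = p)   # exact P(X <= 2)
p_pois  <- ppois(2, lambda = lambda)       # Poisson approximation
print(c(binomial = p_binom, poisson = p_pois))
```

The two probabilities should agree to about two decimal places here; the approximation generally degrades as \(p\) grows or \(n\) shrinks.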

5.3.2 Rates and Poisson Random Variables

  • Poisson random variables are used to model rates
  • \(X \sim Poisson(\lambda t)\)
    • \(\lambda = E[\frac{X}{t}]\) is the expected count per unit time
    • \(t\) is the total monitoring time
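The relationship between the rate and the total count can be verified by computing the expectation from the mass function. In this sketch `lambda` and `t` are hypothetical, and the infinite sum is truncated at 200, where the remaining mass is negligible for these values.

```r
lambda <- 3; t <- 2                # hypothetical rate and monitoring time
x <- 0:200                         # truncated support, an approximation
pmf <- dpois(x, lambda = lambda * t)
expected_count <- sum(x * pmf)     # E[X] = lambda * t
expected_rate  <- expected_count / t   # E[X / t] = lambda
print(c(expected_count = expected_count, expected_rate = expected_rate))
```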

5.3.3 Example

The number of people that show up at a bus stop is Poisson with a mean of 2.5 per hour. If we watch the bus stop for 4 hours, what is the probability that 3 or fewer people show up for the whole time?

ppois(3, lambda = 2.5*4)
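To see what `ppois` is doing here, the same probability can be built up from the individual terms \(P(X=0)\) through \(P(X=3)\) with \(\lambda t = 2.5 \times 4 = 10\). A minimal sketch:

```r
rate <- 2.5; hours <- 4
p_cdf    <- ppois(3, lambda = rate * hours)           # P(X <= 3)
p_manual <- sum(dpois(0:3, lambda = rate * hours))    # sum of the four terms
print(c(p_cdf = p_cdf, p_manual = p_manual))          # both about 0.0103
```

With an expected 10 arrivals over the 4 hours, seeing 3 or fewer is rare, at roughly a 1% chance.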