5 Distributions
5.1 Bernoulli Distribution (Binomial)
- The Bernoulli distribution arises as the result of a binary outcome.
- Bernoulli random variables take only the values 1 and 0, with probabilities of \(p\) and \(1-p\) respectively.
- The mean of the Bernoulli random variable is \(p\)
- The variance is \(p(1-p)\)
\[P(X=x) = p^x(1-p)^{1-x}\]
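The mean and variance stated above can be checked by simulation. The following is a minimal sketch, assuming a hypothetical success probability `p = 0.3` and an arbitrary seed; `bern_pmf` is an illustrative helper, not part of base R.

```r
# Simulate Bernoulli(p) draws as Binomial(size = 1) draws.
set.seed(42)            # arbitrary seed, only for reproducibility
p <- 0.3                # hypothetical success probability
x <- rbinom(100000, size = 1, prob = p)

mean(x)                 # should be close to p
var(x)                  # should be close to p * (1 - p)

# The PMF p^x (1 - p)^(1 - x) agrees with dbinom() when size = 1.
bern_pmf <- function(x, p) p^x * (1 - p)^(1 - x)
pmf_matches <- all.equal(bern_pmf(c(0, 1), p),
                         dbinom(c(0, 1), size = 1, prob = p))
pmf_matches
```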
5.1.1 Binomial Trials
Specifically, let \(X_1, ..., X_n\) be iid Bernoulli(\(p\)); then \(X=\sum_{i=1}^n X_i\) is a binomial random variable.
\[P(X=x) = \left(\begin{array}{cc} n \\ x \end{array}\right)p^x(1-p)^{n-x}\] where the binomial coefficient, read "\(n\) choose \(x\)", is \[\left(\begin{array}{cc} n \\ x \end{array}\right) = \frac{n!}{x!(n-x)!}\] In particular,
\[\left(\begin{array}{cc} n \\ 0 \end{array}\right) = \left(\begin{array}{cc} n \\ n \end{array}\right) = 1\]
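As a quick check, the binomial probability \(\binom{n}{x} p^x (1-p)^{n-x}\) can be written by hand and compared with R's built-in `dbinom()`. This is a sketch under assumed values (`n = 10`, `p = 0.25` are hypothetical, and `binom_pmf` is an illustrative helper).

```r
# Hand-written binomial PMF: choose(n, x) * p^x * (1 - p)^(n - x).
binom_pmf <- function(x, n, p) choose(n, x) * p^x * (1 - p)^(n - x)

n <- 10                 # hypothetical number of trials
p <- 0.25               # hypothetical success probability
x <- 0:n

# Agreement with the built-in PMF over the whole support.
binom_matches <- all.equal(binom_pmf(x, n, p), dbinom(x, size = n, prob = p))
binom_matches

sum(binom_pmf(x, n, p)) # probabilities over 0..n sum to 1

# Boundary cases of the binomial coefficient.
choose(n, 0)            # 1
choose(n, n)            # 1
```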
5.1.2 Example
Suppose that a friend has 8 children, 7 of whom are girls. If each birth is independent and each gender has a 50% probability, what's the probability of getting 7 or more girls out of 8 births?
\[\left(\begin{array}{cc} 8 \\ 7 \end{array}\right)0.5^7(1-0.5)^1 + \left(\begin{array}{cc} 8 \\ 8 \end{array}\right)0.5^8(1-0.5)^0 \approx 0.035\]
## [1] 0.03515625
## [1] 0.03515625
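The code behind the two outputs above is not shown. One plausible reconstruction, offered as an assumption rather than the original code, is to sum the two binomial terms directly and then to use the upper tail of the binomial CDF.

```r
# Direct sum of the two terms: P(X = 7) + P(X = 8).
p_sum <- choose(8, 7) * 0.5^7 * (1 - 0.5)^1 +
  choose(8, 8) * 0.5^8 * (1 - 0.5)^0
p_sum

# Same probability from the CDF: P(X > 6) = P(X >= 7).
p_cdf <- pbinom(6, size = 8, prob = 0.5, lower.tail = FALSE)
p_cdf
```

Both expressions evaluate to 9/256 = 0.03515625. Note that `pbinom()` with `lower.tail = FALSE` returns \(P(X > q)\), which is why the first argument is 6 rather than 7.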
5.2 Normal Distribution
- Gaussian distribution with mean \(\mu\) and variance \(\sigma^2\)
- Approximately 68%, 95% and 99.7% of the normal density lies within 1, 2 and 3 standard deviations of the mean, respectively.
\[f(x) = (2\pi\sigma^2)^{-\frac{1}{2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\] \[E[X] = \mu ~~~~~~~~~~~ Var(X) = \sigma^2\] If \(X \sim N(\mu, \sigma^2)\) then we can convert \(X\) into units of standard deviations from the mean by,
\[Z = \frac{X-\mu}{\sigma} \sim N(0,1)\]
Conversely, multiplying the number of standard deviations from the mean, \(Z\), by the standard deviation and adding the mean recovers the normal random variable \(X\).
\[X = \mu + \sigma Z \sim N(\mu, \sigma^2)\]
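The two transformations are inverses of each other, which a short round trip makes concrete. The sketch below uses hypothetical values for `mu`, `sigma` and `x`, and also evaluates the mass within 1, 2 and 3 standard deviations of the mean.

```r
# Hypothetical parameters chosen only for illustration.
mu <- 1020
sigma <- 50
x <- c(920, 1020, 1160)

z <- (x - mu) / sigma        # standardize: -2.0, 0.0, 2.8
x_back <- mu + sigma * z     # transform back to the original scale
round_trip_ok <- all.equal(x, x_back)
round_trip_ok

# Probabilities agree on either scale.
same_prob <- all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z))
same_prob

# Mass within 1, 2 and 3 standard deviations of the mean.
within_k <- pnorm(1:3) - pnorm(-(1:3))
round(within_k, 4)           # 0.6827 0.9545 0.9973
```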
5.2.1 Examples
What is the 95th percentile of a \(N(\mu, \sigma^2)\) distribution?
qnorm(.95, mean = mu, sd = sd)
- \(\mu + 1.645\sigma\)
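The constant 1.645 is the 95th percentile of the standard normal, so the general answer follows from \(X = \mu + \sigma Z\). A small check, assuming hypothetical values `mu = 100` and `sigma = 15`:

```r
z95 <- qnorm(0.95)           # standard normal 95th percentile, about 1.645
z95

mu <- 100                    # hypothetical mean
sigma <- 15                  # hypothetical standard deviation

# The quantile on the original scale equals mu + sigma * z95.
q_direct <- qnorm(0.95, mean = mu, sd = sigma)
q_scaled <- mu + sigma * z95
all.equal(q_direct, q_scaled)
```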
What is the probability that a \(N(\mu, \sigma^2)\) random variable is larger than \(x\)?
pnorm(x, mean = mu, sd = sd, lower.tail = FALSE)
Assume that the number of daily ad clicks for a company is (approximately) normally distributed with a mean of 1020 and a standard deviation of 50. What’s the probability of getting more than 1,160 clicks in a day?
pnorm(1160, 1020, 50, lower.tail = FALSE)
- \(\mu = 1020 ~~~ \sigma = 50 ~~~ x=1160\)
- \(Z = \frac{x-\mu}{\sigma}= \frac{1160-1020}{50}=2.8\)
pnorm(2.8, lower.tail = FALSE)
- \(\approx 0.0026\) so not super likely…
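The raw-scale and standardized calculations should give the same upper-tail probability. A brief sketch comparing them, along with the equivalent complement form:

```r
# Upper tail on the original scale.
p_raw <- pnorm(1160, mean = 1020, sd = 50, lower.tail = FALSE)

# Upper tail after standardizing: z = (1160 - 1020) / 50 = 2.8.
p_z <- pnorm(2.8, lower.tail = FALSE)

# Equivalent complement of the lower tail.
p_comp <- 1 - pnorm(1160, mean = 1020, sd = 50)

c(p_raw, p_z, p_comp)        # all about 0.0026
```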
Assume that the number of daily ad clicks for a company is (approximately) normally distributed with a mean of 1020 and a standard deviation of 50. What number of daily ad clicks marks the point below which 75% of days fall? (Assume the daily counts are independent and identically distributed.)
qnorm(0.75, mean = 1020, sd = 50)
- \(\mu = 1020 ~~~ \sigma = 50 ~~~ p = 0.75\)
- \(\approx 1054\)
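The same quantile can be obtained from the standard normal via \(X = \mu + \sigma Z\), and plugging it back into the CDF should return 0.75. A short check:

```r
# 75th percentile on the original scale.
q75 <- qnorm(0.75, mean = 1020, sd = 50)
q75                          # about 1053.7

# Same value from the standard normal quantile.
q75_scaled <- 1020 + 50 * qnorm(0.75)

# Sanity check: the CDF at the quantile returns the probability.
p_back <- pnorm(q75, mean = 1020, sd = 50)
p_back                       # 0.75
```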
5.3 Poisson Distribution
Used to model counts. \[P(X=x; ~\lambda) = \frac{\lambda^x~e^{-\lambda}}{x!}\] Where:
- \(x\) is a non-negative integer
- \(\lambda\) is both the variance and the mean of this distribution
- The mean and the variance MUST BE EQUAL
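The equality of mean and variance can be seen in simulated data, and the mass function \(\lambda^x e^{-\lambda} / x!\) can be compared with `dpois()`. This is a sketch with a hypothetical `lambda = 4` and an arbitrary seed; `pois_pmf` is an illustrative helper.

```r
set.seed(1)                  # arbitrary seed, only for reproducibility
lambda <- 4                  # hypothetical rate
draws <- rpois(100000, lambda)

mean(draws)                  # should be close to lambda
var(draws)                   # should also be close to lambda

# Hand-written PMF: lambda^x * exp(-lambda) / x!
pois_pmf <- function(x, lambda) lambda^x * exp(-lambda) / factorial(x)
pois_matches <- all.equal(pois_pmf(0:20, lambda), dpois(0:20, lambda))
pois_matches
```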
5.3.1 Uses of the Poisson Distribution
- Modeling count data
- Modeling event-time or survival data
- Modeling contingency tables
- Approximating binomials when \(n\) is large and \(p\) is small
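The last use can be illustrated numerically: with \(\lambda = np\), the Poisson probabilities track the binomial ones closely when \(n\) is large and \(p\) is small. The values `n = 1000` and `p = 0.002` below are hypothetical.

```r
n <- 1000                    # hypothetical large number of trials
p <- 0.002                   # hypothetical small success probability
x <- 0:10

binom_probs <- dbinom(x, size = n, prob = p)
pois_probs <- dpois(x, lambda = n * p)

# Side-by-side comparison, rounded for readability.
round(cbind(x, binom_probs, pois_probs), 5)

# Largest pointwise discrepancy over the range shown.
max_diff <- max(abs(binom_probs - pois_probs))
max_diff
```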
5.3.2 Rates and Poisson Random Variables
- Poisson random variables are used to model rates
- \(X \sim Poisson(\lambda t)\)
- \(\lambda = E[\frac{X}{t}]\) is the expected count per unit time
- \(t\) is the total monitoring time
5.3.3 Example
The number of people that show up to a bus stop is Poisson with a mean of 2.5 per hour. If we watch the bus stop for 4 hours, what is the probability that 3 or fewer people show up for the whole time?
ppois(3, lambda = 2.5*4)
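Here the rate is scaled by the monitoring time, so the count over 4 hours is Poisson with mean \(2.5 \times 4 = 10\). A short sketch confirming that the CDF call matches the sum of the individual probabilities for 0 through 3 arrivals:

```r
rate <- 2.5                  # expected arrivals per hour
hours <- 4                   # total monitoring time

# P(X <= 3) for X ~ Poisson(rate * hours).
p_cdf <- ppois(3, lambda = rate * hours)
p_cdf                        # about 0.0103

# Same probability as an explicit sum of the PMF.
p_sum <- sum(dpois(0:3, lambda = rate * hours))
all.equal(p_cdf, p_sum)
```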