17 Linear Least Squares
Ordinary least squares (OLS) is the workhorse of statistics. It gives a way of summarising complicated outcomes and explaining behavior (such as trends) using linearity. The simplest application of OLS is fitting a line through some data. In the next few lectures, we cover the basics of linear least squares.
17.1 Notation for data
- We write \(X_1, X_2, ..., X_n\) to describe \(n\) data points.
- As an example, if the data set is \(\{1, 2, 5\}\), then \(X_1 = 1, X_2 = 2, X_3 = 5\) and \(n = 3\).
- We often use a different letter than \(X\), such as \(Y_1, ...,Y_n\).
- We will typically use Greek letters for things that we don’t know, such as \(\mu\), which would be a mean that we would like to estimate.
17.1.1 The empirical mean
- Define the empirical mean as \[\bar{X} = {1 \over n} \sum_{i=1}^n X_i\]
- Notice that if we subtract the mean from the data points, we get data that has mean 0. That is, we define \[\tilde{X_i} = X_i - \bar{X}\]
- The mean of the \(\tilde{X_i}\) is 0. This process is called “centering” the random variables. Recall from the previous lecture that the mean is the least squares solution for minimising \[\sum_{i=1}^n (X_i - \mu)^2\]
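The centering step above can be sketched in a few lines of Python. This is a minimal illustration using the data set \(\{1, 2, 5\}\) from the earlier example; the variable names are my own.

```python
# Compute the empirical mean and center the data.
data = [1, 2, 5]
n = len(data)

xbar = sum(data) / n                       # empirical mean, X-bar
centered = [x - xbar for x in data]        # tilde X_i = X_i - X-bar

# The centered data have empirical mean 0 (up to floating-point rounding).
print(xbar)                 # 2.666...
print(sum(centered) / n)    # effectively 0
```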
17.2 The empirical standard deviation and variance
Define the empirical variance as \[S^2 = {1 \over n-1} \sum_{i=1}^n (X_i - \bar{X})^2 = {1 \over n-1} (\sum_{i=1}^n X_i^2 - n\bar{X}^2)\]
The empirical standard deviation is defined as \(S = \sqrt{S^2}\). Notice that the standard deviation has the same units as the data.
The data defined by \(X_i / S\) have empirical standard deviation 1. This is called “scaling” the data.
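Both forms of the variance formula can be checked numerically. The sketch below (again using the hypothetical data set \(\{1, 2, 5\}\)) computes the definitional form and the shortcut form, then scales the data by \(S\):

```python
import math

data = [1, 2, 5]
n = len(data)
xbar = sum(data) / n

# Definitional form: average squared deviation from the mean (dividing by n - 1).
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)
# Shortcut form: (sum of squares - n * xbar^2) / (n - 1).
s2_alt = (sum(x ** 2 for x in data) - n * xbar ** 2) / (n - 1)

s = math.sqrt(s2)                 # empirical standard deviation
scaled = [x / s for x in data]    # scaled data have empirical SD 1
```

The two variance expressions agree up to floating-point rounding, and recomputing the variance of `scaled` returns 1.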
17.3 Normalisation
- The data defined by \[Z_i = {{X_i -\bar{X}}\over{S}}\] have empirical mean 0 and empirical standard deviation 1.
- The process of centering, then scaling the data is called “normalising” the data.
- Normalised data are centred at 0 and have units equal to standard deviations of the original data.
- For example, a value of 2 from normalised data means that the observation was two standard deviations larger than the mean.
17.4 The Empirical Covariance
Consider now when we have pairs of data, \((X_i, Y_i)\).
- Their empirical covariance is \[Cov(X,Y) = {1 \over n-1}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) = {1 \over n-1}(\sum_{i=1}^n X_iY_i - n\bar{X} \bar{Y})\]
The correlation is defined by \[Cor(X,Y) = {Cov(X,Y) \over {S_xS_y}}\] where \(S_x\) and \(S_y\) are the estimates of standard deviations for the \(X\) observations and \(Y\) observations, respectively.
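The covariance and correlation formulas can be sketched the same way. In the hypothetical example below, \(Y_i = 2 X_i\) exactly, so the correlation should come out to 1:

```python
import math

x = [1, 2, 5]
y = [2, 4, 10]    # exactly 2 * x, so Cor(X, Y) = 1
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Definitional form of the empirical covariance.
cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
# Shortcut form: (sum of products - n * xbar * ybar) / (n - 1).
cov_alt = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / (n - 1)

sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
cor = cov / (sx * sy)
```

Since the correlation divides the covariance by both standard deviations, it is unitless, whereas the covariance carries the product of the units of \(X\) and \(Y\).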