Statistical Properties of Estimated Coefficients

The goal of this section is to explore the statistical properties of a simple regression model. In particular, we will focus on a situation where the relationship between a dependent variable \(Y\) and an explanatory variable \(X\) is assumed to be given by:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

where \(\beta_0\) and \(\beta_1\) are the population parameters of interest, and \(\varepsilon_i\) is a random error term. We will consider \(n\) observations, indexed by \(i = 1, \ldots, n\), drawn as a random sample from the target population.
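To make the setup concrete, the sketch below draws one sample of \(n\) observations from this population model. The parameter values and the distributions chosen for \(X\) and \(\varepsilon\) are illustrative assumptions, not part of the model itself.

```python
import numpy as np

# A minimal sketch: draw one sample from Y_i = beta0 + beta1 * X_i + eps_i.
# The parameter values and the distributions of X and eps are illustrative
# assumptions chosen for this example.
rng = np.random.default_rng(0)

n = 100
beta0, beta1 = 2.0, 0.5             # true (assumed) population parameters
sigma = 1.0                         # standard deviation of the error term

x = rng.uniform(0, 10, size=n)      # explanatory variable
eps = rng.normal(0, sigma, size=n)  # error term with E[eps | X] = 0
y = beta0 + beta1 * x + eps         # dependent variable

print(x[:3], y[:3])
```

Any distributions with \(E[\varepsilon_i \mid X] = 0\) would serve equally well here; the uniform/normal choice is only for convenience.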

We will also assume that, using the sample data, we estimate the model via the Ordinary Least Squares (OLS) method, obtaining estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\). The statistical question we are concerned with is:

What can we say about the precision of the estimators \(\hat{\beta}_0\) and \(\hat{\beta}_1\)? How close (or far) do we expect them to be from the true values \(\beta_0\) and \(\beta_1\)?
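Before turning to the theory, it helps to see what a single estimate looks like. The sketch below computes \(\hat{\beta}_0\) and \(\hat{\beta}_1\) from one simulated sample using the standard OLS closed-form formulas; all numerical values are illustrative assumptions.

```python
import numpy as np

# One simulated sample and its OLS estimates, via the closed-form formulas
# beta1_hat = sum((x - xbar) * y) / SST_x and beta0_hat = ybar - beta1_hat * xbar.
# Parameter values are illustrative assumptions.
rng = np.random.default_rng(1)

n = 200
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

x_bar, y_bar = x.mean(), y.mean()
sst_x = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * y) / sst_x
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)  # close to, but not exactly, the true values
```

A single sample gives estimates near, but not equal to, the true parameters; the two results below characterize how far off we should expect them to be.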

Fortunately, statistical theory provides two important results that answer this question: unbiasedness and the determinants of variance. In this section, we will first present these results formally, and then use an interactive simulation to demonstrate them numerically.


1. Unbiasedness

This property states that, on average, the estimated coefficients equal the true population coefficients. The intuition is that although individual estimates vary and differ from the true value due to the randomness of sampling, if we could repeat the estimation over many samples, the average of those estimates would center on the true value. This property ensures that our estimates are not systematically biased in any particular direction.

Formal result:

\[E(\hat{\beta}_0) = \beta_0, \qquad E(\hat{\beta}_1) = \beta_1\]

This means that, under the assumption \(E[\varepsilon_i \mid X] = 0\), the OLS estimator neither overestimates nor underestimates the true parameter on average: the bias is exactly zero. The formal proof of this result is in the appendix. The interactive simulation below will let us verify it numerically.
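The following sketch is a hedged numerical check of unbiasedness in the spirit of the simulation below: it repeatedly draws samples from a known population model, estimates by OLS each time, and averages the estimates. All parameter values and distributions are illustrative assumptions.

```python
import numpy as np

# Monte Carlo check of unbiasedness: the averages of the OLS estimates
# over many samples should lie very close to the true parameters.
# Parameter values and distributions are illustrative assumptions.
rng = np.random.default_rng(42)

beta0, beta1, sigma, n, reps = 2.0, 0.5, 1.0, 100, 5000
b0_hats, b1_hats = [], []
for _ in range(reps):
    x = rng.uniform(0, 10, size=n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    sst_x = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * y) / sst_x
    b0 = y.mean() - b1 * x.mean()
    b0_hats.append(b0)
    b1_hats.append(b1)

print(np.mean(b0_hats), np.mean(b1_hats))  # close to 2.0 and 0.5
```

Individual estimates in `b1_hats` scatter around the true slope, but their average is very close to it, which is exactly what \(E(\hat{\beta}_1) = \beta_1\) predicts.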


2. Variance of the Estimators

Even though they are unbiased, OLS estimators exhibit sampling variance, which quantifies the uncertainty around the estimated coefficient. Higher variance indicates less precise estimation; lower variance means the estimator is concentrated near the true value.

The variance of \(\hat{\beta}_1\) is governed by two factors: the unexplained variability of the model and the spread of the independent variable.

Formal result:

\[\text{Var}(\hat{\beta}_1 \mid X) = \frac{\sigma^2}{SST_x}, \qquad SST_x = \sum_{i=1}^{n}(x_i - \bar{x})^2\]

where \(\sigma^2 = \text{Var}(\varepsilon_i \mid X)\) is the error variance. The formula directly captures both intuitions: a larger model error (\(\sigma^2\)) increases the variance of the estimator, while greater variability in \(X\) (\(SST_x\)) reduces it, because a wider range of \(X\) values provides more information to pin down the slope. The formal derivation is in the appendix.
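The variance formula can also be checked numerically. Since the result is conditional on \(X\), the sketch below holds the \(X\) values fixed, redraws the errors many times, and compares the sample variance of \(\hat{\beta}_1\) against \(\sigma^2 / SST_x\). All parameter values are illustrative assumptions.

```python
import numpy as np

# Check Var(beta1_hat | X) = sigma^2 / SST_x: fix the design x, redraw the
# errors many times, and compare the empirical variance of the slope
# estimates against the formula. Parameter values are illustrative.
rng = np.random.default_rng(7)

beta0, beta1, sigma, n, reps = 2.0, 0.5, 1.5, 80, 20000
x = rng.uniform(0, 10, size=n)            # fixed design across replications
sst_x = np.sum((x - x.mean()) ** 2)

b1_hats = []
for _ in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1_hats.append(np.sum((x - x.mean()) * y) / sst_x)

print(np.var(b1_hats), sigma**2 / sst_x)  # the two should be close
```

Rerunning with a larger `sigma`, or with `x` drawn from a narrower range, shows the two channels in the formula: more model error raises the variance, while a wider spread of \(X\) lowers it.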


Interactive Simulation

To illustrate these results numerically, the simulation below allows us to explore the following question: if we could simulate multiple samples of data from a known population model, how close would the estimates (\(\hat{\beta}_0\) and \(\hat{\beta}_1\)) be to the true values (\(\beta_0\) and \(\beta_1\))? The simulation lets you experiment with different assumptions about the population model and visualize the results across many repeated samples.


Appendix: Formal Proofs

A.1 Unbiasedness of \(\hat{\beta}_1\)

Step 1 — Rewrite \(\hat{\beta}_1\) in terms of the population error.

We start from the OLS estimator:

\[\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})\,Y_i}{SST_x}\]

Substituting \(Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\):

\[\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(\beta_0 + \beta_1 x_i + \varepsilon_i)}{SST_x}\]

Using \(\sum_{i=1}^n (x_i - \bar{x}) = 0\) and \(\sum_{i=1}^n (x_i - \bar{x})\,x_i = SST_x\), the term in \(\beta_0\) vanishes and the term in \(\beta_1\) simplifies:

\[\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar{x})\,\varepsilon_i}{SST_x}\]
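This decomposition is an algebraic identity, not just an approximation, and it can be verified numerically for any single sample. The sketch below (with illustrative parameter values) computes both sides and confirms they agree up to floating-point rounding.

```python
import numpy as np

# Numerical check of the identity beta1_hat = beta1 + sum((x - xbar) * eps) / SST_x.
# It holds exactly for any sample (up to rounding), not just in expectation.
# Parameter values are illustrative assumptions.
rng = np.random.default_rng(3)

beta0, beta1, n = 2.0, 0.5, 50
x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, 1.0, size=n)
y = beta0 + beta1 * x + eps

sst_x = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum((x - x.mean()) * y) / sst_x
decomposition = beta1 + np.sum((x - x.mean()) * eps) / sst_x

print(beta1_hat, decomposition)  # identical up to rounding
```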

Step 2 — Take the expectation conditional on \(X\).

\[E[\hat{\beta}_1 \mid X] = \beta_1 + \frac{1}{SST_x}\sum_{i=1}^n (x_i - \bar{x})\underbrace{E[\varepsilon_i \mid X]}_{=\;0} = \beta_1\]

The last equality follows from the assumption \(E[\varepsilon_i \mid X] = 0\). Since the conditional expectation equals \(\beta_1\) for any realization of \(X\), we also conclude \(E[\hat{\beta}_1] = \beta_1\).

For \(\hat{\beta}_0\), the result follows analogously: since \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}\) and \(\bar{Y} = \beta_0 + \beta_1\bar{x} + \bar{\varepsilon}\), we can write \(\hat{\beta}_0 = \beta_0 + (\beta_1 - \hat{\beta}_1)\bar{x} + \bar{\varepsilon}\). Taking expectations conditional on \(X\) and using \(E[\hat{\beta}_1 \mid X] = \beta_1\) and \(E[\varepsilon_i \mid X] = 0\) gives \(E[\hat{\beta}_0] = \beta_0\).

A.2 Variance of \(\hat{\beta}_1\)

From Step 1 above we have:

\[\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})\,\varepsilon_i}{SST_x}\]

We take the variance conditional on \(X\). Under the homoskedasticity assumption, \(\text{Var}(\varepsilon_i \mid X) = \sigma^2\) for all \(i\), and the errors are uncorrelated across observations (as under random sampling), so the variance of the sum is the sum of the variances:

\[\text{Var}(\hat{\beta}_1 \mid X) = \frac{1}{SST_x^2}\,\sum_{i=1}^n (x_i - \bar{x})^2\,\sigma^2 = \frac{\sigma^2 \cdot SST_x}{SST_x^2} = \frac{\sigma^2}{SST_x}\]

The expression confirms that estimator precision improves (lower variance) when the model error is small or when the \(X\) data are more spread out.