Heteroscedasticity: Consequences, Diagnosis, and Robust Inference#
In our discussion of the statistical properties of OLS estimators, we showed that the variance of \(\hat{\beta}_1\) depends on the variance of the error term: \(\text{Var}(\hat{\beta}_1 \mid X) = \sigma^2 / SST_X\). This result, and the standard errors that flow from it, proved essential for hypothesis testing and confidence intervals.
When we derived that formula, however, we relied on an assumption that is convenient but may be unrealistic in practice: that the error variance is the same for every observation. Econometricians call this homoscedasticity.
In many empirical applications, this assumption is hard to maintain. The variance of the error term often depends on the level of the explanatory variables, the size of the units under study, or the group to which each observation belongs. Some examples:
Expenditure and income: Higher-income households spend more on average, but also show greater variability in their consumption choices. A household earning $5,000 per month might spend anywhere between $3,000 and $7,000; one earning $50,000 has a much wider plausible range.
Wages and firm size: Small firms exhibit less wage dispersion than large multinationals, where pay varies enormously across roles and seniority levels.
Cross-country data: Growth rates of large, diversified economies tend to be more stable than those of small, commodity-dependent ones.
In all these cases, the error variance is not constant — it varies systematically with the regressors. How does this more realistic picture affect our estimates? Does it matter for inference? And if so, what can we do about it?
In this section we explore heteroscedasticity — the case where the variance of the error differs across observations. We will see that OLS point estimates remain unbiased, but efficiency and inference are affected, and we will learn how to detect the problem and restore valid inference.
1. Objectives#
By the end of this section, you will understand:
What heteroscedasticity is and why it arises naturally in economic data.
What happens to OLS estimators when the error variance is not constant: what is preserved and what breaks.
How to detect heteroscedasticity through visual inspection and formal tests.
How to fix the problem using heteroscedasticity-consistent (HC) standard errors and Weighted Least Squares (WLS).
2. What Is Heteroscedasticity?#
The standard linear regression model assumes that the error variance is the same for all observations:

\[
\text{Var}(\varepsilon_i \mid X_i) = \sigma^2, \quad i = 1, \dots, n.
\]

This assumption is called homoscedasticity. Heteroscedasticity is its violation:

\[
\text{Var}(\varepsilon_i \mid X_i) = \sigma_i^2,
\]
where \(\sigma_i^2\) varies across observations. In practice, the most common patterns are:
Variance proportional to \(X_i\): higher-income households have more variable spending.
Variance proportional to \(X_i^2\): dispersion grows quadratically with the regressor.
Variance differing by group: men and women, developed and developing countries, large and small firms.
While heteroscedasticity can take many forms, one common example is a funnel-shaped residual plot: as fitted values (or \(X\) values) increase, the spread of the residuals also increases. Under homoscedasticity, that vertical band would be constant. The following dashboard lets you explore this pattern visually before diving into its formal consequences:
Toggle between the homoscedastic and heteroscedastic error structures and observe how the residuals vs. fitted-values plot changes shape.
Under heteroscedasticity, notice how the funnel opens to the right: the spread of residuals grows with \(X\).
Under homoscedasticity, the vertical scatter remains roughly constant across all fitted values.
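If you want to reproduce the funnel pattern outside the dashboard, here is a minimal NumPy sketch. The data-generating process and variable names are illustrative (not the dashboard's actual code): the heteroscedastic errors simply have a standard deviation proportional to \(x\).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1, 10, size=n)

# Homoscedastic errors: constant standard deviation.
e_homo = rng.normal(0, 2.0, size=n)
# Heteroscedastic errors: standard deviation grows with x.
e_het = rng.normal(0, 1.0, size=n) * x

def spread_by_half(x, e):
    """Residual spread in the lower vs. upper half of x."""
    lo = e[x < np.median(x)].std()
    hi = e[x >= np.median(x)].std()
    return lo, hi

print(spread_by_half(x, e_homo))  # roughly equal spread in both halves
print(spread_by_half(x, e_het))   # upper half clearly wider: the funnel
```

Comparing the spread of the errors in the lower and upper halves of \(x\) is a crude numerical analogue of eyeballing the residual plot.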
3. Exploring the Consequences via Simulation#
The most direct way to understand what heteroscedasticity does is to take the god’s-eye view. The simulation below generates data under two error structures — homoscedastic and heteroscedastic — and estimates the model across 500 repeated samples. Before reading the formal results, explore the simulation with four questions in mind:
Toggle between the two error structures. What changes in the scatter plot and the residual plots?
Look at the histogram of the sampling distribution of \(\hat{\beta}_1\). Does the center shift? Does the spread change?
Look at the histogram of the sampling distribution of \(\widehat{\text{Var}(\hat{\beta}_1)}\). Does its center coincide with the variance of estimates actually observed across simulations?
Look at the confidence intervals arising from the simulations — how often do they contain the true \(\beta_1\) value?
What do we observe?#
The point estimate is unaffected. The histogram of \(\hat{\beta}_1\) remains centered on the true value \(\beta_1 = 3\) under both error structures. Misestimating the variance does not introduce bias in the slope estimator.
The distribution widens. Under heteroscedasticity, the sampling distribution of \(\hat{\beta}_1\) is more dispersed. OLS remains unbiased, but loses precision: each individual estimate can be farther from the true value.
The estimated variance is biased. The center of the \(\widehat{\text{Var}(\hat{\beta}_1)}\) histogram does not coincide with the actual variance of the sampling distribution. The conventional formula over- or underestimates the true uncertainty depending on the specific pattern of heteroscedasticity.
Coverage falls below 95%. As a direct consequence, confidence intervals built with the standard formula no longer contain the true value 95% of the time. The conventional standard error is wrong, which invalidates hypothesis tests, confidence intervals, and p-values.
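The coverage failure can be replicated in a few lines. The sketch below uses our own parameter choices (\(n = 200\), 1,000 replications, error standard deviation proportional to \(x^2\)) rather than the dashboard's settings, and counts how often the conventional 95% interval contains the true slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, beta1 = 200, 1000, 3.0
x = rng.uniform(1, 10, size=n)          # fixed design across replications
dx = x - x.mean()
sst_x = (dx ** 2).sum()

covered = 0
for _ in range(reps):
    eps = rng.normal(size=n) * x ** 2   # heteroscedastic: sd grows with x^2
    y = 1 + beta1 * x + eps
    b1 = (dx * y).sum() / sst_x                    # OLS slope
    resid = y - y.mean() - b1 * dx                 # OLS residuals
    se = np.sqrt(resid @ resid / (n - 2) / sst_x)  # conventional SE
    covered += abs(b1 - beta1) < 1.96 * se

print(covered / reps)  # falls short of the nominal 0.95
```

The slope estimates average out to the true value, yet the intervals built from the conventional standard error cover it less often than advertised.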
4. Formal Results: Why OLS Breaks (But Only Partially)#
4.1 Unbiasedness is preserved#
The proof of OLS unbiasedness requires only \(E[\varepsilon_i \mid X] = 0\), not constant variance. Therefore:

\[
E[\hat{\beta}_1 \mid X] = \beta_1
\]

regardless of whether errors are homoscedastic or heteroscedastic. This is why the simulation histogram did not shift. The formal derivation, which makes explicit that the homoscedasticity assumption plays no role, is in the appendix.
4.2 Standard errors are wrong#
The conventional formula \(\widehat{\text{Var}}(\hat{\beta}_1) = \hat{\sigma}^2/SST_X\) assumes that all observations share the same variance \(\sigma^2\). When \(\sigma_i^2\) varies across observations, the correct variance of \(\hat{\beta}_1\) in the simple regression case is:

\[
\text{Var}(\hat{\beta}_1 \mid X) = \frac{\sum_i (x_i - \bar{x})^2 \, \sigma_i^2}{SST_X^2}.
\]

The derivation is in the appendix. This expression does not simplify to \(\sigma^2/SST_X\) because the numerator weights each squared deviation \((x_i - \bar{x})^2\) by the observation-specific variance \(\sigma_i^2\). The result depends on how the variance structure aligns with the distribution of \(X\) — something the conventional formula ignores entirely.
For the multiple regression case, the correct variance takes the matrix form known as the sandwich estimator:

\[
\text{Var}(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = (\mathbf{X}'\mathbf{X})^{-1} \left( \sum_i \sigma_i^2 \, \mathbf{x}_i \mathbf{x}_i' \right) (\mathbf{X}'\mathbf{X})^{-1}.
\]
The “filling” in the sandwich — the sum weighted by \(\sigma_i^2\) — does not simplify to \(\sigma^2 (\mathbf{X}'\mathbf{X})\) unless variance is constant. The direction of bias in the conventional formula depends on the specific heteroscedasticity pattern and cannot be predicted without additional information.
The practical consequence is direct: \(t\)-statistics, \(F\)-statistics, p-values, and confidence intervals calculated with the standard formula are all incorrect under heteroscedasticity.
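The mismatch between the two formulas can be made concrete numerically. The sketch below assumes a specific variance pattern (\(\sigma_i^2 = x_i^2\), our choice for illustration), evaluates the sandwich exactly, and compares it with the conventional formula evaluated at the average error variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept
sigma2_i = x ** 2                     # assumed heteroscedasticity pattern

XtX_inv = np.linalg.inv(X.T @ X)

# Sandwich: (X'X)^{-1} (sum_i sigma_i^2 x_i x_i') (X'X)^{-1}
meat = X.T @ (sigma2_i[:, None] * X)
V_sandwich = XtX_inv @ meat @ XtX_inv

# Conventional formula, plugging in the average variance
# (roughly what sigma^2-hat converges to under this pattern)
V_conv = sigma2_i.mean() * XtX_inv

print(V_sandwich[1, 1], V_conv[1, 1])  # slope variances disagree
```

With this particular pattern the conventional formula understates the slope variance; other patterns can push the bias the other way, which is exactly why its direction cannot be predicted in general.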
4.3 Efficiency is lost#
Under homoscedasticity, the Gauss-Markov theorem guarantees that OLS is the Best Linear Unbiased Estimator (BLUE): it has the lowest variance among all linear unbiased estimators. That result depends critically on the constant variance assumption.
Under heteroscedasticity, OLS is still linear and unbiased, but no longer the most efficient. An estimator exists — Weighted Least Squares — with lower variance. We return to this in section 6.
5. Diagnosing Heteroscedasticity#
5.1 Residual plots#
The plot of residuals against fitted values (or against \(X\)) is the primary visual diagnostic tool. A funnel pattern — dispersion increasing with fitted values — signals heteroscedasticity. Plots of \(|\hat{\varepsilon}_i|\) or \(\hat{\varepsilon}_i^2\) against fitted values can also be informative.
This is a screening tool: the eye can be fooled, especially in small samples. An apparent pattern may be random noise; moderate heteroscedasticity may be invisible. Formal tests complement visual inspection.
5.2 Formal tests#
Both tests below share the same null hypothesis: homoscedasticity (\(\sigma_i^2 = \sigma^2\) for all \(i\)).
| Test | Auxiliary regression | Statistic | Distribution under \(H_0\) | Best for |
|---|---|---|---|---|
| Breusch-Pagan | \(\hat{\varepsilon}_i^2\) on regressors | \(n \cdot R^2_{\text{aux}}\) | \(\chi^2(k)\) | Linear patterns of heteroscedasticity |
| White | \(\hat{\varepsilon}_i^2\) on regressors, their squares, and cross-products | \(n \cdot R^2_{\text{aux}}\) | \(\chi^2(p)\) with larger \(p\) | General patterns (linear and nonlinear) |
Common procedure for both:
Estimate OLS and obtain residuals \(\hat{\varepsilon}_i\).
Compute \(\hat{\varepsilon}_i^2\) for each observation.
Regress \(\hat{\varepsilon}_i^2\) on the variables indicated in the table.
Compute \(n \cdot R^2\) from that auxiliary regression and compare it to the \(\chi^2\) distribution.
Rejecting the null is evidence that heteroscedasticity is present; it does not reveal its form. That asymmetry matters for choosing the remedy, as we see next.
The following dashboard runs both tests on a single sample and displays the residual pattern alongside the test statistics and p-values. Use it to see how the tests perform under both error structures:
Set the error structure to Heteroscedastic and observe the Breusch-Pagan and White p-values — do both tests reject at the 5% level?
Switch to Homoscedastic errors — are the results now consistent with the null hypothesis of homoscedasticity?
Notice how the \(\hat{\varepsilon}_i^2\) vs \(X\) plot makes the underlying pattern visible: the upward trend in squared residuals is exactly what both tests are designed to detect.
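As a concrete illustration of the four-step procedure, here is a from-scratch Breusch-Pagan test in NumPy on a simulated sample (our own data-generating process; in practice you would call a library routine such as `het_breuschpagan` in `statsmodels.stats.diagnostic`):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x = rng.uniform(1, 10, size=n)
y = 1 + 3 * x + rng.normal(size=n) * x  # heteroscedastic DGP

# Step 1: estimate OLS and obtain residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Steps 2-3: regress the squared residuals on the regressors
u2 = resid ** 2
g = np.linalg.lstsq(X, u2, rcond=None)[0]
fitted = X @ g
r2_aux = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

# Step 4: LM statistic n*R^2, compared to chi2(k) under H0 (k = 1 here);
# the chi2(1) critical value at the 5% level is about 3.84
lm = n * r2_aux
print(lm, lm > 3.84)  # rejects H0 for this heteroscedastic sample
```

The White test follows the same template; the only change is that the auxiliary regression in steps 2-3 also includes the squares and cross-products of the regressors.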
6. The Fix: Robust Standard Errors and WLS#
Two main tools have been proposed to address heteroscedasticity: heteroscedasticity-consistent (HC) standard errors and Weighted Least Squares (WLS). We introduce each estimator first, then explore their properties using the simulation.
6.1 Heteroscedasticity-Consistent (HC) Standard Errors#
The core idea, due to White (1980), is to estimate the sandwich variance directly from residuals rather than imposing constant variance. Since \(\sigma_i^2\) is unknown, it is replaced by the squared OLS residual \(\hat{\varepsilon}_i^2\). The resulting HC0 estimator is:

\[
\widehat{\text{Var}}_{\text{HC0}}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1} \left( \sum_i \hat{\varepsilon}_i^2 \, \mathbf{x}_i \mathbf{x}_i' \right) (\mathbf{X}'\mathbf{X})^{-1}.
\]
This estimator is consistent for the true sandwich variance under general heteroscedasticity. It does not change the point estimates: \(\hat{\boldsymbol{\beta}}\) is exactly the same OLS estimator; only the standard errors are recalculated.
Several finite-sample corrections improve on HC0. The most common in applied economics is HC1, which multiplies by \(n/(n-k)\) to apply a degrees-of-freedom correction analogous to the one used in the ordinary variance estimator:

\[
\widehat{\text{Var}}_{\text{HC1}}(\hat{\boldsymbol{\beta}}) = \frac{n}{n-k} \, \widehat{\text{Var}}_{\text{HC0}}(\hat{\boldsymbol{\beta}}).
\]
HC2 goes further by dividing each squared residual by \((1 - h_{ii})\), where \(h_{ii}\) is the leverage of observation \(i\). High-leverage observations have large \(h_{ii}\) and tend to have systematically smaller residuals, so HC2 corrects upward for this downward bias in \(\hat{\varepsilon}_i^2\). HC3 uses \((1 - h_{ii})^2\) in the denominator, applying an even more aggressive correction that is generally more conservative but tends to behave better in small samples.
In applied economics, HC1 is the default. HC3 is preferred when samples are small or leverage is uneven across observations.
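The whole HC family differs only in how the sandwich "filling" is weighted. The following NumPy sketch (our own simulated sample; the leverages \(h_{ii}\) are computed directly from the design matrix) makes the four formulas concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 10, size=n)
y = 1 + 3 * x + rng.normal(size=n) * x  # heteroscedastic errors

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Non-robust: sigma^2-hat (X'X)^{-1}
V_ols = (resid @ resid) / (n - k) * XtX_inv

# HC0: sandwich with squared residuals as the filling
V_hc0 = XtX_inv @ (X.T @ (resid[:, None] ** 2 * X)) @ XtX_inv
# HC1: degrees-of-freedom scaling of HC0
V_hc1 = n / (n - k) * V_hc0

# HC2 / HC3: leverage-adjusted squared residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # h_ii = x_i'(X'X)^{-1}x_i
V_hc2 = XtX_inv @ (X.T @ ((resid ** 2 / (1 - h))[:, None] * X)) @ XtX_inv
V_hc3 = XtX_inv @ (X.T @ ((resid ** 2 / (1 - h) ** 2)[:, None] * X)) @ XtX_inv

for name, V in [("non-robust", V_ols), ("HC1", V_hc1),
                ("HC2", V_hc2), ("HC3", V_hc3)]:
    print(name, np.sqrt(V[1, 1]))  # standard error of the slope
```

In day-to-day work these are one option away in standard software, e.g. statsmodels' `sm.OLS(y, X).fit(cov_type="HC1")`; the sketch only unpacks what that option computes.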
6.2 Weighted Least Squares (WLS)#
If we know (or can estimate) the variance structure — \(\text{Var}(\varepsilon_i \mid X_i) = \sigma^2 h(X_i)\) for some known function \(h\) — we can do more than just correct standard errors: we can recover BLUE efficiency.
The idea is to transform the model by dividing each observation by \(\sqrt{h(X_i)}\):

\[
\frac{y_i}{\sqrt{h(X_i)}} = \beta_0 \frac{1}{\sqrt{h(X_i)}} + \beta_1 \frac{x_i}{\sqrt{h(X_i)}} + \frac{\varepsilon_i}{\sqrt{h(X_i)}}.
\]

The transformed error \(\varepsilon_i / \sqrt{h(X_i)}\) has constant variance \(\sigma^2\). Applying OLS to the transformed model is equivalent to minimizing:

\[
\sum_i \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{h(X_i)},
\]
assigning more weight to observations with lower variance. Gauss-Markov applies to the transformed model: WLS is BLUE under the specified variance structure.
Practical risk: if \(h\) is misspecified, the weights may be worse than no weighting at all. In practice, feasible GLS is used: \(h\) is estimated from the residuals (for example, by regressing \(\log \hat{\varepsilon}_i^2\) on \(X_i\)) and the fitted values are used as weights. Results should always be reported alongside OLS+HC as a robustness check.
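Both variants can be sketched in a few lines. In the simulation below the true weight function is \(h(x) = x^2\) (our choice); WLS uses it directly, while the feasible version estimates \(h\) via the \(\log \hat{\varepsilon}_i^2\) auxiliary regression described above:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(1, 10, size=n)
y = 1 + 3 * x + rng.normal(size=n) * x  # Var(eps|x) proportional to x^2

X = np.column_stack([np.ones(n), x])

# WLS with known h(x) = x^2: divide every column, including the
# intercept, by sqrt(h) and run OLS on the transformed model
w = np.sqrt(x ** 2)
beta_wls = np.linalg.lstsq(X / w[:, None], y / w, rcond=None)[0]

# Feasible GLS sketch: estimate h from log squared OLS residuals
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_ols
g = np.linalg.lstsq(X, np.log(resid ** 2), rcond=None)[0]
h_hat = np.exp(X @ g)  # fitted variance function
beta_fgls = np.linalg.lstsq(X / np.sqrt(h_hat)[:, None],
                            y / np.sqrt(h_hat), rcond=None)[0]

print(beta_wls, beta_fgls)  # both slopes should land near the true 3
```

Note the design choice: the intercept column is divided by \(\sqrt{h}\) along with everything else, so the transformed regression generally has no ordinary intercept of its own.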
Exploring the properties via simulation#
The following simulation lets you compare all four variance estimators (Non-Robust, HC1, HC2, HC3) across different sample sizes. Keep these questions in mind while exploring:
Look at the \(t\)-statistic histogram — which estimator produces a distribution that closely matches the theoretical \(t\)-distribution? Which one is too wide or too narrow?
Compare CI coverage across estimators — which achieves coverage closest to 95% under heteroscedasticity?
Try different sample sizes — do the differences between HC1, HC2, and HC3 matter more in small samples or in large ones?
What do we observe?#
HC standard errors restore correct coverage. Under heteroscedasticity, OLS with HC1 errors recovers 95% coverage without changing point estimates. The cost is virtually zero.
HC2 and HC3 improve coverage in small samples. With smaller \(n\), HC1 can still under-cover. HC2 and HC3 apply larger corrections that bring coverage closer to the nominal level when leverage varies.
WLS with correct weights is more efficient. When weights reflect the true variance structure, WLS produces narrower intervals than OLS+HC, in addition to correct coverage.
Under homoscedasticity, all methods converge. If errors are actually homoscedastic, all estimators produce virtually identical estimates and inference. There is no penalty for using robust errors when they are not needed.
7. Summary and Decision Guide#
Heteroscedasticity does not invalidate OLS regression; it complicates it in one specific respect. The following decision guide summarizes recommended practice:
Always inspect residual plots as a first screen.
Run Breusch-Pagan and White tests for statistical confirmation.
If heteroscedasticity is detected (or if there is reasonable doubt): use HC1 robust standard errors by default. This is the minimum standard in applied economics and has virtually no cost.
If the variance structure is estimable: consider WLS for efficiency gains, but always report OLS+HC results as a robustness check.
OLS point estimates are never discarded due to heteroscedasticity: the estimator remains unbiased. Only the quantification of uncertainty changes.
In contemporary applied econometrics, reporting robust standard errors is the norm, not the exception. The question is no longer “should I use robust errors?” but “when is it also worth using WLS?”
Appendix: Formal Derivations#
A.1 Unbiasedness Does Not Require Homoscedasticity#
We want to show that \(E[\hat{\beta}_1 \mid X] = \beta_1\) without invoking constant variance.
Step 1 — Write \(\hat{\beta}_1\) in terms of the true \(\beta_1\) and the errors. Starting from \(\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x}) y_i}{SST_X}\) and substituting \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\):

\[
\hat{\beta}_1 = \beta_1 + \frac{\sum_i (x_i - \bar{x})\, \varepsilon_i}{SST_X}.
\]

Step 2 — Take the conditional expectation:

\[
E[\hat{\beta}_1 \mid X] = \beta_1 + \frac{\sum_i (x_i - \bar{x})\, E[\varepsilon_i \mid X]}{SST_X}.
\]
Step 3 — Apply the zero-conditional-mean assumption \(E[\varepsilon_i \mid X] = 0\) for all \(i\). The second term vanishes and we obtain \(E[\hat{\beta}_1 \mid X] = \beta_1\).
Notice that no step required \(\text{Var}(\varepsilon_i \mid X) = \sigma^2\). Unbiasedness holds for any variance structure as long as \(E[\varepsilon_i \mid X] = 0\).
A.2 Variance of \(\hat{\beta}_1\) Under Heteroscedasticity#
We want to derive \(\text{Var}(\hat{\beta}_1 \mid X)\) without assuming constant variance.
Step 1 — Use the expression from A.1:

\[
\hat{\beta}_1 = \beta_1 + \frac{\sum_i (x_i - \bar{x})\, \varepsilon_i}{SST_X}.
\]

Step 2 — Compute the conditional variance. Since \(X\) is treated as fixed and the \(\varepsilon_i\) are independent across observations:

\[
\text{Var}(\hat{\beta}_1 \mid X) = \frac{\sum_i (x_i - \bar{x})^2\, \text{Var}(\varepsilon_i \mid X)}{SST_X^2}.
\]

Step 3 — Substitute \(\text{Var}(\varepsilon_i \mid X) = \sigma_i^2\):

\[
\text{Var}(\hat{\beta}_1 \mid X) = \frac{\sum_i (x_i - \bar{x})^2\, \sigma_i^2}{SST_X^2}.
\]
Under homoscedasticity (\(\sigma_i^2 = \sigma^2\) for all \(i\)), this reduces to \(\sigma^2 SST_X / SST_X^2 = \sigma^2 / SST_X\), confirming the familiar formula. Under heteroscedasticity, the numerator is a weighted sum of \(\sigma_i^2\) with weights \((x_i - \bar{x})^2\), and the expression does not simplify further.
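The final formula can also be verified numerically: hold \(X\) fixed, draw many error vectors under an assumed pattern (\(\sigma_i = x_i\) here, our choice), and compare the variance of \(\hat{\beta}_1\) across draws with the formula:

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps = 100, 20_000
x = rng.uniform(1, 10, size=n)  # fixed design
dx = x - x.mean()
sst = (dx ** 2).sum()
sigma_i = x                      # assumed: sd of eps_i equals x_i

# The formula: sum_i (x_i - xbar)^2 sigma_i^2 / SST_X^2
var_formula = (dx ** 2 * sigma_i ** 2).sum() / sst ** 2

# Empirical variance across repeated samples; from A.1,
# beta1_hat = beta1 + sum_i dx_i * eps_i / SST_X, with beta1 = 3
eps = rng.normal(size=(reps, n)) * sigma_i
b1 = 3 + eps @ dx / sst
print(var_formula, b1.var())     # the two should agree closely
```

The agreement between the Monte Carlo variance and the formula is a direct check of the derivation above.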