Logarithmic Transformations

Imagine you have annual salary data for 500 software engineers in Silicon Valley. Most earn between $80,000 and $150,000; a few founders and executives earn ten or a hundred times more. If you plot that distribution, the right tail stretches far out: the mean is pulled up by the extreme values, while the median sits much lower. Now take the logarithm of each salary. The distribution becomes much closer to symmetric, and the histogram looks roughly like a bell curve.
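To see this numerically, the sketch below simulates 500 salaries from a log-normal distribution and compares the sample skewness before and after taking logs. The log-normal shape, its parameters, and the `skewness` helper are illustrative assumptions, not estimates from real salary data. A log-normal tail is also thinner than the founder-and-executive tail described above.

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean cubed deviation over the cubed (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

rng = np.random.default_rng(0)
# Hypothetical salaries: log-normal with a median of about $110,000 (assumed parameters)
salaries = rng.lognormal(mean=np.log(110_000), sigma=0.6, size=500)
log_salaries = np.log(salaries)

print(f"mean   = {salaries.mean():,.0f}")
print(f"median = {np.median(salaries):,.0f}")
print(f"skewness(salary)     = {skewness(salaries):.2f}")
print(f"skewness(log salary) = {skewness(log_salaries):.2f}")
```

In levels, the mean lies above the median and the skewness is clearly positive. After the transform, the skewness is close to zero, because the log of a log-normal variable is normally distributed.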

This change of scale is not merely cosmetic. The same transformation that makes the distribution more symmetric also changes how the regression coefficient is interpreted. In this section we will see why logarithms are among the most widely used tools in econometrics for capturing proportional and nonlinear relationships, and how to choose among the four standard specifications.


Why Take Logarithms?

Three reasons come up repeatedly in practice:

  1. Skewed distributions. Variables such as wages, firm revenues, housing prices, or export quantities have long right tails. Their logarithms are much more symmetric. This often brings the regression errors closer to the normality assumed for exact small-sample confidence intervals.

  2. Proportional effects. A $1,000 raise for someone earning $20,000 is enormous (5%); the same amount for someone earning $500,000 is negligible (0.2%). When agents respond to relative rather than absolute changes, the logarithm is the natural scale.

  3. Linearization of multiplicative relationships. If theory suggests \(Y = A \cdot X^{\beta_1}\), that relationship is not linear in the parameters. Taking logarithms gives \(\ln Y = \ln A + \beta_1 \ln X\). This is linear in \(\ln A\) and \(\beta_1\), so it can be estimated by OLS.
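A small simulation can check the linearization in point 3. The sketch below assumes a multiplicative error, \(Y = A X^{\beta_1} e^{\varepsilon}\), so the log-log equation holds exactly with an additive error. The values \(A = 2\) and \(\beta_1 = 0.7\) are arbitrary choices. `np.polyfit` with `deg=1` serves here as a compact OLS routine for a single regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
A_true, beta1_true = 2.0, 0.7                # assumed parameters for the simulation
n = 1_000
X = rng.uniform(1, 100, size=n)
eps = rng.normal(0, 0.1, size=n)
Y = A_true * X ** beta1_true * np.exp(eps)   # multiplicative error term

# OLS of ln Y on ln X; np.polyfit(deg=1) returns [slope, intercept]
slope, intercept = np.polyfit(np.log(X), np.log(Y), deg=1)
print(f"beta1_hat = {slope:.3f}   (true {beta1_true})")
print(f"A_hat     = {np.exp(intercept):.3f}   (true {A_true})")
```

The slope recovers \(\beta_1\) directly. The constant \(A\) is recovered by exponentiating the intercept.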

Note

The natural logarithm is defined only for strictly positive values. Before transforming a variable, verify that all observations are \(> 0\). Variables that can take the value zero (overtime hours, number of children, exports in certain sectors and years) require special treatment.
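NumPy does not stop on non-positive inputs. `np.log(0)` returns `-inf`, and `np.log` of a negative number returns `nan`; in both cases NumPy only emits a `RuntimeWarning`. A minimal guard makes the check from the note explicit before transforming. It is written here as a hypothetical helper, `checked_log`:

```python
import numpy as np

def checked_log(x):
    """Hypothetical helper: natural log that raises instead of returning -inf or nan."""
    x = np.asarray(x, dtype=float)
    not_positive = ~(x > 0)   # True for zeros, negatives, and nan
    if not_positive.any():
        raise ValueError(f"{int(not_positive.sum())} observation(s) are not strictly positive")
    return np.log(x)

print(checked_log([1.0, 10.0, 100.0]))

try:
    checked_log([40.0, 0.0, 12.5])   # e.g. overtime hours that include a zero
except ValueError as err:
    print(err)   # 1 observation(s) are not strictly positive
```

The helper only detects the problem. How to treat the zeros remains a separate modeling decision.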

Simulation: which specification fits best?

Before seeing the formulas, explore the following dashboard. You will generate data from four different data-generating processes and see which regression specification — level-level, log-level, level-log, or log-log — produces the best fits and cleanest residuals.

What to look for:

  • When the true form is Log-Log, in which column does the scatter become most linear?

  • Does the residual pattern improve in the correct specification?

  • What happens with \(n = 50\) and high noise? And with \(n = 500\) and low noise?

What do we observe?

The correct transformation straightens the scatter and removes the pattern in the residuals. Under the wrong specification, the plot of residuals against fitted values shows a U shape (when a straight line is fitted to a convex relationship) or an inverted U (when the relationship is concave). \(R^2\) is a useful signal but not a sufficient one. A misspecified model can have a reasonable \(R^2\) while its residuals display systematic curvature, which means the assumed functional form violates the zero-conditional-mean assumption of OLS. A proper diagnostic always includes the residual plot.
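The residual diagnostic can also be summarized numerically. The sketch below assumes a log-level DGP, \(\ln Y = 1 + 0.3X + \varepsilon\), and fits a straight line both in levels and in the correct log-level form. It then averages the residuals within thirds of the fitted values, using a hypothetical helper named `tercile_residual_means`. A U-shaped pattern shows up as a positive, negative, positive sequence of tercile means.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.uniform(0, 10, size=n)
# Assumed log-level DGP: ln Y = 1 + 0.3 X + eps, so Y is convex in X
lnY = 1.0 + 0.3 * X + rng.normal(0, 0.1, size=n)
Y = np.exp(lnY)

def tercile_residual_means(x, y):
    """Hypothetical helper: OLS line of y on x, then the mean residual in each third of the fitted values."""
    b1, b0 = np.polyfit(x, y, deg=1)
    fitted = b0 + b1 * x
    resid = y - fitted
    order = np.argsort(fitted)
    return [float(resid[idx].mean()) for idx in np.array_split(order, 3)]

print("level-level (wrong):", np.round(tercile_residual_means(X, Y), 2))
print("log-level (correct):", np.round(tercile_residual_means(X, lnY), 3))
```

In levels, the middle third has clearly negative residuals and both ends have positive ones. In the log-level fit, all three means are close to zero.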


The Log-Level Model: Logarithm in the Dependent Variable

The log-level (or semi-logarithmic) model transforms only \(Y\):

\[\ln(Y_i) = \beta_0 + \beta_1 X_i + \varepsilon_i\]
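As a sketch, the log-level model can be estimated by an ordinary OLS regression of \(\ln Y\) on \(X\). The wage data below are simulated under an assumed DGP with \(\beta_1 = 0.08\), matching the experience example later in this section. The sample size, intercept, and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
experience = rng.uniform(0, 30, size=n)   # years of experience (hypothetical)
# Assumed DGP: ln(wage) = 10 + 0.08 * experience + eps
log_wage = 10.0 + 0.08 * experience + rng.normal(0, 0.3, size=n)
wage = np.exp(log_wage)

# Log-level regression: only the dependent variable is transformed
b1, b0 = np.polyfit(experience, np.log(wage), deg=1)
print(f"beta1_hat = {b1:.4f}")
```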

Interpretation of the Coefficient \(\beta_1\)

When \(X\) increases by one unit, holding \(\varepsilon\) fixed, the difference in logarithms is:

\[\ln(Y') - \ln(Y) = \ln\!\left(\frac{Y'}{Y}\right) = \beta_1\]

That is, \(Y'/Y = e^{\beta_1}\). The exact percentage change in \(Y\) associated with a one-unit increase in \(X\) is:

\[\%\Delta Y = 100 \cdot (e^{\beta_1} - 1)\]

For small values of \(|\beta_1|\), the Taylor expansion \(e^z \approx 1 + z\) gives the standard approximate rule:

A one-unit increase in \(X\) is associated with an approximate change of \(100 \cdot \beta_1\%\) in \(Y\).

For example, if a regression of \(\ln(\text{wage})\) on years of experience yields \(\hat\beta_1 = 0.08\), the reading is: one additional year of experience is associated with an approximately 8% higher wage.
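Both readings of \(\hat\beta_1\) take one line each. The sketch below uses `math.expm1`, which computes \(e^z - 1\) accurately for small \(z\). For the wage example, it gives an exact effect of about 8.33% rather than 8%.

```python
import math

def pct_change_approx(beta1):
    """Approximate % change in Y per one-unit increase in X (log-level model)."""
    return 100 * beta1

def pct_change_exact(beta1):
    """Exact % change in Y per one-unit increase in X: 100 * (e^beta1 - 1)."""
    return 100 * math.expm1(beta1)

for b in (0.08, 0.20, -0.20):
    print(f"beta1 = {b:+.2f}: approx {pct_change_approx(b):+.2f}%, exact {pct_change_exact(b):+.2f}%")
```

For negative coefficients, the approximation overstates the decline. \(\hat\beta_1 = -0.20\) corresponds to an exact change of about \(-18.13\%\), not \(-20\%\).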

When does the approximation break down?

The \(100\beta_1\%\) rule is accurate when \(|\beta_1|\) is small, but deteriorates for larger coefficients — such as returns to education (10–20%) or effects of intensive treatments. The following simulation shows exactly where that gap occurs.

What to look for:

  • Beyond what value of \(\beta_1\) does the gap between the approximation and the exact value exceed 3 percentage points?

  • Returns to education typically range between 8% and 15%. In which part of that range is it most important to use the exact formula?
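The questions above can also be answered by brute force. The sketch below scans \(\beta_1\) from 0 to 1 in steps of 0.001 and reports where the gap \(100(e^{\beta_1}-1) - 100\beta_1\) first exceeds 3 percentage points. The grid and the 3-point threshold are the only inputs, and only positive coefficients are scanned.

```python
import math

def approx_gap_pp(beta1):
    """Gap in percentage points between the exact effect 100(e^b - 1) and the rule 100 b."""
    return 100 * (math.expm1(beta1) - beta1)

grid = [i / 1000 for i in range(1001)]                      # beta1 = 0.000, 0.001, ..., 1.000
threshold = next(b for b in grid if approx_gap_pp(b) > 3)  # first grid point above 3 pp
print(f"gap first exceeds 3 pp at beta1 = {threshold:.3f}")  # 0.236

for b in (0.08, 0.10, 0.15):
    print(f"beta1 = {b:.2f}: gap = {approx_gap_pp(b):.2f} pp")
```

Within the 8% to 15% range for returns to education, the gap grows from about 0.33 to about 1.18 percentage points. The exact formula therefore matters most at the upper end of that range.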

Approximation vs. Exact Value

The \(100\beta_1\%\) rule is reliable when \(|\beta_1|\) is below about 0.10, where the error is roughly half a percentage point or less. For larger coefficients, report the exact effect: \(100(e^{\hat\beta_1}-1)\%\). For example, \(\hat\beta_1 = 0.40\) implies an exact change of \(100(e^{0.40}-1) \approx 49\%\), not 40%.


The Level-Log Model: Logarithm in the Explanatory Variable

The level-log model transforms only \(X\):

\[Y_i = \beta_0 + \beta_1 \ln(X_i) + \varepsilon_i\]

Interpretation of the Coefficient \(\beta_1\)

Differentiating, a small change \(\Delta X\) produces:

\[\Delta Y \approx \beta_1 \cdot \frac{\Delta X}{X}\]

Since \(\Delta X / X = \%\Delta X / 100\), this becomes:

\[\Delta Y \approx \frac{\beta_1}{100} \times \%\Delta X\]

A 1% increase in \(X\) is associated with a change of \(\hat\beta_1 / 100\) units in \(Y\).
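A quick numerical check of the \(\hat\beta_1/100\) rule uses a hypothetical coefficient \(\beta_1 = 50\) and two arbitrary starting values of \(X\). For a 1% increase, the model's exact change is \(\beta_1 \ln(1.01)\). This is close to \(\beta_1/100\) and does not depend on where \(X\) starts.

```python
import math

beta1 = 50.0   # hypothetical level-log coefficient

def delta_y(x0, x1, b=beta1):
    """Change in Y implied by Y = b0 + b * ln(X) when X moves from x0 to x1."""
    return b * (math.log(x1) - math.log(x0))

for x0 in (200.0, 5_000.0):
    print(f"X0 = {x0:>7,.0f}: exact dY = {delta_y(x0, 1.01 * x0):.4f}, rule beta1/100 = {beta1 / 100:.4f}")
```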

When \(\beta_1 > 0\), this model captures diminishing returns. The marginal effect \(dY/dX = \beta_1 / X\) falls as \(X\) grows, so each additional unit of \(X\) contributes less than the previous one. This is the “diminishing returns” case from the first simulation. In level-level space the scatter shows a concave curve that flattens out, while in level-log space (with \(X\) in logarithms) the relationship becomes linear.

A concrete example is the relationship between a firm’s advertising budget and its revenues. Going from $1,000 to $10,000 in spending has a large effect on sales; going from $5 million to $6 million in advertising has much less. The first dollars reach the easiest customers, while the last ones chase progressively harder-to-convince audiences.
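Under a level-log model, the predicted change depends only on the ratio of the two spending levels: \(\Delta Y = \beta_1 \ln(X_1/X_0)\). The sketch below applies this to the advertising example with a hypothetical coefficient \(\beta_1 = 1\); any positive value scales both results equally. The first move adds $9,000 of spending and the second adds $1,000,000, yet the first predicts roughly 12.6 times the revenue gain.

```python
import math

beta1 = 1.0   # hypothetical positive coefficient; both effects scale linearly with it

gain_small_budget = beta1 * math.log(10_000 / 1_000)          # $1,000 -> $10,000
gain_large_budget = beta1 * math.log(6_000_000 / 5_000_000)   # $5M -> $6M

print(f"$1k -> $10k: {gain_small_budget:.3f}")   # ln(10)  ≈ 2.303
print(f"$5M -> $6M:  {gain_large_budget:.3f}")   # ln(1.2) ≈ 0.182
print(f"ratio: {gain_small_budget / gain_large_budget:.1f}")

# Marginal effect of one more dollar, dY/dX = beta1 / X, shrinks as spending grows
for x in (1_000, 5_000_000):
    print(f"dY/dX at X = {x:,}: {beta1 / x:.2e}")
```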


The Log-Log Model: Logarithm in Both Variables

The log-log (or double-logarithmic) model transforms both \(Y\) and \(X\):

\[\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + \varepsilon_i\]

The coefficient \(\beta_1\) has a particularly clean interpretation: it is the elasticity of \(Y\) with respect to \(X\).
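As a sketch, the elasticity can be recovered by regressing \(\ln Q\) on \(\ln P\). The demand data below are simulated under an assumed constant elasticity of \(-1.3\). The second fit re-expresses prices in cents. It shows that the slope does not depend on the units of measurement, which is not true of a level-level slope.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
price = rng.uniform(5, 50, size=n)     # hypothetical prices in dollars
true_elasticity = -1.3                 # assumed for the simulation
log_q = 8.0 + true_elasticity * np.log(price) + rng.normal(0, 0.2, size=n)
quantity = np.exp(log_q)

# Log-log OLS: the slope is the estimated elasticity
elasticity_hat, _ = np.polyfit(np.log(price), np.log(quantity), deg=1)
# Same data with prices in cents: ln(100 P) = ln 100 + ln P only shifts the intercept
elasticity_cents, _ = np.polyfit(np.log(100 * price), np.log(quantity), deg=1)

print(f"elasticity (dollars) = {elasticity_hat:.3f}")
print(f"elasticity (cents)   = {elasticity_cents:.3f}")
```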

Dashboard: price-demand elasticity

Before the formal derivation, explore the following dashboard. You will see how the slope in log-log space changes with the elasticity, and how that translates into total revenues for an e-commerce platform.

What to look for:

  • What happens to revenues when the elasticity is greater than 1 in absolute value and the price rises?

  • At what value of \(\beta_1\) are revenues independent of price?

  • How does the slope of the line in log-log space change as you vary the elasticity?

What do we observe?

The slope of the line in log-log space is the elasticity: a dimensionless measure that does not depend on the units of \(P\) or \(Q\). The boundary at \(|\beta_1| = 1\) separates two regimes with opposite implications for revenues:

  • Inelastic demand (\(|\beta_1| < 1\)): if price rises 10%, quantity falls by less than 10%. Revenues increase.

  • Unit elasticity (\(|\beta_1| = 1\)): price and quantity move in equal proportion. Revenues do not change.

  • Elastic demand (\(|\beta_1| > 1\)): if price rises 10%, quantity falls by more than 10%. Revenues decrease.

Companies like Amazon routinely estimate these elasticities by product for their dynamic pricing algorithms.
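The three regimes follow from the constant-elasticity form \(Q = A P^{\beta_1}\), which implies revenue \(R = PQ = A P^{1+\beta_1}\). The sketch below computes the revenue change for a 10% price increase. The elasticities \(-0.5\), \(-1\), and \(-1.5\) are illustrative values.

```python
def revenue_change_pct(elasticity, price_change_pct=10.0):
    """% change in revenue P*Q when Q = A * P**elasticity (constant elasticity)."""
    factor = 1 + price_change_pct / 100
    # R = A * P**(1 + elasticity), so revenue scales by factor**(1 + elasticity)
    return 100 * (factor ** (1 + elasticity) - 1)

for e in (-0.5, -1.0, -1.5):
    print(f"elasticity {e:+.1f}: revenue {revenue_change_pct(e):+.2f}%")
```

Revenue rises by about 4.88% in the inelastic case, stays unchanged at unit elasticity, and falls by about 4.65% in the elastic case.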

Formal derivation

Taking differentials of \(\ln Y = \beta_0 + \beta_1 \ln X + \varepsilon\), holding \(\varepsilon\) fixed:

\[d\ln(Y) = \beta_1 \, d\ln(X) \quad \Longrightarrow \quad \frac{dY}{Y} = \beta_1 \cdot \frac{dX}{X}\]

In terms of percentage changes:

\[\%\Delta Y \approx \beta_1 \times \%\Delta X\]

A 1% increase in \(X\) is associated with a change of approximately \(\beta_1\%\) in \(Y\).
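The approximation loses accuracy for larger changes. Holding \(\varepsilon\) fixed, the log-log model implies an exact ratio and hence an exact percentage change. With an illustrative elasticity \(\beta_1 = -1.3\), a 1% increase in \(X\) gives:

\[\frac{Y'}{Y} = \left(\frac{X'}{X}\right)^{\beta_1} \quad \Longrightarrow \quad \%\Delta Y = 100\left[\left(1 + \frac{\%\Delta X}{100}\right)^{\beta_1} - 1\right], \qquad 100\left(1.01^{-1.3} - 1\right) \approx -1.29\%\]

For a 10% increase, the same formula gives \(100(1.1^{-1.3}-1) \approx -11.65\%\) rather than the approximate \(-13\%\).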


Comparative Summary

| Model | Equation | Interpretation of \(\hat\beta_1\) |
|---|---|---|
| Level-level | \(Y = \beta_0 + \beta_1 X + \varepsilon\) | \(\Delta X = 1 \Rightarrow \Delta Y = \hat\beta_1\) (units) |
| Log-level | \(\ln(Y) = \beta_0 + \beta_1 X + \varepsilon\) | \(\Delta X = 1 \Rightarrow \%\Delta Y \approx 100\hat\beta_1\) |
| Level-log | \(Y = \beta_0 + \beta_1 \ln(X) + \varepsilon\) | \(\%\Delta X = 1 \Rightarrow \Delta Y \approx \hat\beta_1/100\) |
| Log-log | \(\ln(Y) = \beta_0 + \beta_1 \ln(X) + \varepsilon\) | \(\%\Delta X = 1 \Rightarrow \%\Delta Y \approx \hat\beta_1\) (elasticity) |

In the first simulation of this section you saw these four models in action. Each data-generating process (DGP) has a correct specification (the green column), and only that specification produces residuals without a pattern.


How to Choose the Specification?

The choice should not be based solely on which produces the highest \(R^2\). Three criteria guide the decision:

  1. Economic theory. Theory usually suggests the most natural scale. If the phenomenon of interest operates in proportional terms (wages, demand elasticities, returns to scale), the logarithm is the appropriate scale.

  2. Graphical inspection of the scatter. As shown in the first simulation, the scatter reveals whether the relationship is linear, concave, or convex on each combination of scales. A relationship that looks curved in level-level space may become linear when logarithms are taken.

  3. Residual diagnostics. Comparing the residuals vs. fitted values pattern across specifications is more informative than comparing \(R^2\) alone. The correct specification produces residuals without systematic curvature.