Motivation: Applications of Regression in Research


Regression analysis is a powerful statistical tool used across research fields to understand relationships between variables and to make predictions. It is also the fundamental tool of econometrics, the branch of economics that develops and applies statistical methods to study economic phenomena.

Regression analysis enables a series of interesting applications in quantitative research. In this introductory section, we will review some of these, including data extrapolation, causal relationship inference, and the construction of predictive models.

Uses of Regression

1. Data Extrapolation and Prediction:

A first application of regression is data extrapolation. Let me begin by illustrating this use with a case study that I find particularly compelling: how survey data collected on the Xbox gaming platform was used to forecast the outcome of the 2012 U.S. presidential election (Wang et al., 2015).

As you may know, traditional polling methods, which typically rely on telephone surveys, have become less and less effective because of declining response rates and biased samples. Polls can easily be skewed toward specific demographics. Recognizing this challenge, researchers proposed a novel approach: surveying the vast Xbox user base and then adjusting the resulting data so that it reflected the demographic characteristics of the actual voting population. At first glance, this might sound strange: Xbox users are far from representative of the general population, skewing heavily toward young men!

However, in this case, the researchers employed an ingenious strategy to address the bias. Taking advantage of the hundreds of thousands of responses they obtained from Xbox users, they constructed a grid of demographic profiles (combinations of characteristics such as age group and gender) and estimated the voting tendencies of each profile. When data for a specific combination of characteristics were unavailable or too sparse, the researchers interpolated, drawing on the results of similar demographic groups. Regression played a crucial role in this interpolation, allowing them to make predictions even for groups with limited data. Once the grid of voting tendencies was complete, the researchers weighted each group according to its share of the voting population (a process commonly referred to as post-stratification). As a result, they obtained predictions quite close to the actual election outcomes.
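To make the weighting step concrete, here is a minimal sketch in Python. It is not the authors' model, which was a more elaborate multilevel regression; the cells, sample sizes, support rates, and population shares below are all made-up numbers for illustration. An additive regression on age and gender fills in the one empty cell, and the cell estimates are then weighted by their assumed population shares.

```python
import numpy as np

# All numbers below are hypothetical and chosen only for illustration.
# (age group, gender): (respondents, share supporting candidate A, share of electorate)
cells = {
    ("18-29", "male"):   (600, 0.55, 0.10),
    ("18-29", "female"): (150, 0.65, 0.10),
    ("30-64", "male"):   (200, 0.45, 0.30),
    ("30-64", "female"): (50, 0.55, 0.30),
    ("65+", "male"):     (20, 0.35, 0.09),
    ("65+", "female"):   (0, np.nan, 0.11),  # no respondents in this cell
}

def design_row(age, gender):
    """Intercept plus dummies for age group and gender (additive model)."""
    return [1.0, float(age == "30-64"), float(age == "65+"), float(gender == "female")]

keys = list(cells)
X = np.array([design_row(age, gender) for age, gender in keys])
n = np.array([cells[k][0] for k in keys], dtype=float)
support = np.array([cells[k][1] for k in keys], dtype=float)
pop_share = np.array([cells[k][2] for k in keys], dtype=float)

# Weighted least squares on the observed cells, weighting by respondents.
observed = n > 0
w = np.sqrt(n[observed])
beta, *_ = np.linalg.lstsq(X[observed] * w[:, None], support[observed] * w, rcond=None)

# Keep observed cell means; use the regression prediction for the empty cell.
cell_estimate = np.where(observed, support, X @ beta)

# Raw average (dominated by over-sampled cells) vs. population-weighted average.
raw_estimate = np.sum(n[observed] * support[observed]) / n.sum()
poststratified_estimate = np.sum(pop_share * cell_estimate)

print(f"Raw sample estimate:      {raw_estimate:.3f}")
print(f"Post-stratified estimate: {poststratified_estimate:.3f}")
```

With these invented numbers, the raw sample average (about 0.54) overstates support because young men dominate the sample, while the reweighted estimate is about 0.50.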

2. Inference of Causal Relationships:

Beyond prediction, regression plays a crucial role in inferring causal relationships between variables. Consider the study by Lagomarsino and Rossi (2024) on the impact of a housing subsidy program on domestic violence against women. This study was designed to evaluate whether the program, which used a lottery system to allocate housing, had unintended consequences in terms of violence against women.

One might think that evaluating the impact of such a program simply requires comparing households that received subsidies with those that did not. However, as we will examine in detail when we study causal inference, one of the main challenges in such comparisons is the so-called “apples and oranges” problem: recipients and the comparison group were likely different from the outset.

In this case, the lottery offers a way around the problem. Because housing was allocated by lottery, applicants who received housing and those who did not should be similar on average, which gives the researchers a natural experiment in which to evaluate the program’s effect. Using regression models, they estimated the causal effect of the program on domestic violence while controlling for other factors that might influence the outcome. This illustrates the power of regression for evaluating the causal impact of interventions, particularly when a controlled experiment designed by the researcher would be difficult or ethically infeasible.

We will see that regression is also valuable for analyzing observational data, where the researcher cannot assign units to treatment and control groups at random. By controlling for potential confounding variables through regression, researchers can estimate the causal effect of an intervention or factor on an outcome variable, provided the relevant confounders are observed and included in the model.
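A small simulation can illustrate what controlling for a confounder does. The sketch below is not based on any study discussed here: the data are simulated, and the variable names, the assumed effect size, and the helper function `ols` are hypothetical. It compares a naive regression of the outcome on treatment with one that also includes the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 5000
true_effect = 2.0  # assumed effect of the treatment in this simulation

# A confounder that raises both the chance of treatment and the outcome.
confounder = rng.normal(size=n_obs)
treated = (confounder + rng.normal(size=n_obs) > 0).astype(float)
outcome = true_effect * treated + 3.0 * confounder + rng.normal(size=n_obs)

def ols(y, *regressors):
    """Return OLS coefficients for y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

naive_effect = ols(outcome, treated)[1]                 # ignores the confounder
adjusted_effect = ols(outcome, treated, confounder)[1]  # controls for it

print(f"Naive estimate:    {naive_effect:.2f}")
print(f"Adjusted estimate: {adjusted_effect:.2f}")
```

In this setup the naive estimate is well above the assumed effect of 2, because treated units also tend to have high values of the confounder, while the adjusted estimate lands close to 2. The adjustment works here only because the confounder is observed and correctly included.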

3. Construction of Predictive Models:

As a third use case, regression is widely employed to build predictive models, allowing researchers to estimate the value of a “dependent” variable based on the values of “independent” variables. Consider, for example, the land value assessments that a municipality must conduct for tax purposes, or the pricing recommendations that a platform like Airbnb offers to property owners when they set rental rates. In these cases, variants of the “hedonic pricing model” are used, wherein the price of a property is assumed to decompose into the contributions of its observable characteristics, such as size, location, amenities, and zoning regulations. By applying regression, researchers can estimate the contribution of each characteristic to the total property value. These estimates can then be used to predict the market price of a property from its specific characteristics.
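As a rough illustration of the hedonic idea, the sketch below simulates property prices from a few characteristics and then recovers each characteristic's contribution by ordinary least squares. Everything in it is assumed for the example: the characteristics (`size_m2`, `dist_km`, `has_garage`), the "true" implicit prices, and the noise level do not come from any real valuation tool.

```python
import numpy as np

rng = np.random.default_rng(42)
n_homes = 500

# Hypothetical characteristics of simulated properties.
size_m2 = rng.uniform(40, 200, n_homes)
dist_km = rng.uniform(0, 20, n_homes)
has_garage = rng.integers(0, 2, n_homes).astype(float)

# Assumed "true" implicit prices used to simulate sale prices:
# intercept, price per m2, price per km from the center, garage premium.
true_coef = np.array([50_000.0, 1_500.0, -2_000.0, 10_000.0])
X = np.column_stack([np.ones(n_homes), size_m2, dist_km, has_garage])
price = X @ true_coef + rng.normal(0, 5_000, n_homes)

# OLS estimates of each characteristic's contribution to price.
coef_hat, *_ = np.linalg.lstsq(X, price, rcond=None)

# Predict the price of a new 100 m2 home, 5 km from the center, with a garage.
new_home = np.array([1.0, 100.0, 5.0, 1.0])
predicted_price = new_home @ coef_hat

for name, b in zip(["intercept", "size_m2", "dist_km", "has_garage"], coef_hat):
    print(f"{name:>10}: {b:,.0f}")
print(f"Predicted price: {predicted_price:,.0f}")
```

Each estimated coefficient can be read as the implicit price of one characteristic, holding the others fixed, and the prediction simply adds up those contributions for the new property.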

As we will see, although the standard regression model is linear in its parameters, it can still be used to model nonlinear relationships between variables. By applying transformations to variables (logarithms or polynomial terms, for example), researchers can capture more complex relationships between characteristics and property value, which can improve the model’s fit.
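One common transformation is taking logarithms of both price and size, which turns a curved relationship into a straight line whose slope is an elasticity. The following sketch uses simulated data; the elasticity of 0.7, the intercept, and the noise level are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n_homes = 400
size_m2 = rng.uniform(30, 300, n_homes)

# Assumed data-generating process: price rises less than proportionally with size.
true_elasticity = 0.7
log_price = 8.0 + true_elasticity * np.log(size_m2) + rng.normal(0, 0.05, n_homes)
price = np.exp(log_price)

# The relationship between price and size is curved, but it is linear in logs,
# so OLS on the transformed variables recovers the elasticity.
X_log = np.column_stack([np.ones(n_homes), np.log(size_m2)])
coef_log, *_ = np.linalg.lstsq(X_log, np.log(price), rcond=None)
elasticity_hat = coef_log[1]

print(f"Estimated elasticity of price with respect to size: {elasticity_hat:.3f}")
```

The model is still estimated by ordinary linear regression; only the variables have been transformed before fitting.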

These examples demonstrate the versatility of regression analysis. From predicting election outcomes to inferring causal relationships and building predictive models for real estate valuations, regression provides a robust framework for understanding complex phenomena and extracting valuable insights from data.

Extract from Urban Land Valuation

Extract from Airbnb Pricing Recommendation Tool


Wang, W.; Rothschild, D.; Goel, S.; Gelman, A. Forecasting Elections with Non-Representative Polls. International Journal of Forecasting 2015, 31 (3), 980–991. https://doi.org/10.1016/j.ijforecast.2014.06.001.

Lagomarsino, B. C.; Rossi, M. A. JUE Insight: The Unintended Effect of Argentina’s Subsidized Homeownership Lottery Program on Intimate Partner Violence. Journal of Urban Economics 2024, 142, 103612. https://doi.org/10.1016/j.jue.2023.103612.