Implementing the Two-Stage Least Squares Approach
The two-stage least squares (2SLS) approach addresses endogeneity arising from omitted variables, simultaneity, and measurement error under the assumption that a strong and valid instrument removes the endogenous part of the predictor (Papies et al. 2017; Wooldridge 2010). It is best suited for a continuous predictor. In a combined estimation, a first-stage regression of the predictor on at least one strong and valid instrument and the controls computes the predicted values of the predictor. In the second-stage regression, the predictor is replaced by the predicted values from the first stage, and the instrument is excluded from the model.
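The two stages described above can be written out as follows. This is a sketch for a single instrument, and the notation is an assumption introduced here for illustration rather than taken from the cited sources: the subscript i indexes observations, \pi and \beta are coefficients, and u_i, v_i, and \tilde{u}_i are error terms.

```latex
% Structural equation: the predictor is endogenous, i.e., correlated with u_i
\text{Outcome}_i = \beta_0 + \beta_1\,\text{Predictor}_i + \beta_2'\,\text{Controls}_i + u_i,
\qquad \operatorname{Cov}(\text{Predictor}_i, u_i) \neq 0

% First stage: regress the predictor on the instrument and the controls
\text{Predictor}_i = \pi_0 + \pi_1\,\text{Instrument}_i + \pi_2'\,\text{Controls}_i + v_i

% Second stage: replace the predictor with its first-stage prediction
\text{Outcome}_i = \beta_0 + \beta_1\,\widehat{\text{Predictor}}_i + \beta_2'\,\text{Controls}_i + \tilde{u}_i

% Instrument strength (relevance) and validity (exogeneity)
\pi_1 \neq 0, \qquad \operatorname{Cov}(\text{Instrument}_i, u_i) = 0
```

In this notation, a strong instrument is one for which \pi_1 is clearly different from zero, and a valid instrument is one that is uncorrelated with the structural error u_i.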
The ivregress command in Stata (https://www.stata.com/manuals/rivregress.pdf) can be used to implement two-stage least-squares regression with the following code.
//estimate two-stage least-squares regression including first stage estimates
ivregress 2sls Outcome Controls (Predictor = Instrument), first
The ivreg package in R (https://cran.r-project.org/web/packages/ivreg/index.html) can be used to implement two-stage least-squares regression with the following code. To obtain the first-stage estimates, a separate regression model needs to be run.
#load the ivreg package
library(ivreg)
#estimate two-stage least-squares regression
model_2SLS <- ivreg(Outcome ~ Predictor + Controls | Instrument + Controls, data = Dataset)
summary(model_2SLS)
#obtain the first-stage estimates
model_first_stage <- lm(Predictor ~ Instrument + Controls, data = Dataset)
summary(model_first_stage)
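To illustrate what the combined estimation does, the two stages can also be run by hand on simulated data. The following is a minimal sketch in base R, not a replacement for ivreg: the data-generating process, the variable Unobserved, and all coefficient values are hypothetical and chosen only for illustration.

```r
#simulate data with an omitted variable (all names and values are hypothetical)
set.seed(123)
n <- 5000
Instrument <- rnorm(n)
Controls <- rnorm(n)
Unobserved <- rnorm(n) #omitted variable that makes the predictor endogenous
Predictor <- 0.8 * Instrument + 0.5 * Controls + Unobserved + rnorm(n)
Outcome <- 1 + 2 * Predictor + 0.3 * Controls + Unobserved + rnorm(n)
Dataset <- data.frame(Outcome, Predictor, Instrument, Controls)

#naive OLS: biased because Unobserved is part of the error term
model_OLS <- lm(Outcome ~ Predictor + Controls, data = Dataset)

#first stage: regress the predictor on the instrument and the controls
model_first_stage <- lm(Predictor ~ Instrument + Controls, data = Dataset)
Dataset$Predictor_hat <- fitted(model_first_stage)

#second stage: replace the predictor with its predicted values
model_second_stage <- lm(Outcome ~ Predictor_hat + Controls, data = Dataset)

#compare the estimates with the simulated effect of 2
coef(model_OLS)["Predictor"]
coef(model_second_stage)["Predictor_hat"]
```

The manual second stage reproduces the two-stage least-squares point estimate, but its standard errors are not valid because they ignore that Predictor_hat is itself estimated. For inference, the combined estimation with ivreg (or ivregress in Stata) should be used.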
References
Papies, Dominik, Peter Ebbes, and Harald van Heerde (2017), “Addressing Endogeneity in Marketing Models,” Advanced Methods for Modeling Markets, Cham: Springer, 581–627.
Wooldridge, Jeffrey M. (2010), Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.