Control Function

The control function approach addresses endogeneity from omitted variables, simultaneity, and measurement error. Like two-stage least squares, it relies on a strong and valid instrumental variable to isolate the exogenous part of the predictor. But instead of replacing the predictor with its predicted values, the control function approach adds the residual from the first stage as an extra control variable.

How It Works

Step 1: First-Stage Regression

Regress the endogenous predictor on the instrumental variable(s) and all exogenous controls and fixed effects from the outcome equation. Save the residuals: they capture the variation in the predictor that the instruments and controls do not explain, which includes its endogenous component.

Step 2: Second-Stage Regression

Estimate the outcome equation as usual, but include the first-stage residuals as an additional control variable. The instrument is excluded from this stage.

The residual absorbs the correlation between the predictor and the error term. If the coefficient on the residual is statistically significant, this signals that endogeneity is present and that the correction is doing meaningful work.
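The two steps above can be sketched on simulated data. This is a minimal illustration rather than a reference implementation: the variable names, the data-generating process, and the true effect of 2.0 are all assumptions made for the example, and only NumPy is used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data (all names and parameter values are illustrative assumptions).
z = rng.normal(size=n)  # instrument
w = rng.normal(size=n)  # exogenous control
u = rng.normal(size=n)  # unobserved confounder (source of endogeneity)
x = 0.8 * z + 0.5 * w + u + rng.normal(size=n)              # endogenous predictor
y = 1.0 + 2.0 * x + 0.5 * w + 1.5 * u + rng.normal(size=n)  # true effect of x is 2.0


def ols(X, y):
    """Return OLS coefficients for design matrix X and response y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


const = np.ones(n)

# Step 1: regress x on the instrument and the exogenous control; keep residuals.
X1 = np.column_stack([const, z, w])
v_hat = x - X1 @ ols(X1, x)

# Step 2: outcome equation with x, the control, and the first-stage residual.
# The instrument z is excluded from this stage.
X2 = np.column_stack([const, x, w, v_hat])
beta_cf = ols(X2, y)

# Naive OLS for comparison (ignores endogeneity).
beta_ols = ols(np.column_stack([const, x, w]), y)

print("naive OLS estimate for x:       ", round(beta_ols[1], 3))
print("control function estimate for x:", round(beta_cf[1], 3))
print("coefficient on the residual:    ", round(beta_cf[3], 3))
```

In this simulated setup the naive estimate should sit noticeably above 2.0, while the control function estimate should land close to it. The standard errors from a plain second-stage regression are not valid here, so no inference is shown.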

Standard Errors

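Because the second-stage regression uses a generated variable (the first-stage residual), conventional second-stage standard errors ignore the first-stage estimation error and are generally incorrect. The usual remedy is bootstrapping: resample the data, re-estimate both stages on each bootstrap sample, and compute standard errors from the distribution of the bootstrap estimates. Analytical corrections also exist (e.g., Karaca-Mandic and Train 2003).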
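The bootstrap described above can be sketched as follows. This is an illustration under assumptions: the simulated data, the helper name cf_estimate, and the choice of 500 replications are hypothetical, and a simple row-level (nonparametric) bootstrap is used. Clustered or panel data would need a resampling scheme that respects that structure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated data (illustrative assumptions; the true effect of x is 2.0).
z = rng.normal(size=n)  # instrument
u = rng.normal(size=n)  # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * u + rng.normal(size=n)


def cf_estimate(y, x, z):
    """Two-step control function estimate of the coefficient on x."""
    const = np.ones(len(y))
    X1 = np.column_stack([const, z])
    gamma, *_ = np.linalg.lstsq(X1, x, rcond=None)
    v_hat = x - X1 @ gamma                      # first-stage residual
    X2 = np.column_stack([const, x, v_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta[1]


n_boot = 500
draws = np.empty(n_boot)
for r in range(n_boot):
    idx = rng.integers(0, n, size=n)            # resample rows with replacement
    draws[r] = cf_estimate(y[idx], x[idx], z[idx])  # re-run BOTH stages

point = cf_estimate(y, x, z)
se_boot = draws.std(ddof=1)
print("estimate:", round(point, 3), "bootstrap SE:", round(se_boot, 3))
```

The key design point is that the first stage is re-estimated inside every replication, so the variability of the residual itself feeds into the standard error.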

When to Use the Control Function over 2SLS

In linear models with a continuous endogenous predictor, the control function and two-stage least squares produce identical coefficient estimates. They diverge, and the control function becomes particularly useful, in several settings:

  • Binary, count, or multinomial outcomes (e.g., logit, Poisson, multinomial models)
  • Multiple endogenous predictors
  • Models with interaction terms or nonlinear effects (e.g., quadratic terms involving the endogenous predictor)

The trade-off: the control function is typically more efficient (smaller standard errors) when correctly specified, but two-stage least squares is more robust to misspecification of the outcome model.
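The equivalence in the linear case, and its breakdown once a nonlinear term enters, can be checked numerically. The sketch below uses simulated data with assumed parameter values. The quadratic comparison uses a naive plug-in of first-stage fitted values purely to show that the two procedures stop coinciding; it is not meant as a recommended estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Simulated data (all names and parameter values are illustrative assumptions).
z = rng.normal(size=n)  # instrument
u = rng.normal(size=n)  # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)
noise = 1.5 * u + rng.normal(size=n)
y_lin = 1.0 + 2.0 * x + noise                   # linear outcome model
y_quad = 1.0 + 2.0 * x - 0.5 * x**2 + noise     # outcome with a quadratic term


def ols(X, y):
    """Return OLS coefficients for design matrix X and response y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


const = np.ones(n)
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)   # first-stage fitted values
v_hat = x - x_hat       # first-stage residuals

# Linear model: replacing x with x_hat (2SLS) versus adding v_hat (control function).
b_2sls = ols(np.column_stack([const, x_hat]), y_lin)
b_cf = ols(np.column_stack([const, x, v_hat]), y_lin)
print("linear model, coefficient on x:   ", b_2sls[1], b_cf[1])

# Quadratic model: naive plug-in of fitted values versus control function.
b_plug = ols(np.column_stack([const, x_hat, x_hat**2]), y_quad)
b_cfq = ols(np.column_stack([const, x, x**2, v_hat]), y_quad)
print("quadratic model, coefficient on x:", b_plug[1], b_cfq[1])
```

In the linear model the two coefficients agree up to floating-point error, because x equals its fitted value plus the residual and the two pieces are orthogonal. In the quadratic model the two printed coefficients differ.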

Requirements

  • At least one strong instrumental variable (significantly predicts the endogenous predictor)
  • At least one valid instrumental variable (affects the outcome only through the predictor)
  • The first stage must include the same controls and fixed effects as the outcome equation
  • Bootstrapped standard errors to account for the generated regressor
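Instrument strength, the first requirement above, is often checked with a first-stage F statistic for the excluded instrument(s). The following is a minimal sketch on simulated data; the variable names and parameter values are assumptions, and the frequently cited threshold of F above 10 is a rule of thumb rather than a formal guarantee.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Simulated first-stage data (illustrative assumptions).
z = rng.normal(size=n)  # instrument
w = rng.normal(size=n)  # exogenous control
x = 0.8 * z + 0.5 * w + rng.normal(size=n)  # endogenous predictor


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid


const = np.ones(n)
X_restricted = np.column_stack([const, w])   # first stage without the instrument
X_full = np.column_stack([const, w, z])      # first stage with the instrument

q = 1                                        # number of excluded instruments
df_resid = n - X_full.shape[1]
rss_r, rss_f = rss(X_restricted, x), rss(X_full, x)
F = ((rss_r - rss_f) / q) / (rss_f / df_resid)
print("first-stage F statistic:", round(F, 1))
```

This check speaks only to strength. Validity (the exclusion restriction) cannot be verified from the first stage and has to be argued on substantive grounds.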

Implementing the Control Function Approach

In Stata, the cfregress command implements the control function approach; see the manual entry at https://www.stata.com/manuals/rcfregress.pdf for syntax and options.

The functionality of the cfregress command is not directly available in R.

References

  • Karaca‐Mandic, Pinar, and Kenneth Train (2003), “Standard Error Correction in Two‐Stage Estimation with Nested Samples,” The Econometrics Journal, 6(2), 401–407.
  • Papies, Dominik, Peter Ebbes, and Harald van Heerde (2017), “Addressing Endogeneity in Marketing Models,” Advanced Methods for Modeling Markets, Cham: Springer, 581–627.
  • Petrin, Amil, and Kenneth Train (2003), “Omitted Product Attributes in Discrete Choice Models,” National Bureau of Economic Research Working Paper No. W9452, Cambridge, MA.
  • Petrin, Amil, and Kenneth Train (2010), “A Control Function Approach to Endogeneity in Consumer Choice Models,” Journal of Marketing Research, 47(1), 3–13.