Instrumental Variable Approaches
Instrumental variable (IV) approaches address endogeneity by decomposing the observed variation in a predictor into an exogenous component and an endogenous component, using an external instrument to isolate the exogenous part.
The Common Logic
All IV approaches share the same core idea: use a variable (the instrument) that shifts the predictor but has no direct effect on the outcome. The instrument creates variation in the predictor that is plausibly unrelated to unobserved determinants of the outcome, allowing the researcher to estimate the causal effect from this “clean” variation alone.
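To make the "clean variation" idea concrete, here is a minimal simulation sketch. It assumes a linear data-generating process with made-up coefficients: a hypothetical unobserved confounder `u` drives both the predictor `x` and the outcome `y`, and a hypothetical instrument `z` shifts `x` but has no path to `y`. The OLS slope uses all variation in `x`. The simple IV (Wald) ratio uses only the part of `x` that moves with `z`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta_true = 2.0  # hypothetical causal effect of x on y

u = rng.normal(size=n)                  # unobserved confounder (hypothetical)
z = rng.normal(size=n)                  # instrument: shifts x, no direct path to y
x = 0.8 * z + u + rng.normal(size=n)    # endogenous predictor
y = beta_true * x + 3.0 * u + rng.normal(size=n)

# OLS slope: uses all variation in x, including the part driven by u
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV (Wald) slope: uses only the variation in x that moves with z
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS: {ols:.2f}  IV: {iv:.2f}  true: {beta_true}")
```

Under these assumed coefficients, OLS converges to roughly 3.1 (8.28 / 2.64) because `x` carries part of `u`, while the IV ratio converges to the true value of 2.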
Depending on the source of endogeneity, researchers choose among four IV estimators:
- Two-stage least squares (2SLS): Omitted variables, simultaneity, and measurement error with a continuous predictor
- Control function: Same as 2SLS, plus binary, count, or multinomial outcomes and models with interactions or nonlinearities
- Heckman treatment correction: Treatment selection (binary predictor)
- Heckman selection correction: Sample selection (non-random inclusion in the sample)
How It Works
The general procedure follows two stages:
Stage 1: Regress the endogenous predictor on the instrument(s) and all exogenous controls and fixed effects from the outcome equation. This stage isolates the variation in the predictor that is driven by the instrument (the exogenous component) and separates it from the endogenous component that is correlated with the outcome equation's error term.
Stage 2: Use only the exogenous component from Stage 1 in the outcome equation, excluding the instrument. Because this variation is (by assumption) unrelated to unobserved determinants of the outcome, the second-stage coefficient provides a consistent estimate of the causal effect.
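The two stages can be sketched by hand with NumPy least squares. This is a minimal illustration on simulated data, assuming one exogenous control `w`, two hypothetical instruments `z1` and `z2`, and a linear model; the variable names and coefficients are illustrative only. Note that the instruments enter Stage 1 alongside the control, and Stage 2 keeps the control but drops the instruments.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
w = rng.normal(size=n)                            # exogenous control
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two instruments (hypothetical)
u = rng.normal(size=n)                            # unobserved confounder
x = 0.6 * z1 + 0.4 * z2 + 0.5 * w + u + rng.normal(size=n)
y = 1.5 * x - 1.0 * w + 2.0 * u + rng.normal(size=n)

const = np.ones(n)

# Stage 1: regress x on the instruments AND all exogenous controls
Z = np.column_stack([const, z1, z2, w])
gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
x_hat = Z @ gamma                                 # exogenous component of x

# Stage 2: outcome on x_hat and the same controls; instruments excluded
X2 = np.column_stack([const, x_hat, w])
beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
print("2SLS coefficient on x:", beta[1], " on w:", beta[2])
```

The point estimates from this manual procedure are the 2SLS estimates, but the standard errors reported by a naive second-stage regression are not valid; dedicated IV routines compute them using the original `x` rather than `x_hat`.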
The four IV estimators differ in how they implement this two-stage logic:
- Two-stage least squares (2SLS): Replaces the predictor with its predicted values from Stage 1
- Control function: Adds the Stage 1 residuals as a control variable in the outcome equation
- Heckman treatment correction: Uses a first-stage probit to compute an Inverse Mills Ratio, which is added as a control in the outcome equation
- Heckman selection correction: Same as treatment correction, but models selection into the sample rather than into treatment
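A Heckman-style selection correction (the classic two-step version) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the errors of the selection and outcome equations are assumed jointly normal with correlation 0.7, `z` is a hypothetical variable that enters selection but not the outcome, and the probit is fit with a small hand-written Fisher-scoring loop so the example needs only NumPy and the standard library.

```python
import math
import numpy as np

# Standard normal cdf/pdf built from math.erf (avoids a SciPy dependency)
_erf = np.frompyfunc(math.erf, 1, 1)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)).astype(float))

def norm_pdf(t):
    return np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)

def probit(X, s, iters=25):
    """Probit maximum likelihood via Fisher scoring; returns coefficients."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        idx = X @ b
        p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
        d = norm_pdf(idx)
        score = X.T @ ((s - p) * d / (p * (1 - p)))
        info = X.T @ (X * (d**2 / (p * (1 - p)))[:, None])
        b = b + np.linalg.solve(info, score)
    return b

rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
z = rng.normal(size=n)                       # affects selection only (assumed)
e1 = rng.normal(size=n)
e2 = 0.7 * e1 + math.sqrt(1 - 0.7**2) * rng.normal(size=n)  # correlated errors
selected = (0.5 + 1.0 * z + 0.5 * x + e1) > 0
y = 1.0 + 2.0 * x + e2                       # only usable where selected

# Stage 1: probit for inclusion on z plus the outcome-equation regressors
W = np.column_stack([np.ones(n), z, x])
g = probit(W, selected.astype(float))
m = selected
index = W[m] @ g
imr = norm_pdf(index) / norm_cdf(index)      # Inverse Mills Ratio

# Stage 2: OLS on the selected sample with the IMR as an added control
X2 = np.column_stack([np.ones(m.sum()), x[m], imr])
beta_heckman, *_ = np.linalg.lstsq(X2, y[m], rcond=None)

# Naive comparison: same regression without the correction term
X_naive = np.column_stack([np.ones(m.sum()), x[m]])
beta_naive, *_ = np.linalg.lstsq(X_naive, y[m], rcond=None)
print("naive slope:", beta_naive[1], " Heckman slope:", beta_heckman[1])
```

In this setup the naive slope is pulled below 2 because inclusion depends on `x` and on an error correlated with the outcome error, while the corrected slope recovers roughly 2 and the IMR coefficient estimates the error covariance (0.7 here). As with 2SLS, second-stage standard errors from this manual approach need a correction (analytic or bootstrap).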
In linear models with continuous variables, 2SLS and the control function produce identical point estimates. They diverge when interactions or nonlinearities are present: 2SLS is more robust to misspecification, while the control function is more efficient when correctly specified.
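The equivalence in the linear case can be checked numerically. In this minimal sketch on simulated data (coefficients are arbitrary assumptions), both estimators share the same first stage; 2SLS replaces `x` with its fitted values, while the control function keeps `x` and adds the first-stage residual `v_hat` as an extra regressor.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.7 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * u + rng.normal(size=n)
const = np.ones(n)

# Stage 1 (shared by both estimators)
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)
v_hat = x - x_hat                              # first-stage residual

# 2SLS: replace x with its fitted values
b_2sls = ols(np.column_stack([const, x_hat]), y)

# Control function: keep x, add the first-stage residual as a control
b_cf = ols(np.column_stack([const, x, v_hat]), y)

print(b_2sls[1], b_cf[1])  # same coefficient on x, up to floating point
```

The equality is algebraic rather than approximate: because `x = x_hat + v_hat` and `v_hat` is orthogonal to the constant and to `x_hat`, the two regressions span the same space and assign the same coefficient to the predictor. Once `x` enters nonlinearly (for example, `x**2` or `x * w`), the two procedures no longer coincide.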
Untestable Assumptions
Every IV approach rests on assumptions that cannot be verified empirically:
- Exclusion restriction. The instrument affects the outcome only through the endogenous predictor. There is no direct path from the instrument to the outcome, and no shared unobserved cause linking them. This is the central assumption of any IV approach, and it can only be defended with theoretical arguments, never with a statistical test.
- Instrument strength. The instrument must meaningfully predict the endogenous predictor. Weak instruments produce unstable estimates that are biased toward the uncorrected (OLS) estimate, with large sampling variability and unreliable standard errors, potentially making the correction worse than no correction at all. Strength can be assessed empirically via the first-stage F-statistic on the excluded instruments (a common rule of thumb asks for F > 10), but passing a threshold does not guarantee that the instrument is strong enough in a given context.
- Sufficient granularity. The instrument must vary at the same level of aggregation as the endogenous predictor. An instrument that varies only across broad categories may lack the resolution to isolate within-unit endogenous variation, undermining the correction even if it is technically strong and valid.
- Correct functional form. For Heckman-style corrections, the first-stage probit model must be correctly specified. For the control function approach, the distributional assumptions about the error structure must hold. Misspecification at either stage propagates into biased second-stage estimates.
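The first-stage F-statistic for instrument strength can be sketched as a standard nested-model F-test: compare the first-stage regression with and without the excluded instruments, keeping all controls in both. This minimal version assumes homoskedastic errors; with heteroskedastic or clustered errors a robust variant is needed. The helper `first_stage_f` and the simulated coefficients are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

def first_stage_f(x, instruments, controls):
    """F-test that all excluded instruments have zero first-stage
    coefficients (homoskedastic version)."""
    n = len(x)
    X_r = np.column_stack([np.ones(n), controls])   # restricted: controls only
    X_u = np.column_stack([X_r, instruments])       # unrestricted: + instruments
    q = X_u.shape[1] - X_r.shape[1]                 # number of instruments
    k = X_u.shape[1]
    rss_r, rss_u = rss(X_r, x), rss(X_u, x)
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

rng = np.random.default_rng(4)
n = 2_000
w = rng.normal(size=n)
z = rng.normal(size=n)
x_strong = 0.5 * z + 0.5 * w + rng.normal(size=n)   # z clearly shifts x
x_weak = 0.02 * z + 0.5 * w + rng.normal(size=n)    # z barely shifts x

f_strong = first_stage_f(x_strong, z, w)
f_weak = first_stage_f(x_weak, z, w)
print(f"strong: F = {f_strong:.1f}   weak: F = {f_weak:.1f}")
```

Testing only the excluded instruments matters: a first stage can have a high overall F driven by the controls while the instrument itself contributes almost nothing.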
The core trade-off is clear: IV approaches replace the assumption that the predictor is exogenous (which is known to be violated) with the exclusion restriction (which is untestable). Whether this trade improves inference depends entirely on the credibility of the instrument.
Three Requirements for Any Instrument
Every IV approach requires at least one instrument that satisfies three conditions:
- Strong: the instrument substantially predicts the endogenous predictor (statistical significance alone is not enough)
- Valid: the instrument affects the outcome only through the predictor (the exclusion restriction, which is inherently untestable)
- Granular: the instrument varies at the same level of aggregation as the endogenous predictor
Weak or invalid instruments can produce estimates that are more biased than the uncorrected estimate, making the cure worse than the disease.
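A small simulation sketch shows how the cure can be worse than the disease. It assumes a linear model in which a hypothetical instrument `z` only weakly shifts `x` (coefficient π = 0.2) and also has a small direct effect on `y` (δ = 0.3), violating the exclusion restriction. In large samples the IV ratio converges to β + δ/π, so a small violation is magnified by a weak first stage.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
beta_true = 1.0
u = rng.normal(size=n)                   # unobserved confounder
z = rng.normal(size=n)                   # candidate instrument (hypothetical)
x = 0.2 * z + u + rng.normal(size=n)     # z only weakly shifts x (pi = 0.2)
# Exclusion restriction violated: z also enters y directly (delta = 0.3)
y = beta_true * x + 0.5 * u + 0.3 * z + rng.normal(size=n)

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(f"OLS bias: {ols - beta_true:+.2f}   IV bias: {iv - beta_true:+.2f}")
```

With these assumed values, the OLS bias is about 0.27 (2.6 / 2.04 - 1), while the IV bias is about 1.5 (0.3 / 0.2), more than five times larger.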