Heckman Treatment Correction
The Heckman treatment correction addresses endogeneity arising from treatment selection — situations where a binary predictor (treatment vs. no treatment) is not randomly assigned but driven by unobserved factors that also affect the outcome.
When to Use It
Use the Heckman treatment correction when:
- The focal predictor is binary (e.g., exposed vs. not exposed, adopted vs. not adopted)
- Assignment to treatment is non-random and likely driven by unobserved factors
- Those same unobserved factors plausibly affect the outcome
This makes it the IV approach of choice for treatment selection — the specific form of endogeneity where unobserved characteristics determine who receives the treatment.
How It Works
Step 1: Model the Treatment Decision
Estimate a probit regression that predicts the probability of receiving the treatment (scoring “1” on the binary predictor). This first-stage model should include:
- All exogenous controls and fixed effects from the outcome equation
- At least one strong and valid instrument — a variable that predicts treatment assignment but has no direct effect on the outcome
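From this probit model, compute the Inverse Mills Ratio (IMR), a correction term that captures the bias introduced by non-random treatment assignment. In the treatment setting the term takes a different form for each group: φ(z'γ)/Φ(z'γ) for treated observations and −φ(z'γ)/(1 − Φ(z'γ)) for untreated observations, where z'γ is the fitted probit index and φ and Φ are the standard normal density and distribution function.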
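The correction term can be computed directly from the fitted probit index. The sketch below is a minimal illustration using only the Python standard library. The function name `selection_hazard` and its arguments are hypothetical, and it assumes the first-stage index has already been estimated elsewhere.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def selection_hazard(index: float, treated: int) -> float:
    """Correction term for a single observation.

    index   -- fitted probit linear index (assumed to come from Step 1)
    treated -- 1 if the observation received the treatment, else 0
    """
    pdf = _STD_NORMAL.pdf(index)
    cdf = _STD_NORMAL.cdf(index)
    if treated == 1:
        # Treated units: density over probability of being treated.
        return pdf / cdf
    # Untreated units: negative density over probability of not being treated.
    return -pdf / (1.0 - cdf)


# Example call on a made-up index value for a treated and an untreated unit.
print(selection_hazard(0.3, 1))
print(selection_hazard(0.3, 0))
```

The term is positive for treated units and negative for untreated ones, which is what lets a single coefficient in the outcome equation pick up selection in both groups.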
Step 2: Estimate the Outcome Equation
Include the Inverse Mills Ratio from Step 1 as an additional control variable in the outcome regression. The instrument is excluded from this stage. The IMR absorbs the correlation between the treatment decision and the error term, correcting for selection bias.
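If the coefficient on the IMR is statistically significant, this suggests that treatment selection is indeed endogenous and that the correction is doing meaningful work. An insignificant coefficient is weaker evidence in the other direction, since it can also reflect a weak instrument or low statistical power.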
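The reasoning behind the two steps can be written out under the standard joint-normality assumption. The notation below is introduced here for illustration and is not taken from the text above: y is the outcome, D the treatment indicator, x the controls, z the first-stage regressors (the controls plus the instrument), and ρ the correlation between the two error terms.

```latex
\begin{aligned}
y_i &= x_i'\beta + \delta D_i + \varepsilon_i, \\
D_i &= \mathbf{1}\{ z_i'\gamma + u_i > 0 \}, \qquad
(\varepsilon_i, u_i) \sim \mathcal{N}\!\left( 0,
  \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right), \\
E[y_i \mid x_i, z_i, D_i = 1] &= x_i'\beta + \delta
  + \rho\sigma \, \frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)}, \\
E[y_i \mid x_i, z_i, D_i = 0] &= x_i'\beta
  - \rho\sigma \, \frac{\phi(z_i'\gamma)}{1 - \Phi(z_i'\gamma)}.
\end{aligned}
```

Under these assumptions the coefficient on the correction term equals ρσ, so a test that it is zero is a test of ρ = 0, that is, of no correlation between the unobservables driving treatment and those driving the outcome.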
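Standard errors must be corrected, for example via bootstrapping that re-estimates both stages in every resample, since the second stage uses a generated regressor from the first stage and conventional standard errors ignore that estimation error.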
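One way to sketch the bootstrap is a generic resampling helper that accepts the full two-step procedure as a callable. The helper name `bootstrap_se`, its parameters, and the toy data are all hypothetical. The stand-in estimator here is a plain mean, used only to show the call shape. In practice the callable would refit the probit, recompute the correction term, and rerun the outcome regression on each resample.

```python
import random
from statistics import mean, stdev


def bootstrap_se(rows, estimator, n_boot=500, seed=0):
    """Pairs-bootstrap standard error of a scalar estimate.

    rows      -- list of observations (one item per unit)
    estimator -- callable mapping a list of rows to a single number; for the
                 Heckman correction it should rerun BOTH stages on the rows
    """
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_boot):
        # Resample whole observations with replacement.
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(resample))
    # The spread of the re-estimated values approximates the standard error.
    return stdev(draws)


# Stand-in estimator (a plain mean) on made-up data, to show the call shape.
data_rng = random.Random(1)
toy_rows = [data_rng.gauss(0.0, 1.0) for _ in range(400)]
toy_se = bootstrap_se(toy_rows, mean)
print(f"bootstrap SE of the toy estimate: {toy_se:.3f}")
```

Resampling whole observations and repeating the first stage inside the loop is what carries the first-stage estimation error through to the final standard error.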
Requirements
- A binary endogenous predictor (treatment indicator)
- At least one strong instrumental variable (significantly predicts treatment assignment)
- At least one valid instrumental variable (affects the outcome only through the treatment)
- The first stage must include the same controls and fixed effects as the outcome equation
- Corrected standard errors (e.g., bootstrapped over both stages) to account for the generated Inverse Mills Ratio
Implementing the Heckman Treatment Estimate Approach
In Stata, the etregress command implements the Heckman treatment estimate (https://www.stata.com/manuals/causaletregress.pdf). It fits the model by maximum likelihood by default; the twostep option requests the two-step estimator described above.
In R, the sampleSelection package implements the Heckman treatment estimate (https://cran.r-project.org/web/packages/sampleSelection/index.html); its treatReg function fits treatment-effect models of this kind.
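To make the mechanics concrete, the following is a minimal end-to-end sketch of the two-step procedure on simulated data, written in standard-library Python. It is an illustration under stated assumptions, not a substitute for the packages above. All helper names (`solve`, `ols`, `probit`, `hazard`) and all simulation parameters are made up for this example, and it reports point estimates only, with no standard errors.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def probit(Z, d, iters=20):
    """Probit coefficients by Fisher scoring, started from zero."""
    k = len(Z[0])
    g = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for z, di in zip(Z, d):
            idx = sum(a * b for a, b in zip(z, g))
            p = min(max(STD.cdf(idx), 1e-10), 1 - 1e-10)
            f = STD.pdf(idx)
            w = f * (di - p) / (p * (1 - p))   # score weight
            h = f * f / (p * (1 - p))          # expected-information weight
            for i in range(k):
                score[i] += w * z[i]
                for j in range(k):
                    info[i][j] += h * z[i] * z[j]
        g = [a + s for a, s in zip(g, solve(info, score))]
    return g


def hazard(idx, treated):
    """Correction term: pdf/cdf if treated, -pdf/(1 - cdf) otherwise."""
    f, p = STD.pdf(idx), STD.cdf(idx)
    return f / p if treated else -f / (1 - p)


# Simulated data (all numbers are made up): the true treatment effect is 2.0.
rng = random.Random(0)
rho = 0.7  # correlation between the selection and outcome errors
Z, X, d, y = [], [], [], []
for _ in range(4000):
    x, z, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    e = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    di = 1 if 0.2 + 0.5 * x + 1.0 * z + u > 0 else 0
    Z.append([1.0, x, z])    # first stage: constant, control, instrument
    X.append([1.0, x, di])   # outcome equation: constant, control, treatment
    d.append(di)
    y.append(1.0 + 0.5 * x + 2.0 * di + e)

# Step 1: probit for treatment, then the correction term per observation.
gamma = probit(Z, d)
h = [hazard(sum(a * b for a, b in zip(z, gamma)), di) for z, di in zip(Z, d)]

# Step 2: outcome regression with the correction term, instrument excluded.
naive_effect = ols(X, y)[2]
corrected = ols([row + [hi] for row, hi in zip(X, h)], y)
corrected_effect, hazard_coef = corrected[2], corrected[3]

print(f"naive OLS treatment effect:     {naive_effect:.3f}")
print(f"corrected treatment effect:     {corrected_effect:.3f}")
print(f"coefficient on correction term: {hazard_coef:.3f}")
```

Because the simulation builds in a positive correlation between the two error terms, the uncorrected regression should overstate the treatment effect, while the corrected estimate should land close to the true value of 2.0.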
References
- Heckman, James J. (1979), “Sample Selection Bias as a Specification Error,” Econometrica: Journal of the Econometric Society, 47, 153-161.