Heckman Selection Correction
The Heckman selection correction addresses endogeneity arising from sample selection — situations where observations enter the sample non-randomly, and the factors driving inclusion are related to the outcome.
When to Use It
Use the Heckman selection correction when:
- The outcome is observed only for a subset of the relevant population (e.g., only for customers who made at least one purchase, only for firms that publicly disclose data)
- The process determining who enters the sample is non-random and likely driven by unobserved factors
- Those same unobserved factors plausibly affect the outcome
If left unaddressed, this non-random inclusion biases the estimates because the observed sample is systematically different from the population the researcher wants to generalize to.
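The bias can be seen in a small simulation. The sketch below is illustrative only: the data-generating process, the true slope of 2, the error correlation rho = 0.8, the sample size, and the seed are all assumptions made for the example, not values from any study. It compares the slope estimated on the full population with the slope estimated only on observations that selected into the sample.

```python
import math
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)  # arbitrary seed, for reproducibility only
rho = 0.8                # assumed correlation between selection and outcome errors

xs, ys, selected = [], [], []
for _ in range(20000):
    x, u, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    e = rho * u + math.sqrt(1 - rho ** 2) * v  # outcome error, correlated with u
    xs.append(x)
    ys.append(1 + 2 * x + e)    # true slope on x is 2
    selected.append(x + u > 0)  # inclusion depends on x and on the unobserved u

full_slope = ols_slope(xs, ys)
sel_slope = ols_slope(
    [x for x, s in zip(xs, selected) if s],
    [y for y, s in zip(ys, selected) if s],
)
print(f"full-population slope: {full_slope:.3f}")
print(f"selected-sample slope: {sel_slope:.3f}")
```

Under these assumptions the full-population slope should land close to 2, while the selected-sample slope is pulled downward, because among selected observations the expected outcome error falls as x rises.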
A Key Requirement: Data Beyond the Sample
Unlike standard IV approaches, the Heckman selection correction requires additional observations outside the estimation sample: specifically, observations for which the predictors of selection (the controls and the instrument) are observed but the outcome is not. These extra observations are essential for modeling the selection process. Without them, every observation in the data is selected, so the first-stage selection equation has no variation to explain and cannot be estimated.
For example, if a study examines the effect of marketing spend on sales but only observes sales for firms that publicly report financials, the researcher also needs data on firms that do not report — at least enough to model what predicts public disclosure.
How It Works
Step 1: Model the Selection Process
Estimate a probit regression that predicts the probability of an observation entering the sample. This model uses both the in-sample and out-of-sample observations and should include:
- All exogenous controls and fixed effects from the outcome equation
- At least one strong and valid instrument — a variable that predicts sample inclusion but has no direct effect on the outcome
From this probit model, compute the Inverse Mills Ratio (IMR) — a correction term that captures the bias introduced by non-random sample selection.
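For selected observations, the IMR is the standard normal density divided by the standard normal cumulative distribution, both evaluated at the fitted probit index (the linear predictor). A minimal sketch using only the Python standard library follows; the function name inverse_mills_ratio and the example index values are chosen here for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills_ratio(index):
    """IMR for a selected observation: pdf(index) / cdf(index).

    `index` is the fitted linear predictor from the first-stage probit.
    """
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Illustrative index values only: a low index means inclusion was unlikely.
for index in (-2.0, 0.0, 2.0):
    print(index, round(inverse_mills_ratio(index), 4))
```

The ratio is large for observations that were unlikely to be selected and shrinks toward zero for observations that were almost certain to be selected, which is why it acts as an observation-specific correction.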
Step 2: Estimate the Outcome Equation
Include the Inverse Mills Ratio from Step 1 as an additional control variable in the outcome regression, estimated on the in-sample observations only. The instrument is excluded from this stage. The IMR absorbs the correlation between the selection process and the error term, correcting for selection bias.
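In the standard textbook formulation, the two steps can be written as follows. The notation is an assumption introduced here, not taken from the text above: z_i collects the controls plus the instrument, x_i collects the outcome-equation regressors, and the errors (u_i, \varepsilon_i) are assumed bivariate normal with Var(u_i) = 1, correlation \rho, and outcome-error standard deviation \sigma_\varepsilon.

```latex
\begin{aligned}
s_i &= \mathbf{1}\!\left[\, z_i'\gamma + u_i > 0 \,\right], \\
y_i &= x_i'\beta + \varepsilon_i \quad \text{(observed only if } s_i = 1\text{)}, \\
E\!\left[\, y_i \mid x_i, z_i, s_i = 1 \,\right]
    &= x_i'\beta + \rho\,\sigma_\varepsilon\,\lambda(z_i'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}.
\end{aligned}
```

Under these assumptions, the coefficient on the IMR in the second stage estimates \rho\,\sigma_\varepsilon, so it is zero exactly when the selection and outcome errors are uncorrelated.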
If the IMR is statistically significant, this indicates that sample selection is indeed endogenous and that the correction is doing meaningful work.
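Standard errors must be corrected, typically by bootstrapping both stages together, since the IMR in the second stage is a generated regressor estimated in the first stage. Uncorrected OLS standard errors ignore that first-stage estimation error.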
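The two steps and the bootstrap can be sketched end to end. This is a teaching sketch under stated assumptions, not a production implementation: it uses only the Python standard library, fits the probit by Fisher scoring, and runs on simulated data whose data-generating process, coefficients, sample size, seed, and number of bootstrap replications are all hypothetical. The helper names (solve, ols, probit, heckman_two_step, bootstrap_se, simulate) are invented for this example.

```python
import math
import random
from statistics import NormalDist, stdev

N01 = NormalDist()  # standard normal distribution


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(X, s, max_iter=50, tol=1e-8):
    """Probit coefficients by Fisher scoring, starting from zero."""
    k = len(X[0])
    g = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, si in zip(X, s):
            idx = sum(gi * ri for gi, ri in zip(g, r))
            p = min(max(N01.cdf(idx), 1e-10), 1 - 1e-10)
            d = N01.pdf(idx)
            w = d / (p * (1 - p))
            for i in range(k):
                score[i] += w * (si - p) * r[i]
                for j in range(k):
                    info[i][j] += w * d * r[i] * r[j]
        step = solve(info, score)
        g = [gi + st for gi, st in zip(g, step)]
        if max(abs(st) for st in step) < tol:
            break
    return g


def heckman_two_step(data):
    """Rows are (x, z, selected, y); y is None when the outcome is unobserved."""
    # Step 1: probit on ALL rows, using the control x and the instrument z.
    Z = [(1.0, x, z) for x, z, _, _ in data]
    gamma = probit(Z, [s for _, _, s, _ in data])
    # Step 2: OLS on selected rows only; z is excluded, the IMR is added.
    X, y = [], []
    for (x, _, s, yi), zrow in zip(data, Z):
        if s:
            idx = sum(gi * v for gi, v in zip(gamma, zrow))
            X.append((1.0, x, N01.pdf(idx) / N01.cdf(idx)))
            y.append(yi)
    return ols(X, y)  # [intercept, slope on x, coefficient on IMR]


def bootstrap_se(data, reps, rng):
    """Resample rows with replacement and rerun BOTH stages on each draw."""
    draws = [heckman_two_step(rng.choices(data, k=len(data))) for _ in range(reps)]
    return [stdev(col) for col in zip(*draws)]


def simulate(n, rho, rng):
    """Hypothetical DGP: y = 1 + 2x + e, observed only if 0.5 + x + z + u > 0."""
    data = []
    for _ in range(n):
        x, z, u, v = (rng.gauss(0, 1) for _ in range(4))
        e = rho * u + math.sqrt(1 - rho ** 2) * v
        s = 0.5 + x + z + u > 0
        data.append((x, z, s, 1 + 2 * x + e if s else None))
    return data


rng = random.Random(0)  # arbitrary seed
data = simulate(2000, 0.7, rng)
coef = heckman_two_step(data)
se = bootstrap_se(data, 30, rng)  # 30 replications keeps the demo fast
for label, b, err in zip(("intercept", "x", "IMR"), coef, se):
    print(f"{label:9s} coef={b:6.3f}  bootstrap SE={err:.3f}")
```

The bootstrap resamples in-sample and out-of-sample rows together and re-estimates the probit on every draw, so the reported standard errors reflect the first-stage estimation error. In applied work, far more replications than the 30 used here would normally be drawn.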
Requirements
- A clear selection mechanism determining which observations are in the sample
- Additional observations outside the estimation sample to model the selection process
- At least one strong instrumental variable (significantly predicts sample inclusion)
- At least one valid instrumental variable (affects the outcome only through the selection process)
- The first stage must include the same controls and fixed effects as the outcome equation
- Bootstrapped standard errors to account for the generated Inverse Mills Ratio
The two Heckman corrections share the same mechanics (probit first stage, Inverse Mills Ratio, bootstrapped standard errors) but address different problems. The selection correction is about generalizability (can we extrapolate beyond this sample?), while the treatment correction is about internal validity (is the treatment effect confounded by non-random assignment?).
Implementing the Heckman Selection Correction
The heckman command implements the Heckman Selection Correction in Stata (https://www.stata.com/manuals/rheckman.pdf). Its twostep option requests the two-step estimator described above instead of the default maximum-likelihood fit.
The sampleSelection package implements the Heckman Selection Correction in R (https://cran.r-project.org/web/packages/sampleSelection/index.html) through its selection() and heckit() functions.
References
- Heckman, James J. (1979), “Sample Selection Bias as a Specification Error,” Econometrica, 47(1), 153-161.
- Wolfolds, Sarah E., and Jordan Siegel (2019), “Misaccounting for Endogeneity: The Peril of Relying on the Heckman Two‐Step Method without a Valid Instrument,” Strategic Management Journal, 40(3), 432-462.