{"id":178,"date":"2024-08-05T08:55:46","date_gmt":"2024-08-05T08:55:46","guid":{"rendered":"https:\/\/www.endogeneity.net\/?page_id=178"},"modified":"2026-04-04T14:38:43","modified_gmt":"2026-04-04T14:38:43","slug":"heckman-selection-correction-approach","status":"publish","type":"page","link":"https:\/\/www.endogeneity.net\/?page_id=178","title":{"rendered":"Heckman Selection Correction Approach"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--link_hover_color: var(--awb-color5);--link_color: var(--awb-color5);--awb-background-blend-mode:multiply;--awb-border-color:var(--awb-color1);--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-top:50.156000000000006px;--awb-padding-bottom:0px;--awb-padding-top-small:70px;--awb-padding-right-small:40px;--awb-padding-bottom-small:0px;--awb-padding-left-small:40px;--awb-margin-bottom-medium:0px;--awb-margin-bottom-small:60px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-content-wrap\" style=\"max-width:1248px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-padding-bottom-medium:0px;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:85px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-margin-bottom-small:44px;--awb-spacing-left-small:1.92%;\"><div 
class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-1 fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-text-color:var(--awb-color1);--awb-margin-right:15%;--awb-margin-bottom:0px;--awb-margin-left:15%;\"><p style=\"text-align: left;\"><span style=\"color: #000000;\"><strong>Heckman Selection Correction<\/strong><\/span><\/p>\n<p style=\"text-align: left; color: #000000;\">The Heckman selection correction addresses endogeneity arising from sample selection \u2014 situations where observations enter the sample non-randomly, and the factors driving inclusion are related to the outcome.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">When to Use It<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Use the Heckman selection correction when:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">The outcome is observed only for a subset of the relevant population (e.g., only for customers who made at least one purchase, only for firms that publicly disclose data)<\/span><\/li>\n<li><span style=\"color: #000000;\">The process determining who enters the sample is non-random and likely driven by unobserved factors<\/span><\/li>\n<li><span style=\"color: #000000;\">Those same unobserved factors plausibly affect the outcome<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">If left unaddressed, this non-random inclusion biases the estimates because the observed sample is systematically different from the population the researcher wants to generalize to.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">A Key Requirement: Data Beyond the Sample<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Unlike other IV approaches, the Heckman selection correction requires additional observations outside the estimation sample \u2014 specifically, 
observations where the selection variable is observed but the outcome is not. These extra observations are essential for modeling the selection process. Without them, the first-stage selection equation cannot be estimated.<\/p>\n<p style=\"text-align: left; color: #000000;\">For example, if a study examines the effect of marketing spend on sales but only observes sales for firms that publicly report financials, the researcher also needs data on firms that do not report \u2014 at least enough to model what predicts public disclosure.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">How It Works<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Step 1: Model the Selection Process<\/p>\n<p style=\"text-align: left; color: #000000;\">Estimate a probit regression that predicts the probability of an observation entering the sample. This model uses both the in-sample and out-of-sample observations and should include:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">All exogenous controls and fixed effects from the outcome equation<\/span><\/li>\n<li><span style=\"color: #000000;\">At least one strong and valid instrument \u2014 a variable that predicts sample inclusion but has no direct effect on the outcome<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">From this probit model, compute the Inverse Mills Ratio (IMR) \u2014 a correction term that captures the bias introduced by non-random sample selection.<\/p>\n<p style=\"text-align: left; color: #000000;\">Step 2: Estimate the Outcome Equation<\/p>\n<p style=\"text-align: left; color: #000000;\">Include the Inverse Mills Ratio from Step 1 as an additional control variable in the outcome regression, estimated on the in-sample observations only. The instrument is excluded from this stage. 
The IMR absorbs the correlation between the selection process and the error term, correcting for selection bias.<\/p>\n<p style=\"text-align: left; color: #000000;\">A statistically significant IMR coefficient suggests that sample selection is indeed endogenous and that the correction is doing meaningful work.<\/p>\n<p style=\"text-align: left; color: #000000;\">Standard errors must be corrected via bootstrapping (re-estimating both stages in each bootstrap replicate), since the second stage uses a generated regressor from the first stage.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Requirements<\/strong><\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">A clear selection mechanism determining which observations are in the sample<\/span><\/li>\n<li><span style=\"color: #000000;\">Additional observations outside the estimation sample to model the selection process<\/span><\/li>\n<li><span style=\"color: #000000;\">At least one strong instrumental variable (significantly predicts sample inclusion)<\/span><\/li>\n<li><span style=\"color: #000000;\">At least one valid instrumental variable (affects the outcome only through the selection process)<\/span><\/li>\n<li><span style=\"color: #000000;\">The first stage must include the same controls and fixed effects as the outcome equation<\/span><\/li>\n<li><span style=\"color: #000000;\">Bootstrapped standard errors to account for the generated Inverse Mills Ratio<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">The two Heckman corrections share the same mechanics (probit first stage, Inverse Mills Ratio, bootstrapped standard errors) but address different problems. 
The selection correction is about generalizability (can we extrapolate beyond this sample?), while the treatment correction is about internal validity (is the treatment effect confounded by non-random assignment?).<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Implementing the Heckman Selection Correction<\/strong><\/p>\n<p style=\"text-align: left;\"><span style=\"color: #000000;\">The <\/span><em><strong style=\"color: #000000;\">heckman<\/strong><\/em><span style=\"color: #000000;\"> command can be used to implement the Heckman Selection Correction with the following code in <\/span><strong style=\"color: #000000;\">Stata<\/strong><span style=\"color: #000000;\"> (<\/span><a style=\"color: #000000;\" href=\"https:\/\/www.stata.com\/manuals\/rheckman.pdf\">https:\/\/www.stata.com\/manuals\/rheckman.pdf<\/a><span style=\"color: #000000;\">).<\/span><\/p>\n<\/div><style type=\"text\/css\" scopped=\"scopped\">.fusion-syntax-highlighter-1 > .CodeMirror, .fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters {background-color:var(--awb-color1);}.fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters { background-color: var(--awb-color2); }.fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-linenumber { color: var(--awb-color8); }<\/style><div class=\"fusion-syntax-highlighter-container fusion-syntax-highlighter-1 fusion-syntax-highlighter-theme-light\" style=\"opacity:0;margin-top:0px;margin-right:15%;margin-bottom:0px;margin-left:15%;font-size:14px;border-width:1px;border-style:solid;border-color:var(--awb-color3);\"><div class=\"syntax-highlighter-copy-code\"><span class=\"syntax-highlighter-copy-code-title\" data-id=\"fusion_syntax_highlighter_1\" style=\"font-size:14px;\">Copy to Clipboard<\/span><\/div><label for=\"fusion_syntax_highlighter_1\" class=\"screen-reader-text\">Syntax Highlighter<\/label><textarea class=\"fusion-syntax-highlighter-textarea\" id=\"fusion_syntax_highlighter_1\" data-readOnly=\"nocursor\" 
data-lineNumbers=\"1\" data-lineWrapping=\"\" data-theme=\"default\">\/\/estimate the Heckman sample-selection model (maximum likelihood by default; add the twostep option for the two-step estimator)\n\nheckman Outcome Predictor Controls, select(Selection = Instrument Controls)<\/textarea><\/div><div class=\"fusion-text fusion-text-2 fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-text-color:var(--awb-color1);--awb-margin-right:15%;--awb-margin-bottom:0px;--awb-margin-left:15%;\"><p style=\"text-align: left;\"><span style=\"color: #000000;\">\u00a0<\/span><\/p>\n<p style=\"text-align: left;\"><span style=\"color: #000000;\">The <\/span><em><strong style=\"color: #000000;\">sampleSelection<\/strong><\/em><span style=\"color: #000000;\"> package can be used to implement the Heckman Selection Correction with the following code in <\/span><strong style=\"color: #000000;\">R<\/strong><span style=\"color: #000000;\"> (<\/span><a style=\"color: #000000;\" href=\"https:\/\/cran.r-project.org\/web\/packages\/sampleSelection\/index.html\">https:\/\/cran.r-project.org\/web\/packages\/sampleSelection\/index.html<\/a><span style=\"color: #000000;\">).<\/span><\/p>\n<\/div><style type=\"text\/css\" scopped=\"scopped\">.fusion-syntax-highlighter-2 > .CodeMirror, .fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-gutters {background-color:var(--awb-color1);}.fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-gutters { background-color: var(--awb-color2); }.fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-linenumber { color: var(--awb-color8); }<\/style><div class=\"fusion-syntax-highlighter-container fusion-syntax-highlighter-2 fusion-syntax-highlighter-theme-light\" style=\"opacity:0;margin-top:0px;margin-right:15%;margin-bottom:0px;margin-left:15%;font-size:14px;border-width:1px;border-style:solid;border-color:var(--awb-color3);\"><div class=\"syntax-highlighter-copy-code\"><span class=\"syntax-highlighter-copy-code-title\" data-id=\"fusion_syntax_highlighter_2\" style=\"font-size:14px;\">Copy to 
Clipboard<\/span><\/div><label for=\"fusion_syntax_highlighter_2\" class=\"screen-reader-text\">Syntax Highlighter<\/label><textarea class=\"fusion-syntax-highlighter-textarea\" id=\"fusion_syntax_highlighter_2\" data-readOnly=\"nocursor\" data-lineNumbers=\"1\" data-lineWrapping=\"\" data-theme=\"default\">#load the sampleSelection package\n\nlibrary(sampleSelection)\n\n#selection and outcome equations\n\nmodel_HSC <- selection(\n\nselection = Selection ~ Instrument + Controls,\n\noutcome = Outcome ~ Predictor + Controls,\n\ndata = Dataset,\n\nmethod = \"2step\"\n\n)\n\n#obtain the estimates\n\nsummary(model_HSC)<\/textarea><\/div><div class=\"fusion-text fusion-text-3 fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-text-color:var(--awb-color1);--awb-margin-right:15%;--awb-margin-bottom:0px;--awb-margin-left:15%;\"><p style=\"text-align: left;\"><strong style=\"color: #000000;\">\u00a0<\/strong><\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">References<\/strong><\/p>\n<ul>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Heckman, James J. 
(1979), \u201cSample Selection Bias as a Specification Error,\u201d Econometrica: Journal of the Econometric Society, 47, 153-161.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Wolfolds, Sarah E., and Jordan Siegel (2019), \u201cMisaccounting for Endogeneity: The Peril of Relying on the Heckman Two\u2010Step Method without a Valid Instrument,\u201d Strategic Management Journal, 40(3), 432-462.<\/span><\/li>\n<\/ul>\n<\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"100-width.php","meta":{"footnotes":""},"class_list":["post-178","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/178","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=178"}],"version-history":[{"count":8,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/178\/revisions"}],"predecessor-version":[{"id":505,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/178\/revisions\/505"}],"wp:attachment":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}