{"id":158,"date":"2024-08-05T08:43:28","date_gmt":"2024-08-05T08:43:28","guid":{"rendered":"https:\/\/www.endogeneity.net\/?page_id=158"},"modified":"2026-04-04T14:37:17","modified_gmt":"2026-04-04T14:37:17","slug":"control-function-approach","status":"publish","type":"page","link":"https:\/\/www.endogeneity.net\/?page_id=158","title":{"rendered":"Control Function Approach"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--link_hover_color: var(--awb-color5);--link_color: var(--awb-color5);--awb-background-blend-mode:multiply;--awb-border-color:var(--awb-color1);--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-top:50.156000000000006px;--awb-padding-bottom:0px;--awb-padding-top-small:70px;--awb-padding-right-small:40px;--awb-padding-bottom-small:0px;--awb-padding-left-small:40px;--awb-margin-bottom-medium:0px;--awb-margin-bottom-small:60px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-content-wrap\" style=\"max-width:1248px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-padding-bottom-medium:0px;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:85px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-margin-bottom-small:44px;--awb-spacing-left-small:1.92%;\"><div 
class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-1 fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-text-color:var(--awb-color1);--awb-margin-right:15%;--awb-margin-bottom:0px;--awb-margin-left:15%;\"><p style=\"text-align: left;\"><span style=\"color: #000000;\"><strong>Control Function<\/strong><\/span><\/p>\n<p style=\"text-align: left; color: #000000;\">The control function approach addresses endogeneity from omitted variables, simultaneity, and measurement error. Like two-stage least squares, it relies on a strong and valid instrumental variable to isolate the exogenous part of the predictor. But instead of replacing the predictor with its predicted values, the control function approach adds the residual from the first stage as an extra control variable.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">How It Works<\/strong><\/p>\n<p style=\"text-align: left;\"><strong><em><span style=\"color: #000000;\">Step 1: First-Stage Regression<\/span><\/em><\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Regress the endogenous predictor on the instrumental variable(s) and all exogenous controls and fixed effects from the outcome equation. Save the fitted residuals \u2014 these capture the endogenous component of the predictor.<\/p>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Step 2: Second-Stage Regression<\/strong><\/em><\/p>\n<p style=\"text-align: left;\"><span style=\"color: #000000;\">Estimate the outcome equation as usual, but include the first-stage residuals as an additional control variable. The instrument is excluded from this stage.<\/span><br \/>\n<span style=\"color: #000000;\"><br \/>\nThe residual absorbs the correlation between the predictor and the error term. 
If the coefficient on the residual is statistically significant, this signals that endogeneity is present and that the correction is doing meaningful work.<\/span><\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Standard Errors<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Because the second-stage regression uses a generated variable (the first-stage residual), conventional second-stage standard errors ignore the first-stage estimation error and are incorrect. Bootstrap the standard errors instead: resample the data, re-estimate both stages on each bootstrap sample, and compute standard errors from the distribution of the bootstrap estimates.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">When to Use the Control Function over 2SLS<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">In linear models with continuous variables, the control function and two-stage least squares produce identical coefficient estimates. They diverge, and the control function becomes particularly useful, in several settings:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">Binary, count, or multinomial outcomes (e.g., logit, Poisson, multinomial models)<\/span><\/li>\n<li><span style=\"color: #000000;\">Multiple endogenous predictors<\/span><\/li>\n<li><span style=\"color: #000000;\">Models with interaction terms or nonlinear effects (e.g., quadratic terms involving the endogenous predictor)<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">The trade-off: the control function is typically more efficient (smaller standard errors) when correctly specified, but two-stage least squares is more robust to misspecification of the outcome model.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Requirements<\/strong><\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">At least one strong instrumental variable (significantly predicts the endogenous predictor)<\/span><\/li>\n<li><span style=\"color: #000000;\">At least one valid instrumental variable 
(affects the outcome only through the predictor)<\/span><\/li>\n<li><span style=\"color: #000000;\">The first stage must include the same controls and fixed effects as the outcome equation<\/span><\/li>\n<li><span style=\"color: #000000;\">Bootstrapped standard errors to account for the generated regressor<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Implementing the Control Function Approach<\/strong><\/p>\n<p style=\"text-align: left;\"><span style=\"color: #000000;\">The <\/span><strong><em><span style=\"color: #000000;\">cfregress<\/span><\/em><\/strong><span style=\"color: #000000;\"> command can be used to implement the control function approach with the following code in <\/span><strong style=\"color: #000000;\">Stata<\/strong><span style=\"color: #000000;\"> (<\/span><a style=\"color: #000000;\" href=\"https:\/\/www.stata.com\/manuals\/rcfregress.pdf\">https:\/\/www.stata.com\/manuals\/rcfregress.pdf<\/a><span style=\"color: #000000;\">).<\/span><\/p>\n<\/div><style type=\"text\/css\" scopped=\"scopped\">.fusion-syntax-highlighter-1 > .CodeMirror, .fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters {background-color:var(--awb-color1);}.fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters { background-color: var(--awb-color2); }.fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-linenumber { color: var(--awb-color8); }<\/style><div class=\"fusion-syntax-highlighter-container fusion-syntax-highlighter-1 fusion-syntax-highlighter-theme-light\" style=\"opacity:0;margin-top:0px;margin-right:15%;margin-bottom:0px;margin-left:15%;font-size:14px;border-width:1px;border-style:solid;border-color:var(--awb-color3);\"><div class=\"syntax-highlighter-copy-code\"><span class=\"syntax-highlighter-copy-code-title\" data-id=\"fusion_syntax_highlighter_1\" style=\"font-size:14px;\">Copy to Clipboard<\/span><\/div><label for=\"fusion_syntax_highlighter_1\" class=\"screen-reader-text\">Syntax 
Highlighter<\/label><textarea class=\"fusion-syntax-highlighter-textarea\" id=\"fusion_syntax_highlighter_1\" data-readOnly=\"nocursor\" data-lineNumbers=\"1\" data-lineWrapping=\"\" data-theme=\"default\">\/\/estimate the control function regression (the first-stage residual enters the outcome equation as a control)\n\ncfregress Outcome Controls (Predictor = Instrument)<\/textarea><\/div><div class=\"fusion-text fusion-text-2 fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-text-color:var(--awb-color1);--awb-margin-right:15%;--awb-margin-bottom:0px;--awb-margin-left:15%;\"><p style=\"text-align: left;\"><span style=\"color: #000000;\">\u00a0<\/span><\/p>\n<p><span style=\"color: #000000;\">The functionality of the cfregress command is not directly available in R.<\/span><\/p>\n<p style=\"text-align: left;\"><span style=\"color: #000000;\"><strong>References<\/strong><\/span><\/p>\n<ul>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Karaca\u2010Mandic, Pinar, and Kenneth Train (2003), \u201cStandard Error Correction in Two\u2010Stage Estimation with Nested Samples,\u201d The Econometrics Journal, 6(2), 401-407.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Papies, Dominik, Peter Ebbes, and Harald van Heerde (2017), \u201cAddressing Endogeneity in Marketing Models,\u201d Advanced Methods for Modeling Markets, Cham: Springer, 581\u2013627.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Petrin, Amil, and Kenneth Train (2003), \u201cOmitted Product Attributes in Discrete Choice Models,\u201d National Bureau of Economic Research Working Paper No. W9452, Cambridge, MA.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Petrin, Amil, and Kenneth Train (2010), \u201cA Control Function Approach to Endogeneity in Consumer Choice Models,\u201d Journal of Marketing Research, 47(1), 
3-13.<\/span><\/li>\n<\/ul>\n<\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"100-width.php","meta":{"footnotes":""},"class_list":["post-158","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/158","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=158"}],"version-history":[{"count":9,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/158\/revisions"}],"predecessor-version":[{"id":503,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/158\/revisions\/503"}],"wp:attachment":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=158"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}