{"id":138,"date":"2024-08-05T08:22:59","date_gmt":"2024-08-05T08:22:59","guid":{"rendered":"https:\/\/www.endogeneity.net\/?page_id=138"},"modified":"2026-04-04T14:36:09","modified_gmt":"2026-04-04T14:36:09","slug":"instrumental-variables","status":"publish","type":"page","link":"https:\/\/www.endogeneity.net\/?page_id=138","title":{"rendered":"Instrumental Variables"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--link_hover_color: var(--awb-color5);--link_color: var(--awb-color5);--awb-background-blend-mode:multiply;--awb-border-color:var(--awb-color1);--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-top:50.156000000000006px;--awb-padding-bottom:0px;--awb-padding-top-small:70px;--awb-padding-right-small:40px;--awb-padding-bottom-small:0px;--awb-padding-left-small:40px;--awb-margin-bottom-medium:0px;--awb-margin-bottom-small:60px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-content-wrap\" style=\"max-width:1248px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-padding-bottom-medium:0px;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:85px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-margin-bottom-small:44px;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper 
fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-1 fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-text-color:var(--awb-color1);--awb-margin-right:15%;--awb-margin-bottom:0px;--awb-margin-left:15%;\"><p style=\"text-align: left;\"><span style=\"color: #000000;\"><strong>Finding Instruments<\/strong><\/span><\/p>\n<p style=\"text-align: left; color: #000000;\">The instrumental variable (IV) approach stands or falls with the quality of the instruments. Finding good instruments requires deep understanding of the theoretical, conceptual, and practical context of the study.<\/p>\n<p style=\"text-align: left; color: #000000;\">A credible instrument must satisfy two requirements:<\/p>\n<ul style=\"text-align: left;\">\n<li><strong style=\"color: #000000;\">Instrument strength<\/strong><span style=\"color: #000000;\"> means the instrument is significantly related to the endogenous predictor \u2014 both in theory and in the data.<\/span><\/li>\n<li><strong style=\"color: #000000;\">Instrument validity<\/strong><span style=\"color: #000000;\"> means the instrument is uncorrelated with the error term of the outcome. Put differently, the instrument affects the outcome only through the endogenous predictor.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">If either requirement fails, the IV approach can produce biased estimates \u2014 potentially making the cure worse than the disease.<\/p>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">The Strength\u2013Validity Trade-off<\/strong><\/em><\/p>\n<p style=\"text-align: left; color: #000000;\">There are no easy recipes for finding high-quality instruments. Only solid theoretical arguments can justify why a given instrument is both strong and valid. 
Worse, the two requirements often work against each other: the more strongly an instrument predicts the endogenous variable, the harder it becomes to argue that it has no direct path to the outcome.<\/p>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Granularity Matters<\/strong><\/em><\/p>\n<p style=\"text-align: left; color: #000000;\">Beyond strength and validity, a good instrument must also be fine-grained enough to capture the endogenous variation in the focal predictor. The instrument should vary at the same level of aggregation as the endogenous variable. An instrument that varies only across broad categories may lack the resolution to isolate within-unit endogenous variation.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Three Common Types of Instruments<\/strong><\/p>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Lagged Values of the Predictor<\/strong><\/em><\/p>\n<p style=\"text-align: left; color: #000000;\">Lagged values of the endogenous predictor are often strong instruments because past values tend to predict current values well. However, their validity is questionable. Past values of the predictor may affect the current outcome directly \u2014 for instance, through carryover effects, habit formation, or expectation-setting \u2014 rather than operating solely through the current predictor.<\/p>\n<p style=\"text-align: left; color: #000000;\">Lagged instruments are only valid if the unobserved confound is restricted to the current period. 
If past unobserved shocks still influence the current outcome, lagged values inherit the endogeneity problem they are supposed to solve.<\/p>\n<p style=\"text-align: left; color: #000000;\">Two strategies can improve validity:<\/p>\n<ul style=\"text-align: left;\">\n<li><strong style=\"color: #000000;\">Control for the mechanism.<\/strong><span style=\"color: #000000;\"> If there is a plausible channel through which the lagged predictor could directly affect the current outcome (e.g., carryover effects captured by a lagged outcome), including it as a control can block the direct path.<\/span><\/li>\n<li><strong style=\"color: #000000;\">Use longer lags.<\/strong><span style=\"color: #000000;\"> The further back in time the lag, the less likely it directly influences the current outcome. But the trade-off is clear: longer lags are more plausibly valid yet typically weaker, because they become less correlated with the current predictor.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Peer Values of the Predictor<\/strong><\/em><\/p>\n<p style=\"text-align: left; color: #000000;\">Peer instruments use observations of the focal predictor from other contexts or entities. For example, the average advertising spend of peer firms can serve as an instrument for a focal firm&#8217;s own advertising.<\/p>\n<p style=\"text-align: left; color: #000000;\">The key challenge is calibrating the &#8220;distance&#8221; between the focal unit and its peers. Peers that are too similar to the focal unit risk violating validity (because their behavior may also affect the focal outcome). Peers that are too different risk being weak instruments.<\/p>\n<p style=\"text-align: left; color: #000000;\">Peer-of-peer instruments add an extra degree of separation: if firm A&#8217;s peer is firm B, and firm B&#8217;s peer is firm C (where C is not a peer of A), then C serves as a peer-of-peer instrument for A. 
The added distance strengthens the validity argument but may weaken instrument strength.<\/p>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Values of External Variables<\/strong><\/em><\/p>\n<p style=\"text-align: left; color: #000000;\">External variables \u2014 factors outside the focal entity&#8217;s control \u2014 can serve as instruments when it is plausible that they affect the predictor but not the outcome directly.<\/p>\n<p style=\"text-align: left; color: #000000;\">These instruments tend to be strong on validity because it is hard to argue that an external factor (e.g., weather, regulatory changes, geographic features) is influenced by the focal entity. However, their strength may be limited when the external variable operates at a coarser level of aggregation than the focal predictor. If the predictor varies at the individual-day level but the instrument varies only at the regional-month level, the instrument may explain too little variation to be useful.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Empirical Tests for Instrumental Variables<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Empirical tests can support \u2014 but never replace \u2014 theoretical arguments for instrument quality. Theory must come first; tests can then confirm or raise concerns.<\/p>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Testing Instrument Strength<\/strong><\/em><\/p>\n<p style=\"text-align: left; color: #000000;\">Testing for strength is straightforward. 
In the first-stage regression, the inclusion of the instrument(s) should lead to a meaningful increase in the explained variance of the predictor.<\/p>\n<p style=\"text-align: left; color: #000000;\">What to report:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">The F-statistic for the instrument(s) specifically \u2014 not the F-statistic for the overall first-stage model including controls.<\/span><\/li>\n<li><span style=\"color: #000000;\">A widely used rule of thumb is an F-statistic above 10 for a single instrument. More detailed thresholds are available that account for the number of instruments, the number of endogenous predictors, and the maximum acceptable bias.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">A common mistake<\/strong><span style=\"color: #000000;\">: reporting only a significant correlation between the instrument and the predictor. Two variables can be significantly correlated even when the F-statistic falls below critical thresholds, because a shared correlation with control variables can drive the association.<\/span><\/p>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Software<\/strong><\/em><\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">R: Use <\/span><em><strong style=\"color: #000000;\">diagnostics<\/strong><\/em><span style=\"color: #000000;\"> in the <\/span><em><strong style=\"color: #000000;\">ivreg<\/strong><\/em><span style=\"color: #000000;\"> package, or <\/span><em><strong style=\"color: #000000;\">regTermTest<\/strong><\/em><span style=\"color: #000000;\"> in the <\/span><em><strong style=\"color: #000000;\">survey<\/strong><\/em><span style=\"color: #000000;\"> package<\/span><\/li>\n<li><span style=\"color: #000000;\">Stata: Use <\/span><em><strong style=\"color: #000000;\">estat firststage<\/strong><\/em><span style=\"color: #000000;\"> after <\/span><em><strong style=\"color: 
#000000;\">ivregress<\/strong><\/em><span style=\"color: #000000;\">, or the <\/span><em><strong style=\"color: #000000;\">test<\/strong><\/em><span style=\"color: #000000;\"> command<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Testing Instrument Validity<\/strong><\/em><\/p>\n<p style=\"text-align: left; color: #000000;\">Testing validity is fundamentally limited. The most common test, the Hansen-Sargan overidentification test, can only be run when there are more instruments than endogenous predictors, and it requires at least one instrument to be truly valid, which is itself an untestable assumption.<\/p>\n<p style=\"text-align: left; color: #000000;\">Key limitations:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">If all instruments are invalid, the overidentification test may fail to detect the problem, because the estimates from different (combinations of) instruments can be similarly biased.<\/span><\/li>\n<li><span style=\"color: #000000;\">The test requires a sufficiently large sample to detect meaningful differences between valid and invalid instruments.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">Instrument validity can only be established through theoretical reasoning. 
It remains an assumption, not an empirical finding.<\/p>\n<p style=\"text-align: left;\"><em><strong style=\"color: #000000;\">Testing for the Presence of Endogeneity<\/strong><\/em><\/p>\n<p style=\"text-align: left; color: #000000;\">Endogeneity cannot be ruled out empirically either.<\/p>\n<p style=\"text-align: left; color: #000000;\">Two common diagnostics exist:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">The Wu-Hausman test after two-stage least squares<\/span><\/li>\n<li><span style=\"color: #000000;\">The p-value of the control function residual in the second-stage regression<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">Both compare estimates with and without the endogeneity correction. A statistically significant difference suggests the predictor is endogenous.<\/p>\n<p style=\"text-align: left; color: #000000;\">But there are important caveats:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">These tests are only valid if the instruments themselves are valid \u2014 which, again, cannot be tested.<\/span><\/li>\n<li><span style=\"color: #000000;\">If the instruments are weak, the tests lack power. A non-significant result may simply mean the instruments are too weak to detect endogeneity, not that endogeneity is absent.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">The Bottom Line<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">No statistical test can rule out endogeneity. The error term is, by definition, unobservable. 
This holds regardless of whether the approach uses instrumental variables, instrument-free methods, or selection models.<\/p>\n<p style=\"text-align: left; color: #000000;\">The credibility of any IV approach therefore rests on the strength of the theoretical argument \u2014 not on passing a battery of statistical tests.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Further Readings<\/strong><\/p>\n<ul>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Ebbes, Peter, Dominik Papies, and Harald J. van Heerde (2021), \u201cDealing with Endogeneity: A Nontechnical Guide for Marketing Researchers,\u201d Handbook of Market Research, Cham: Springer, 181\u2013217.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Grewal, Rajdeep, and Ye\u015fim Orhun (2024), \u201cUnpacking the Instrumental Variables Approach,\u201d Impact at JMR: <\/span><a style=\"color: #000000;\" href=\"https:\/\/www.ama.org\/marketing-news\/unpacking-the-instrumental-variables-approach\/\">https:\/\/www.ama.org\/marketing-news\/unpacking-the-instrumental-variables-approach\/<\/a><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Rossi, Peter (2014), \u201cEven the Rich Can Make Themselves Poor: A Critical Examination of IV Methods in Marketing Applications,\u201d Marketing Science, 33(5), 655\u2013672.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Sande, Jon Bingen, and Mrinal Ghosh (2018), \u201cEndogeneity in Survey Research,\u201d International Journal of Research in Marketing, 35(2), 185\u2013204.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Stock, J. H., and Yogo, M. (2005), \u201cTesting for Weak Instruments in Linear IV Regression,\u201d In: D. W. K. Andrews, and J. H. Stock (Eds.), Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg (pp. 80\u2013108). 
Cambridge, New York: Cambridge University Press.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Stock, James H., Jonathan H. Wright, and Motohiro Yogo (2002), \u201cA Survey of Weak Instruments and Weak Identification in Generalized Method of Moments,\u201d Journal of Business &amp; Economic Statistics, 20(4), 518-529.<\/span><\/li>\n<li style=\"text-align: left;\"><span style=\"color: #000000;\">Wooldridge, Jeffrey M. (2010). Econometric analysis of cross section and panel data. Cambridge: MIT Press.<\/span><\/li>\n<\/ul>\n<\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"100-width.php","meta":{"footnotes":""},"class_list":["post-138","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/138","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=138"}],"version-history":[{"count":12,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/138\/revisions"}],"predecessor-version":[{"id":501,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/138\/revisions\/501"}],"wp:attachment":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=138"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}