Finding Instruments

The instrumental variable (IV) approach stands or falls with the quality of the instruments. Finding good instruments requires deep understanding of the theoretical, conceptual, and practical context of the study.

A credible instrument must satisfy two requirements:

  • Instrument strength means the instrument is significantly related to the endogenous predictor — both in theory and in the data.
  • Instrument validity means the instrument is uncorrelated with the error term of the outcome. Put differently, the instrument affects the outcome only through the endogenous predictor.

If either requirement fails, the IV approach can produce biased estimates — potentially making the cure worse than the disease.

The Strength–Validity Trade-off

There are no easy recipes for finding high-quality instruments. Only solid theoretical arguments can justify why a given instrument is both strong and valid. Worse, the two requirements often work against each other: the more strongly an instrument predicts the endogenous variable, the harder it becomes to argue that it has no direct path to the outcome.

Granularity Matters

Beyond strength and validity, a good instrument must also be fine-grained enough to capture the endogenous variation in the focal predictor. The instrument should vary at the same level of aggregation as the endogenous variable. An instrument that varies only across broad categories may lack the resolution to isolate within-unit endogenous variation.

Three Common Types of Instruments

Lagged Values of the Predictor

Lagged values of the endogenous predictor are often strong instruments because past values tend to predict current values well. However, their validity is questionable. Past values of the predictor may affect the current outcome directly — for instance, through carryover effects, habit formation, or expectation-setting — rather than operating solely through the current predictor.

Lagged instruments are only valid if the unobserved confound is restricted to the current period. If past unobserved shocks still influence the current outcome, lagged values inherit the endogeneity problem they are supposed to solve.

Two strategies can improve validity:

  • Control for the mechanism. If there is a plausible channel through which the lagged predictor could directly affect the current outcome (e.g., carryover effects captured by a lagged outcome), including it as a control can block the direct path.
  • Use longer lags. The further back in time the lag, the less likely it directly influences the current outcome. But the trade-off is clear: longer lags are more plausibly valid yet typically weaker, because they become less correlated with the current predictor.
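The strength cost of longer lags can be illustrated with a small simulation. The sketch below is purely illustrative: the AR(1) persistence parameter (0.7) is an assumption, not an empirical claim, and serves only to show how the lag-k correlation, and hence instrument strength, decays as the lag grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a persistent predictor x_t = rho * x_{t-1} + noise.
# rho = 0.7 is an illustrative persistence level.
rho, T = 0.7, 100_000
x = np.empty(T)
x[0] = rng.normal()
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.normal()

# Correlation between the current predictor and its k-period lag
# decays roughly as rho**k: longer lags make weaker instruments.
for k in (1, 2, 5):
    corr = np.corrcoef(x[k:], x[:-k])[0, 1]
    print(f"lag {k}: corr = {corr:.2f} (theory: {rho**k:.2f})")
```

In this stylized setting the lag-5 correlation is roughly 0.7 to the fifth power, about 0.17, which is why the validity gained from a longer lag comes directly at the expense of first-stage strength.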

Peer Values of the Predictor

Peer instruments use observations of the focal predictor from other contexts or entities. For example, the average advertising spend of peer firms can serve as an instrument for a focal firm’s own advertising.

The key challenge is calibrating the “distance” between the focal unit and its peers. Peers that are too similar to the focal unit risk violating validity (because their behavior may also affect the focal outcome). Peers that are too different risk being weak instruments.

Peer-of-peer instruments add an extra degree of separation: if firm A’s peer is firm B, and firm B’s peer is firm C (where C is not a peer of A), then C serves as a peer-of-peer instrument for A. The added distance strengthens the validity argument but may weaken instrument strength.

Values of External Variables

External variables — factors outside the focal entity’s control — can serve as instruments when it is plausible that they affect the predictor but not the outcome directly.

These instruments tend to fare well on validity because it is hard to argue that an external factor (e.g., weather, regulatory changes, geographic features) is influenced by the focal entity. However, their strength may be limited when the external variable operates at a coarser level of aggregation than the focal predictor. If the predictor varies at the individual-day level but the instrument varies only at the regional-month level, the instrument may explain too little variation to be useful.

Empirical Tests for Instrumental Variables

Empirical tests can support — but never replace — theoretical arguments for instrument quality. Theory must come first; tests can then confirm or raise concerns.

Testing Instrument Strength

Testing for strength is straightforward. In the first-stage regression, the inclusion of the instrument(s) should lead to a meaningful increase in the explained variance of the predictor.

What to report:

  • The F-statistic for the instrument(s) specifically — not the F-statistic for the overall first-stage model including controls.
  • A widely used rule of thumb is an F-statistic above 10 for a single instrument. More detailed thresholds are available that account for the number of instruments, the number of endogenous predictors, and the maximum acceptable bias.

A common mistake: reporting only a significant correlation between the instrument and the predictor. Two variables can be significantly correlated even when the F-statistic falls below critical thresholds, because a shared correlation with control variables can drive the association.
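The partial F-statistic described above can be computed by comparing the first-stage regression with and without the instrument(s). The numpy sketch below is a minimal illustration with simulated data; the variable names, the single control w, and all coefficient values are assumptions chosen for the example.

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def first_stage_F(z, controls, x):
    """Partial F-statistic for the instrument(s) z in the first stage:
    the F-test that z's coefficients are zero after controlling for
    the exogenous controls (not the overall model F)."""
    n = len(x)
    Z = np.column_stack([np.ones(n), controls, z])  # unrestricted
    C = np.column_stack([np.ones(n), controls])     # restricted
    rss_u = ols_rss(Z, x)
    rss_r = ols_rss(C, x)
    q = Z.shape[1] - C.shape[1]   # number of instruments tested
    df = n - Z.shape[1]           # residual degrees of freedom
    return ((rss_r - rss_u) / q) / (rss_u / df)

# Illustrative simulated data: one control w, one instrument z.
rng = np.random.default_rng(1)
n = 500
w = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.5 * w + 0.4 * z + rng.normal(size=n)
print(f"first-stage F for z: {first_stage_F(z, w, x):.1f}")
```

Note that the statistic isolates the instrument's incremental contribution, which is exactly what a simple instrument-predictor correlation cannot do when controls are shared.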

Software

  • R: Use diagnostics in the ivreg package, or regTermTest in the survey package
  • Stata: Use estat firststage after ivregress, or the test command

Testing Instrument Validity

Testing validity is fundamentally limited. The most common test — the Sargan-Hansen overidentification test — requires at least one instrument to be truly valid, which is itself an untestable assumption.

Key limitations:

  • If all instruments are invalid, the overidentification test may fail to detect the problem, because the estimates from different (combinations of) instruments can be similarly biased.
  • The test requires a sufficiently large sample to detect meaningful differences between valid and invalid instruments.
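For concreteness, the Sargan statistic can be computed as n times the R-squared from regressing the 2SLS residuals on the full instrument set. The sketch below uses a simulated data-generating process in which both instruments are valid by construction; all coefficient values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Illustrative DGP: two valid instruments z1, z2 for one endogenous x.
u = rng.normal(size=n)                    # structural error
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = 0.5 * z1 + 0.5 * z2 + 0.6 * u + rng.normal(size=n)
y = 1.0 * x + u                           # true effect of x is 1.0

# 2SLS: first stage, then second stage on the fitted values.
Z = np.column_stack([np.ones(n), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
X_hat = np.column_stack([np.ones(n), x_hat])
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Sargan statistic: n * R^2 from regressing the 2SLS residuals
# (computed with the actual x, not x_hat) on the instruments.
resid = y - np.column_stack([np.ones(n), x]) @ beta
fit = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
tss = (resid - resid.mean()) @ (resid - resid.mean())
r2 = 1 - ((resid - fit) @ (resid - fit)) / tss
sargan = n * r2  # ~ chi2(1) here: 2 instruments, 1 endogenous predictor
print(f"2SLS estimate: {beta[1]:.2f}, Sargan statistic: {sargan:.2f}")
```

With both instruments valid, the statistic stays small relative to the chi-squared critical value. The first caveat above is visible in this construction: if both instruments were contaminated by u in the same way, the residuals would remain roughly orthogonal to them and the test would still pass.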

Instrument validity can only be established through theoretical reasoning. It remains an assumption, not an empirical finding.

Testing for the Presence of Endogeneity

Empirically ruling out endogeneity is not possible either.

Two common diagnostics exist:

  • The Wu-Hausman test after two-stage least squares
  • The p-value of the control function residual in the second-stage regression

Both compare estimates with and without the endogeneity correction. A statistically significant difference suggests the predictor is endogenous.
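The control-function version of this diagnostic is straightforward to sketch: regress the endogenous predictor on the instrument, add the first-stage residual to the second-stage regression, and inspect its t-statistic. The simulated data below build in endogeneity by construction; the coefficient values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Illustrative DGP with endogeneity: u enters both x and y.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.7 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 * x + u

# Control function: first-stage residual added to the second stage.
Z = np.column_stack([np.ones(n), z])
v_hat = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

X = np.column_stack([np.ones(n), x, v_hat])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
t_vhat = beta[2] / np.sqrt(cov[2, 2])

# A large |t| on the control-function residual signals endogeneity.
print(f"t-statistic on first-stage residual: {t_vhat:.1f}")
```

Because the simulated endogeneity is strong and the instrument is strong, the residual's t-statistic is clearly significant here. With a weak instrument, v_hat would be nearly identical to x itself and the test would lose power, which is the second caveat discussed below.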

But there are important caveats:

  • These tests are only valid if the instruments themselves are valid — which, again, cannot be tested.
  • If the instruments are weak, the tests lack power. A non-significant result may simply mean the instruments are too weak to detect endogeneity, not that endogeneity is absent.

The Bottom Line

No statistical test can rule out endogeneity. The error term is, by definition, unobservable. This holds regardless of whether the approach uses instrumental variables, instrument-free methods, or selection models.

The credibility of any IV approach therefore rests on the strength of the theoretical argument — not on passing a battery of statistical tests.

Further Readings

  • Ebbes, Peter, Dominik Papies, and Harald J. van Heerde (2021), “Dealing with Endogeneity: A Nontechnical Guide for Marketing Researchers,” Handbook of Market Research, Cham: Springer, 181-217.
  • Grewal, Rajdeep, and Yeşim Orhun (2024), “Unpacking the Instrumental Variables Approach,” Impact at JMR: https://www.ama.org/marketing-news/unpacking-the-instrumental-variables-approach/
  • Rossi, Peter (2014), “Even the Rich Can Make Themselves Poor: A Critical Examination of IV Methods in Marketing Applications,” Marketing Science, 33(5), 655-672.
  • Sande, Jon Bingen, and Mrinal Ghosh (2018), “Endogeneity in Survey Research,” International Journal of Research in Marketing, 35(2), 185-204.
  • Stock, James H., and Motohiro Yogo (2005), “Testing for Weak Instruments in Linear IV Regression,” in D. W. K. Andrews and J. H. Stock (eds.), Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, Cambridge: Cambridge University Press, 80-108.
  • Stock, James H., Jonathan H. Wright, and Motohiro Yogo (2002), “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments,” Journal of Business & Economic Statistics, 20(4), 518-529.
  • Wooldridge, Jeffrey M. (2010), Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.