Identifying Instrumental Variables
The appropriateness of the instrumental variable approach hinges on the quality of the instruments. To identify potential instruments, the researcher needs a deep understanding of the theoretical, conceptual, and practical context of the study, because candidate instruments must meet two requirements:
- Instrument strength means that the instrument is strongly related to the endogenous predictor (theoretically and empirically).
- Instrument validity means that the instrument is uncorrelated with the error term of the outcome equation (theoretically). This assumption implies that the instrument exerts its effect on the outcome only through the endogenous predictor, akin to full mediation.
If one of these two requirements is not fulfilled, the instrumental variable approach fails and may produce biased estimates of the effect of interest (Ebbes et al. 2021, Rossi 2014). Another consideration is whether a potential instrument is fine-grained enough to capture the endogenous variance in the focal predictor. Unfortunately, there are no clear guidelines for finding high-quality instruments; only solid theoretical arguments can justify why a certain instrument is both strong and valid. To make matters worse, the two requirements often conflict: the stronger the instrument, the harder it is to argue that the instrument is valid (Sande and Ghosh 2018). In the following, we review typical types of instruments that have been used by consumer researchers.
Lagged values of the predictor
Some researchers rely on lagged values of the potentially endogenous predictor as instruments. The appeal of this identification strategy is that lagged marketing variables are often strong predictors of current marketing variables, so they typically satisfy the strength requirement. However, Rossi (2014) questions the validity of lagged marketing variables because they may influence the focal outcome beyond their effect on the focal predictor, for example by affecting engagement with the influencer/firm through a carryover effect from the previous post or by setting expectations (i.e., an internal reference) for the content of the influencer/firm. More generally, lagged values of the predictor are only valid instruments if the unobserved variable affects only the current value of the dependent variable; if lagged unobserved variables continue to affect the current dependent variable, the lagged predictor is correlated with the error term and thus invalid. Researchers should therefore use lagged values of the predictor as instruments only with caution.
Notably, Papies et al. (2017) provide two recommendations to make lagged variables more valid instruments. First, if possible, the mechanism through which lagged values of the predictor may affect the current outcome should be included as a control. Second, longer lags of the predictor could be used as instruments; for example, rather than using the values of the last post, the values from older posts could be used. However, the tradeoff between strength and validity is evident here: the longer the lag, the less likely it is that the past value of the predictor influences the focal outcome directly, making the instrument more valid. At the same time, the strength of the instrument will likely decrease with longer lags because the lagged value becomes further removed from the current predictor.
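To illustrate, the sketch below shows how longer lags of the predictor could be constructed and used as instruments with the ivreg package in R. The data frame posts and all variable names (influencer_id, post_number, engagement, sponsored_content, follower_count) are hypothetical and only serve to illustrate the idea.

```r
# Illustrative sketch only: longer lags of a (potentially endogenous) predictor
# used as instruments. The data frame 'posts' and all variable names are hypothetical.
library(dplyr)
library(ivreg)

posts <- posts %>%
  group_by(influencer_id) %>%
  arrange(post_number, .by_group = TRUE) %>%
  mutate(
    sponsored_lag1 = lag(sponsored_content, 1),  # last post: stronger but less valid
    sponsored_lag3 = lag(sponsored_content, 3)   # older post: weaker but more valid
  ) %>%
  ungroup()

# Two-stage least squares with the longer lag as instrument; the control
# (follower_count) must appear in both parts of the formula.
fit_lag <- ivreg(engagement ~ sponsored_content + follower_count |
                   sponsored_lag3 + follower_count,
                 data = posts)
summary(fit_lag, diagnostics = TRUE)  # reports the weak-instruments F-test
```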
Peer values of the predictor
Another strategy is to rely on “peer values” of the predictor, that is, observations of the focal predictor in other contexts or observations of the focal predictor from other entities. Again, it is important to choose “peer values” that are sufficiently different from the focal context (to ensure instrument validity) yet not too far removed (to ensure instrument strength), and to carefully justify why the “peer instruments” affect the focal outcome only via their effect on the focal predictor.
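As a sketch of this strategy, a leave-one-out average of the predictor among peers (for example, other products in the same category) could serve as an instrument. The data frame products and all variable names below are hypothetical.

```r
# Illustrative sketch only: a leave-one-out peer average of the predictor
# used as an instrument. The data frame 'products' and all names are hypothetical.
library(dplyr)
library(ivreg)

products <- products %>%
  group_by(category_id) %>%
  mutate(peer_price = (sum(price) - price) / (n() - 1)) %>%  # excludes the focal product
  ungroup()

fit_peer <- ivreg(sales ~ price | peer_price, data = products)
summary(fit_peer, diagnostics = TRUE)
```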
Values of external variables
Other researchers rely on values of external variables as instruments. The common assumption is that it is very unlikely that the external variables are affected by the focal entity. Thus, these instruments are likely valid; their strength, however, may be questionable when the instruments are not measured at the same fine-grained level as the focal predictor. In such cases, the instruments’ ability to explain variation in the focal predictor is limited.
Empirical Tests of Instrumental Variables
Empirical tests of instrumental variables can never substitute for solid theoretical arguments as to why a certain instrument is both strong and valid. At best, once a candidate instrument has been identified, empirical tests can confirm instrument strength, but not instrument validity. Moreover, as we discuss below, empirically ruling out the presence of endogeneity is not possible either.
Empirically testing for instrument strength is straightforward. In the first-stage regression, the inclusion of the instrument(s) should lead to a significant increase in the explained variance of the predictor. This can be tested with the F-statistic for the instrument(s), using the diagnostics in the ivreg package in R (https://cran.r-project.org/web/packages/ivreg/index.html) or estat firststage after the ivregress command in Stata (https://www.stata.com/manuals/rivregress.pdf).
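As a minimal sketch (with a hypothetical data frame dat containing an outcome y, an endogenous predictor x, a control w, and an instrument z), the partial F-statistic for the instrument can be obtained by comparing first-stage regressions with and without the instrument, or read off the ivreg diagnostics:

```r
# Illustrative sketch only: testing instrument strength in the first stage.
# The data frame 'dat' (outcome y, endogenous predictor x, control w, instrument z)
# is hypothetical.
library(ivreg)

# First stage without and with the instrument
first_stage_restricted   <- lm(x ~ w, data = dat)
first_stage_unrestricted <- lm(x ~ w + z, data = dat)

# Partial F-test for the contribution of the instrument beyond the controls
anova(first_stage_restricted, first_stage_unrestricted)

# The same information is reported as the "Weak instruments" diagnostic of ivreg
fit_iv <- ivreg(y ~ x + w | z + w, data = dat)
summary(fit_iv, diagnostics = TRUE)
```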
Stock et al. (2002) suggest a critical threshold value of 10 for the F-statistic of a single instrument. Stock and Yogo (2005) provide more detailed, model-specific thresholds that account for the number of instruments, the number of endogenous predictors, the maximum desired relative bias, and the estimator used. Importantly, researchers must report the F-statistic of the instrument(s) only, rather than the F-statistic of the overall first-stage regression including the controls, and merely reporting a significant correlation between the instrument(s) and the predictor is not enough. The F-statistic for specific terms (i.e., the instrument(s)) can also be obtained with regTermTest in R (https://cran.r-project.org/web/packages/survey/index.html) or the test command in Stata (https://www.stata.com/manuals/rtest.pdf). The instrument and the predictor may display a significant correlation even if the F-statistic is below the critical values, for example because part of the instrument's correlation with the predictor is shared with the controls.
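Continuing the hypothetical first-stage model from the sketch above, the instrument-only F-statistic could be obtained as follows:

```r
# Illustrative sketch only: Wald F-test for the instrument term alone,
# rather than the overall first-stage F-statistic that includes the controls.
library(survey)

regTermTest(first_stage_unrestricted, ~ z)
```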
Empirically testing for instrument validity is only possible if there is at least one valid instrument, which is itself a non-testable assumption. Thus, although some scholars advocate to “empirically evaluate exogeneity” (e.g., Sande and Ghosh 2018), all available options, such as the popular Hansen-Sargan overidentification test, assume that at least one of the instruments is truly exogenous (and require more instruments than endogenous predictors). If all instruments are invalid, the overidentification test may fail to detect that exogeneity is not satisfied because the estimates from the different (combinations of) instruments may be similar but equally off (Wooldridge 2010). In addition, the sample size needs to be sufficiently large to detect significant differences between valid and invalid instruments. Thus, instrument validity can only be established with theoretical arguments and remains an assumption maintained by the researcher.
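For completeness, the sketch below shows how such an overidentification test could be obtained when there are more instruments than endogenous predictors (variable names are hypothetical, as before), keeping in mind that the test maintains the assumption that at least one instrument is valid:

```r
# Illustrative sketch only: Sargan overidentification test; requires more
# instruments (here z1 and z2) than endogenous predictors (here x).
library(ivreg)

fit_overid <- ivreg(y ~ x + w | z1 + z2 + w, data = dat)
summary(fit_overid, diagnostics = TRUE)  # reports Weak instruments, Wu-Hausman, Sargan
```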
Finally, empirically ruling out the presence of endogeneity is not possible either. Researchers can perform a Wu-Hausman test after two-stage least squares or inspect the p-value of the included first-stage residuals in the second-stage regression of the control function approach. Both compare the estimates without and with endogeneity correction; if they differ statistically, that is, if the Wu-Hausman test or the residuals' p-value is significant, the predictor is likely endogenous. However, for all empirical endogeneity tests to be valid, the instrument(s) must be valid, which is a non-testable assumption. In addition, if the instrument(s) are weak, endogeneity tests will lack the power to detect a difference due to the endogeneity correction. Thus, while a significant endogeneity test may signal the presence of endogeneity and would favor two-stage least squares or the control function approach over a regression without endogeneity correction, a non-significant endogeneity test may simply signal that the chosen instrument(s) are weak, invalid, or both, and not necessarily that endogeneity is absent.
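A sketch of the control function variant with the residual-based endogeneity test (same hypothetical variable names as above; the Wu-Hausman test is part of the ivreg diagnostics shown earlier):

```r
# Illustrative sketch only: control function approach with a residual-based
# endogeneity test. The data frame 'dat' (y, x, w, z) is hypothetical and
# assumed to have no missing values.
first_stage <- lm(x ~ w + z, data = dat)
dat$x_resid <- resid(first_stage)

# The second stage includes the first-stage residuals as an additional regressor;
# a significant coefficient on x_resid suggests that x is endogenous
# (valid only insofar as z is a valid instrument).
second_stage <- lm(y ~ x + w + x_resid, data = dat)
summary(second_stage)

# Note: the standard errors of this two-step procedure do not account for the fact
# that x_resid is estimated (e.g., bootstrapping both steps could correct this).
```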
Ultimately, no statistical test – whether for instrumental variables, instrument-free approaches, or selection models – can rule out the potential presence of omitted variables, simultaneity, measurement error, sample selection, or treatment choice, because endogeneity involves the error term, which by definition is not observable.
References
Ebbes, Peter, Dominik Papies, and Harald J. van Heerde (2021), “Dealing with Endogeneity: A Nontechnical Guide for Marketing Researchers,” in Handbook of Market Research, Cham: Springer, 181-217.
Rossi, Peter (2014), “Even the Rich Can Make Themselves Poor: A Critical Examination of IV Methods in Marketing Applications,” Marketing Science, 33(5), 655-672.
Sande, Jon Bingen, and Mrinal Ghosh (2018), “Endogeneity in Survey Research,” International Journal of Research in Marketing, 35(2), 185-204.
Stock, James H., and Motohiro Yogo (2005), “Testing for Weak Instruments in Linear IV Regression,” in Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, D.W.K. Andrews and J.H. Stock, eds., Cambridge: Cambridge University Press, 80-108.
Stock, James H., Jonathan H. Wright, and Motohiro Yogo (2002), “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments,” Journal of Business & Economic Statistics, 20(4), 518-529.
Wooldridge, Jeffrey M. (2010), Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.