{"id":194,"date":"2024-08-05T09:11:00","date_gmt":"2024-08-05T09:11:00","guid":{"rendered":"https:\/\/www.endogeneity.net\/?page_id=194"},"modified":"2026-04-04T14:31:48","modified_gmt":"2026-04-04T14:31:48","slug":"the-general-endogeneity-problem","status":"publish","type":"page","link":"https:\/\/www.endogeneity.net\/?page_id=194","title":{"rendered":"The Endogeneity Problem"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--link_hover_color: var(--awb-color5);--link_color: var(--awb-color5);--awb-background-blend-mode:multiply;--awb-border-color:var(--awb-color1);--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-top:50.1406px;--awb-padding-bottom:0px;--awb-padding-top-small:70px;--awb-padding-right-small:40px;--awb-padding-bottom-small:0px;--awb-padding-left-small:40px;--awb-margin-bottom-medium:0px;--awb-margin-bottom-small:60px;--awb-background-color:#ffffff;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-center fusion-flex-content-wrap\" style=\"max-width:1248px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-padding-bottom-medium:0px;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:85px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-margin-bottom-small:44px;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper 
fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-1 fusion-text-no-margin\" style=\"--awb-content-alignment:center;--awb-text-color:var(--awb-color1);--awb-margin-right:15%;--awb-margin-bottom:0px;--awb-margin-left:15%;\"><p style=\"text-align: left;\"><span style=\"color: #000000;\"><strong>The Endogeneity Problem<\/strong><\/span><\/p>\n<p style=\"text-align: left; color: #000000;\">A good starting point for understanding the endogeneity problem is the concept of spurious correlations.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Spurious Correlations<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Consider the well-known example: monthly ice cream sales and monthly murder rates are highly correlated. A na\u00efve look at the data would suggest that one drives the other. But the relationship is spurious \u2014 it is driven by a confounder. Higher summer temperatures raise ice cream sales and, independently, murder rates. Temperature is the omitted common cause behind the observed association.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">How Endogeneity Differs<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Spurious correlations and endogeneity both produce misleading appearances of relationships in data. The key difference lies in directionality:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">A spurious correlation concerns two variables that appear associated with each other.<\/span><\/li>\n<li><span style=\"color: #000000;\">Endogeneity arises when one variable is thought to cause the other \u2014 and the estimated causal effect is biased because of an omitted factor.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">In practice, the line between the two is often blurry. 
Even when researchers frame their findings as mere &#8220;associations,&#8221; there is typically an implicit understanding of directionality \u2014 a predictor and an outcome have been defined, a directional hypothesis has been stated, and regression analysis has been used to estimate the effect. With that in mind, the logic of spurious correlations provides an intuitive foundation for understanding endogeneity.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">What Is Endogeneity?<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">In its simplest form, regression analysis aims to assess the impact of a predictor on an outcome. The regression equation has three components:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">An intercept \u2014 where the regression line meets the y-axis when the predictor equals zero<\/span><\/li>\n<li><span style=\"color: #000000;\">An estimated coefficient \u2014 the slope describing the effect of the predictor on the outcome, used for hypothesis testing<\/span><\/li>\n<li><span style=\"color: #000000;\">An error term \u2014 everything else that influences the outcome but is not included in the model<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">The endogeneity problem arises when the predictor is correlated with one or more unobserved factors captured in the error term. When this happens, the estimated coefficient no longer reflects the true causal effect. It is biased \u2014 potentially overstating, understating, masking, or even reversing the real relationship.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Why It Cannot Be Tested Directly<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">The error term is, by definition, unobservable. This means there is no direct way to test whether it is correlated with the predictor. 
The threat of endogeneity can therefore never be empirically ruled out \u2014 it must be addressed through a combination of research design, theoretical reasoning, and statistical tools.<\/p>\n<p style=\"text-align: left;\"><em><span style=\"color: #000000;\">A note on terminology: the residual is the estimated value of the error term (outcome minus predicted outcome). The error term (or disturbance term) is its theoretical, pre-estimation counterpart. In least-squares regression, the residuals are uncorrelated with the predictor by construction, so they cannot be used to check whether the error term is.<\/span><\/em><\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">The Ice Cream Seller Example<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">Consider a smart ice cream seller who knows that good weather brings more people to the beach \u2014 and that those people are willing to pay more. So the seller raises prices on sunny days and lowers them on rainy days.<\/p>\n<p style=\"text-align: left; color: #000000;\">A researcher analyzing daily data on ice cream prices and sales may well find that higher prices are associated with more sales. But this estimated price effect is biased. Weather affects both the price (the seller&#8217;s decision) and sales (consumer demand), but weather is not in the model. 
It is captured in the error term, creating a correlation between the predictor (price) and the error term.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Two Components of the Observed Effect<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">The observed price effect is a mix of two things:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">The exogenous component \u2014 the true causal effect of price on sales (which is negative: higher prices reduce demand)<\/span><\/li>\n<li><span style=\"color: #000000;\">The endogenous component \u2014 the variation in price driven by the omitted weather variable (which creates a positive association: sunny days bring both higher prices and more customers)<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">Depending on which component dominates, the estimated effect could be negative, positive, or even zero. In any case, it does not reflect the true price effect that would emerge from a controlled experiment where prices are randomly assigned.<\/p>\n<p style=\"text-align: left;\"><strong style=\"color: #000000;\">Why the Solution Is Not Always Obvious<\/strong><\/p>\n<p style=\"text-align: left; color: #000000;\">In this stylized example, the fix is straightforward: collect data on weather and include it as a control variable. 
But in realistic settings, addressing endogeneity is far more difficult:<\/p>\n<ul style=\"text-align: left;\">\n<li><span style=\"color: #000000;\">Researchers cannot readily observe all relevant omitted variables<\/span><\/li>\n<li><span style=\"color: #000000;\">The sources of endogeneity extend beyond simple omitted variables to include simultaneity, measurement error, treatment selection, and sample selection<\/span><\/li>\n<\/ul>\n<p style=\"text-align: left; color: #000000;\">The first step in addressing the endogeneity problem is therefore to identify which sources are most likely to bias the estimates in a given study.<\/p>\n<\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"100-width.php","meta":{"footnotes":""},"class_list":["post-194","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/194","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=194"}],"version-history":[{"count":14,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/194\/revisions"}],"predecessor-version":[{"id":197,"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=\/wp\/v2\/pages\/194\/revisions\/197"}],"wp:attachment":[{"href":"https:\/\/www.endogeneity.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}