
## Instrumental Variables

28 May 2021 10:42

$\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}$

(I'll just talk graphical causal models here, because they make more sense to me than alternatives.)

This is a technique of causal inference. The basic logic is as follows. We want to estimate (or test, etc.) the effect of one observable variable $X$ on another, $Y$. That is, we want to find $\Expect{Y|do(X)}$. Unfortunately, we are pretty sure that this effect is confounded; there is some third variable $U$ which is a causal ancestor of both $X$ and $Y$. The "instrument" is a fourth, observable variable, say $W$, which (i) is an ancestor of $X$ and (ii) has no (unblocked) paths to $Y$ except through $X$. The no-unblocked-paths bit makes it easy for us to estimate both $\Expect{Y|do(W)}$ and $\Expect{X|do(W)}$. The trick is then to "back out" or "factor out" $\Expect{Y|do(X)}$ from these two observationally-identified functions.

If everything's linear, this is pretty straightforward in principle. First, write out the "structural" equations showing how each variable depends on its parents: $\begin{eqnarray} X & \leftarrow & \alpha_1 W + \alpha_2 U + \eta\\ Y & \leftarrow & \gamma_1 X + \gamma_2 U + \epsilon \end{eqnarray}$ Substituting the first into the second, we get that $Y = \alpha_1 \gamma_1 W + (\alpha_2 \gamma_1 + \gamma_2) U + \gamma_1 \eta + \epsilon$ so the true coefficient of $Y$ on $W$ will be $\alpha_1 \gamma_1$. But the true coefficient of $X$ on $W$ will be $\alpha_1$. So just taking the ratio is one way to back out the coefficient we want, which is $\gamma_1$. Notice by the way that $\Cov{X, Y} = \gamma_1 \Var{X} + \alpha_2 \gamma_2 \Var{U}$ so just regressing $Y$ on $X$ will yield a coefficient which we might call $\beta = \gamma_1 + \alpha_2 \gamma_2 \frac{\Var{U}}{\Var{X}} ~,$ which can be arbitrarily different from $\gamma_1$. (Remember that the optimal linear coefficient for predicting any $Z$ from any $W$ is $\frac{\Cov{Z,W}}{\Var{W}}$, whether or not the true regression function is linear, the direction [if any] of causal relation, etc.)
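A quick simulation can make the linear argument concrete. This is only a sketch: the coefficient values, the standard-Gaussian noise, the sample size, and the helper names `simulate` and `slope` are all assumptions made up for illustration. It compares the confounded regression coefficient $\beta$ with the ratio of the two regressions on $W$.

```python
import numpy as np

# Hypothetical structural coefficients, chosen only for illustration.
ALPHA1, ALPHA2 = 1.0, 2.0   # X <- ALPHA1*W + ALPHA2*U + eta
GAMMA1, GAMMA2 = 0.5, 3.0   # Y <- GAMMA1*X + GAMMA2*U + epsilon


def simulate(n, rng):
    """Draw (W, U, X, Y) from the linear structural equations, unit-variance noise."""
    w = rng.normal(size=n)
    u = rng.normal(size=n)
    x = ALPHA1 * w + ALPHA2 * u + rng.normal(size=n)
    y = GAMMA1 * x + GAMMA2 * u + rng.normal(size=n)
    return w, u, x, y


def slope(target, predictor):
    """Optimal linear coefficient Cov[target, predictor] / Var[predictor]."""
    c = np.cov(target, predictor)
    return c[0, 1] / c[1, 1]


rng = np.random.default_rng(0)
w, u, x, y = simulate(200_000, rng)

ols_coef = slope(y, x)                  # confounded regression of Y on X
iv_ratio = slope(y, w) / slope(x, w)    # estimates (alpha1*gamma1) / alpha1
# The formula for beta from the text, with sample variances plugged in.
predicted_ols = GAMMA1 + ALPHA2 * GAMMA2 * np.var(u) / np.var(x)

print(f"OLS of Y on X:     {ols_coef:.3f}")
print(f"beta from formula: {predicted_ols:.3f}")
print(f"IV ratio estimate: {iv_ratio:.3f} (true gamma_1 = {GAMMA1})")
```

With these made-up values, $\Var{X} = 6$, so the formula puts $\beta$ at about $1.5$, while the ratio should land near the true $\gamma_1 = 0.5$, up to sampling noise.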

Alternately, we can do "two-stage least squares". This is where we regress $Y$ not on $X$, but on what we'd predict $X$ to be based on $W$, namely $\alpha_1 W$. This, again, will plainly yield the coefficient $\gamma_1$.
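Two-stage least squares can be sketched the same way. Again the coefficients, noise distributions, and the helper `ols_fit` are assumptions for illustration only. In practice $\alpha_1$ is not known, so the first stage estimates it by regressing $X$ on $W$, and the second stage regresses $Y$ on the fitted values.

```python
import numpy as np

# Hypothetical coefficients, for illustration only.
alpha1, alpha2, gamma1, gamma2 = 1.0, 2.0, 0.5, 3.0

rng = np.random.default_rng(1)
n = 200_000
w = rng.normal(size=n)                              # instrument
u = rng.normal(size=n)                              # unobserved confounder
x = alpha1 * w + alpha2 * u + rng.normal(size=n)
y = gamma1 * x + gamma2 * u + rng.normal(size=n)


def ols_fit(response, predictor):
    """Least-squares fit of response on predictor; returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(predictor), predictor])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef[0], coef[1]


# Stage 1: predict X from W.
a0_hat, a1_hat = ols_fit(x, w)
x_hat = a0_hat + a1_hat * w

# Stage 2: regress Y on the predicted X, not on X itself.
_, gamma1_2sls = ols_fit(y, x_hat)

print(f"first-stage slope: {a1_hat:.3f} (true alpha_1 = {alpha1})")
print(f"2SLS estimate:     {gamma1_2sls:.3f} (true gamma_1 = {gamma1})")
```

With a single instrument and a single cause, as here, the second-stage slope is algebraically the same as the ratio of the coefficient of $Y$ on $W$ to that of $X$ on $W$. The residuals from the second stage are not the right ones for standard errors, so this sketch only reports the point estimate.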

The last two paragraphs assume everything is linear, but the basic logic doesn't. That logic is: we know $W$ only affects $Y$ by first affecting $X$; we can identify how $W$ affects $X$ and how $W$ affects $Y$; this has to tell us how the impulse is transmitted through $X$. What I am particularly interested in are nonparametric methods for instrumental-variable inference, which do not assume linearity.

There is a classic derivation here, which ends up expressing what we want as the solution to an integral equation. (I believe this formulation is due to Darolles et al. but I am writing from memory so I might be off.) Let's abbreviate $\Expect{Y|do(X=x)}$ as $f(x)$. The trick is to show that a certain integral transformation of $f$ can be expressed in terms of observably-identified quantities.

Say that $p(x,w)$ is the joint pdf of $X$ and $W$. (Similarly for the related conditional and marginal pdfs, hopefully kept clear by their arguments.) This is an observationally identified quantity. We can thus define $t(x,z) \equiv \int{p_{XW}(x, w) p_{XW}(z, w) dw} = \int{p(x|w) p(z|w) p^2(w) dw}$ as a sort of kernel (in the machine-learning sense), expressing something like "how similar are the events $X=x$ and $X=z$, as potential consequences of $W$?" We can in fact make this into the kernel of an integral operator on functions of $x$, $(T\psi)(x) = \int{t(z,x) \psi(z) dz}$. Now the claim is that $\Expect{\Expect{Y|W} p_{XW}(x, W)} = (Tf)(x)$. This helps us if the operator $T$ has an inverse, $T^{-1}$, because then $f(x) = \Expect{\Expect{Y|W} (T^{-1} p_{XW})(x, W)}$. (To see this, apply $T$ to both sides of the last equation above, and remember that $T$ is by construction a linear operator.)

To verify the claim, start by noticing that we can write $Y = f(X) + U + \epsilon$ where without loss of generality $\Expect{U} = 0$, but $\Expect{U|X} \neq 0$, and $\epsilon$ is mean-zero noise with $\Expect{\epsilon|W} = 0$. On the other hand, $\Expect{U|W} = 0$, because (in the graphical model we're assuming) $U$ and $W$ are both exogenous, hence independent. So $\begin{eqnarray} \Expect{Y|W=w} & = & \Expect{f(X) + U+\epsilon|W=w}\\ & = & \Expect{f(X)|W=w}\\ & = & \int{p(x|w) f(x) dx}\\ & = & \frac{\int{p(x, w) f(x) dx}}{p(w)} \end{eqnarray}$ Thus $\begin{eqnarray} \Expect{\Expect{Y|W} p(x,W)} & = & \int{p(w) \Expect{Y|W=w} p(x,w) dw}\\ & = & \int{p(w) p(x,w) \frac{\int{p(z,w) f(z) dz}}{p(w)} dw}\\ & = & \int{\int{f(z) p(z,w) p(x,w) dw dz}}\\ & = & \int{dz f(z) \int{p(z,w) p(x,w) dw}}\\ & = & \int{dz f(z) t(x,z)} \end{eqnarray}$ as desired.
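One way to sanity-check the identity is a finite analogue, where $X$ and $W$ take finitely many values, integrals become sums, and $T$ becomes a matrix. The sketch below is only that: the numbers of levels, the randomly drawn probabilities, the two-valued confounder, and the values of $f$ are all made up. Everything is computed exactly from the model's probabilities, with no sampling, so it checks the algebra rather than any estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 3, 5                        # number of X levels, number of W levels
f = np.array([1.0, -2.0, 0.5])     # hypothetical f(x) = E[Y | do(X = x)]

p_w = rng.dirichlet(np.ones(M))    # marginal pmf of the instrument W
u_vals = np.array([-1.0, 1.0])     # confounder U, mean zero, independent of W
p_u = np.array([0.5, 0.5])
# p_x_given_uw[u, w, :] is the pmf of X given U = u_vals[u] and W = w.
p_x_given_uw = rng.dirichlet(np.ones(K), size=(2, M))

# Joint pmf over (U, W, X), and the observable joint p(x, w) as a K x M matrix.
joint_uwx = p_u[:, None, None] * p_w[None, :, None] * p_x_given_uw
p_xw = joint_uwx.sum(axis=0).T

# E[Y | U, W, X] = f(X) + U, since the extra noise has mean zero.
y_mean = f[None, None, :] + u_vals[:, None, None]

# Observable regressions, computed from the full model (including U).
e_y_given_w = (joint_uwx * y_mean).sum(axis=(0, 2)) / p_w
e_y_given_x = (joint_uwx * y_mean).sum(axis=(0, 1)) / p_xw.sum(axis=1)

# Left side of the claim: E[ E[Y|W] p(x, W) ] = sum_w p(w) E[Y|W=w] p(x, w).
lhs = p_xw @ (p_w * e_y_given_w)
# Kernel matrix t(x, z) = sum_w p(x, w) p(z, w), and the right side (T f)(x).
T = p_xw @ p_xw.T
rhs = T @ f

# Invert T to back out f from observable quantities alone.
f_recovered = np.linalg.solve(T, lhs)

print("max |lhs - (Tf)|   :", np.max(np.abs(lhs - rhs)))
print("recovered f        :", f_recovered)
print("naive E[Y|X=x]     :", e_y_given_x)
```

Here $T$ is invertible because the $K \times M$ matrix of joint probabilities has full row rank, which needs at least as many instrument levels as levels of $X$. With continuous variables the inversion is generally ill-posed and needs some regularization, which this finite sketch sidesteps entirely.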

This is one of the places where I follow the math and can use it, but there is something missing from my grasp of it, because it would never occur to me on my own to go through this set of manipulations. In fact I have to look at my notes to remember it right now. (In fact, when I wrote the section of ADAfaEPoV about instrumental variables and integral equations, I worked from memory / trying to derive everything from first principles, and came up with a much simpler approach --- which was quite wrong.) So one thing I would like to do is find some story which makes all this natural. If nothing else, it would help me to teach it!

Recommended, close-ups about nonparametrics:
• S. Darolles, Y. Fan, J. P. Florens and E. Renault, "Nonparametric Instrumental Regression", Econometrica 79 (2011): 1541--1565 [Preprint version, 2002. While I haven't done a line by line comparison between the preprint and the published version, remarkably little seems to have changed over those 9 years. There is a story there and I'd be curious to learn it.]
• Peter Hall, Joel L. Horowitz, "Nonparametric methods for inference in the presence of instrumental variables", Annals of Statistics 33 (2005): 2904--2929, arxiv:math/0603130
• Whitney K. Newey and James L. Powell, "Instrumental Variable Estimation of Nonparametric Models", Econometrica 71 (2003): 1565--1578
• Rahul Singh, Maneesh Sahani, Arthur Gretton, "Kernel Instrumental Variable Regression", NeurIPS 2019, arxiv:1906.00232
Recommended, close-ups about methodology:
• Stephen G. Hall, P. A. V. B. Swamy and George S. Tavlas, "On the Interpretation of Instrumental Variables in the Presence of Specification Errors", working paper 14/19, Department of Economics, University of Leicester [PDF preprint. I actually find myself in the odd position of thinking that while this is technically correct, it's a bit unfair to instrumental variables. Some of the issues here seem like they could be sensibly resolved using Pearl's graphical definition of IVs, perhaps in combination with nonparametric regressions.]
• Jonathan Mellon, "Rain, Rain, Go Away: 176 Potential Exclusion-Restriction Violations for Studies Using Weather as an Instrumental Variable", ssrn/3715610 [This well-written paper makes the interesting point that using the same instrument $W$ to study the effect of many different causes $X$ weakens the credibility of all the studies, because each such $X^{\prime}$ provides another pathway by which $W$ could be an ancestor of $Y$, without going through the $X$ of interest.]
• Judea Pearl, "On a Class of Bias-Amplifying Covariates that Endanger Effect Estimates", UAI2010, arxiv:1203.3503
• Tom Pepinsky, "OMFG Exogenous Variation! Or, Can You Find Good Nails When You Find an Indonesian Politics Hammer?" [Admittedly, less formal in presentation than many of the rest of these links]
• Alwyn Young, "Consistency without Inference: Instrumental Variables in Practical Application" [2017 preprint, LSE. To summarize very roughly, this is an argument that in published IV regressions, the problems due to a handful of data points having very high leverage/influence, and non-IID noise, are much more important than the bias reduction from using IV rather than OLS. PDF via Dr. Young.]
Modesty forbids me to recommend:
• CRS, Advanced Data Analysis from an Elementary Point of View [The discussion of instrumental variables is spread out over the chapters on identification and estimation of causal effects. Right now (May 2021) there are some unfortunate errors there about the nonlinear case, which I need to fix.]