The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi   174

Statistical Explanation and Statistical Relevance

by Wesley C. Salmon

with Richard C. Jeffrey and Jeffrey G. Greeno

Pittsburgh: University of Pittsburgh Press, 1971, doi:10.2307/j.ctt6wrd9p

Distinctions That Make Differences to Chances

This short book reprints three papers, all originally published in 1970 or just before, by the three contributors, with an introduction and a conclusion by Salmon. I turned to this because it's almost the first source for Salmon's notion of a "statistical relevance basis". Briefly, and not quite following the notation here, the notion is this. Suppose we are interested in some outcome variable $Y$, and consider a set of (possibly) predictive variables $X=(X_1, X_2, \ldots X_d)$. Let us say that two points, $x = (x_1, x_2, \ldots x_d)$ and $x^{\prime}=(x^{\prime}_1, x^{\prime}_2, \ldots x^{\prime}_d)$ are equivalent when $P(Y|X=x) = P(Y|X=x^{\prime})$. [*] This defines an equivalence relation, since it's plainly reflexive, symmetric, and transitive. Every equivalence relation defines a partition, so this one does as well. The cells of this partition are configurations of the predictive variables which are "homogeneous" (as Salmon puts it) with respect to $Y$. Those cells are the elements of the statistical relevance basis. A difference between two configurations $x$ and $x^{\prime}$ is relevant to $Y$ if, but only if, $x$ and $x^{\prime}$ are not equivalent. In particular: if, given the value of some of the $X$ variables, we can adjust the values of others without moving from one cell of the partition to another, then those latter variables are irrelevant to $Y$ (either absolutely, or in certain configurations of the others). Thus drumming is irrelevant to the presence of tigers in North America, taking birth control pills is irrelevant to men failing to get pregnant, etc.
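To make the construction concrete, here is a minimal sketch (mine, not Salmon's) which groups configurations of $X$ by their conditional distribution for $Y$. The function name `relevance_basis` and every number in the toy table are invented for illustration; exact fractions are used so that "equal conditional distributions" means exact equality rather than floating-point near-equality.

```python
from collections import defaultdict
from fractions import Fraction


def relevance_basis(cond_dist):
    """Partition configurations x by their conditional distribution P(Y | X = x).

    cond_dist maps each configuration x (a tuple) to a dict {y: P(Y=y | X=x)}.
    Returns a sorted list of cells; each cell is a sorted list of configurations.
    """
    cells = defaultdict(list)
    for x, dist in cond_dist.items():
        # Canonical hashable key: the distribution as sorted (y, p) pairs,
        # with zero-probability outcomes dropped.
        key = tuple(sorted((y, p) for y, p in dist.items() if p != 0))
        cells[key].append(x)
    return sorted(sorted(cell) for cell in cells.values())


# Hypothetical toy table: X = (sex, pill use), Y = pregnancy within a year.
# The probabilities are made up purely to illustrate the construction.
cond_dist = {
    ("M", "no pill"): {"pregnant": Fraction(0), "not pregnant": Fraction(1)},
    ("M", "pill"): {"pregnant": Fraction(0), "not pregnant": Fraction(1)},
    ("F", "no pill"): {"pregnant": Fraction(4, 5), "not pregnant": Fraction(1, 5)},
    ("F", "pill"): {"pregnant": Fraction(1, 20), "not pregnant": Fraction(19, 20)},
}

basis = relevance_basis(cond_dist)
for cell in basis:
    print(cell)
```

With these invented numbers, both male configurations land in a single cell, so pill use is irrelevant given that the subject is male, while the two female configurations each get a cell of their own.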

Salmon's notion here is that a statistical explanation of the event $Y=y$ consists in laying out the statistical relevance basis, and stating the conditional distribution for each cell. It is not necessary, in his view, that the explanation give the event high probability, or even that it increase the probability.

Reinforcing this is the paper by Jeffrey, which argues forcefully that often a statistical explanation of an event just consists in laying out the stochastic process which generates it, and not adding "and furthermore that process gives the event $Y=y$ high probability". Thus, for example, Jeffrey argues that "Why did this sequence of coin tosses come out HTHTHTHTTH?" is perfectly adequately explained by saying "The coin tosses followed a Bernoulli(0.5) process". (Those aren't direct quotes, but they are pretty literal paraphrases.)

The paper by Greeno complements Salmon's view of what constitutes an explanation, essentially by arguing that the strength of the explanation is given, information-theoretically, by $I[X;Y] = H[Y] - H[Y|X]$, the reduction in entropy of $Y$ from conditioning on $X$. (As the authors do not note, that is going to be equal to $I[S;Y]$, where $S=s(X)$ is the random variable saying which cell in the statistical relevance basis $X$ is located in, because $S$ is a sufficient statistic.) Greeno supplements his plausibility arguments by proving a simple form of Fano's inequality, relating the probability of mis-classifying a binary $Y$ to $H[Y|X]$. (Greeno does not appear to have heard of Fano's inequality.) Incidentally, I think Greeno would have to accept that in Jeffrey's example, of explaining a sequence of coin tosses by pointing to the generating process, the strength of the explanation is actually 0, but also that that's the strongest explanation possible.
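The claim that $I[X;Y] = I[S;Y]$ is easy to check numerically. The following is a rough sketch under assumptions of my own: the marginal distribution of $X$, the conditional probabilities, and the helper `mutual_information` are all invented for illustration and appear nowhere in the book. For a binary $Y$, the value of $P(Y=1|X=x)$ itself serves as the label of the cell containing $x$.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I[A;B] in bits, from a dict {(a, b): P(A=a, B=b)}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


# Invented marginal P(X = x) and conditional P(Y = 1 | X = x).
p_x = {("M", 0): 0.4, ("M", 1): 0.1, ("F", 0): 0.3, ("F", 1): 0.2}
p_y1_given_x = {("M", 0): 0.0, ("M", 1): 0.0, ("F", 0): 0.8, ("F", 1): 0.05}

# Joint distribution of (X, Y).
joint_xy = {}
for x, px in p_x.items():
    q = p_y1_given_x[x]
    joint_xy[(x, 1)] = px * q
    joint_xy[(x, 0)] = px * (1 - q)

# Joint distribution of (S, Y), where S = s(X) names the cell containing X.
# Y is binary, so P(Y = 1 | X = x) identifies the cell.
joint_sy = defaultdict(float)
for (x, y), p in joint_xy.items():
    joint_sy[(p_y1_given_x[x], y)] += p

i_xy = mutual_information(joint_xy)
i_sy = mutual_information(joint_sy)
print(i_xy, i_sy)

# A predictor independent of a fair coin toss gives zero reduction in entropy.
coin = {(x, y): 0.25 for x in (0, 1) for y in "HT"}
i_coin = mutual_information(coin)
print(i_coin)
```

Merging the two male configurations into one cell loses nothing about $Y$, so the two mutual informations agree up to rounding, while the independent-coin case has strength exactly 0.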

I was led to this book a long time ago. Back in May, 1998, when I was launched on my thesis research about complexity, information theory, sufficient statistics and partitions of predictors, I happened upon a copy of Salmon's Scientific Explanation and the Causal Structure of the World [Princeton University Press, 1984] in a Madison used bookstore. Browsing through it, I had the unpleasant realization that Salmon's construction of the statistical relevance basis was, in different words, the same as the construction of "causal states" (per Crutchfield and Young, 1989) that I was investigating. Needless to say, I read the book, learned a lot from it, and even eventually got a paper out of the connection. But it was a nasty shock, it made me paranoid about trying to read everything, and it alerted me to a whole past and continuing history of rediscovering this circle of ideas.

The encounter also made me curious about Salmon's prior work on the subject, which he referred to in the 1984 book, but which was, then, hard for me to track down. By the time, decades later and after moving to Pittsburgh, I found a copy of this book, I was no longer working on those subjects, so it wasn't until this month that I actually read it. This made it clear that while this book was published in 1971, the central paper by Salmon had appeared in an edited volume in 1970, and earlier versions had circulated in manuscript for some time in the 1960s, since Greeno cites it in that form. This isn't quite definite enough for me to say exactly when Salmon introduced the statistical relevance basis, but "no later than 1970" for sure.

Whatever interest this book might have is now, I think, entirely historical. Salmon's ideas remain valuable, but there's nothing important here which isn't also in his 1984 book, better expressed and more fully worked-out. So while I'm glad I read this, I'm not sure I can recommend it, unless you happen to be doing research into the history of these topics. Scientific Explanation and the Causal Structure of the World, however, I can and do recommend.

One point I cannot resist making before closing. Salmon does not distinguish, in his formalism, between $P(Y|X=x)$ and what those of us who've read Pearl would write as $P(Y|do(X=x))$. (Of course, Spirtes, Glymour and Scheines [ch. 3] offered an alternative and equivalent notation, and Glymour was Salmon's student.) He is, however, perfectly well aware of the difference between these, and appeals to the possibility of experimental manipulations to achieve what (following Reichenbach) he calls "screening off", i.e., remote causes being irrelevant given proximate causes, and effects being irrelevant given causes. But his formalism doesn't allow for this, either here or in the 1984 book. The natural thing to want, then, is to say that two configurations of variables $x$ and $x^{\prime}$ are equivalent, with respect to $Y$, when $P(Y|do(X=x)) = P(Y|do(X=x^{\prime}))$. The people who have actually worked this out are Chalupka, Perona and Eberhardt (2015, 2016). Gratifyingly, the causal version of the theory goes through almost exactly the same way as the one using ordinary conditioning, and they even work out some nice results on the relationship between the two partitions (e.g., the causal partition is usually a coarsening of the merely-probabilistic one).
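A toy model shows how the two partitions can come apart. This is a sketch of my own devising, not an example from Chalupka, Perona and Eberhardt: a hidden binary $U$ influences $Y$, $X_1$ is a genuine cause of $Y$, and $X_2$ is merely a noisy copy of $U$ (a barometer, in effect). All the probabilities and function names are assumptions chosen to keep the arithmetic exact.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# Hypothetical structural model: U hidden, X1 -> Y, U -> Y, U -> X2.
P_U = {0: F(1, 2), 1: F(1, 2)}
P_X1 = {0: F(1, 2), 1: F(1, 2)}


def p_x2_given_u(x2, u):
    """X2 copies U correctly nine times out of ten."""
    return F(9, 10) if x2 == u else F(1, 10)


def p_y1_given(u, x1):
    """P(Y = 1 | U = u, X1 = x1); X2 has no causal influence on Y."""
    return F(1, 10) + F(1, 2) * u + F(3, 10) * x1


def observational(x1, x2):
    """P(Y=1 | X1=x1, X2=x2): observing X2 carries news about U."""
    weights = {u: P_U[u] * P_X1[x1] * p_x2_given_u(x2, u) for u in P_U}
    total = sum(weights.values())
    return sum(w * p_y1_given(u, x1) for u, w in weights.items()) / total


def interventional(x1, x2):
    """P(Y=1 | do(X1=x1, X2=x2)): setting X2 by fiat cuts its link to U."""
    return sum(P_U[u] * p_y1_given(u, x1) for u in P_U)


def partition(prob):
    """Group configurations (x1, x2) by the value of prob(x1, x2)."""
    cells = defaultdict(list)
    for x in product((0, 1), repeat=2):
        cells[prob(*x)].append(x)
    return sorted(cells.values())


obs_cells = partition(observational)
do_cells = partition(interventional)
print(obs_cells)
print(do_cells)
```

In this model ordinary conditioning separates all four configurations, since $X_2$ is predictively relevant, whereas the interventional partition has only two cells, indexed by $x_1$. Each causal cell is a union of observational cells, i.e., the causal partition is a coarsening of the probabilistic one here.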

*: There are issues here about distinct "versions" of conditional probabilities, which disagree only on subsets of $x$ of measure 0. Salmon consigns them to a dismissive footnote, and I follow his wise example.

Philosophy of Science / Probability and Statistics
Drafted 25 February 2022, posted 6 March 2022