## April 30, 2009

### Books to Read While the Algae Grow in Your Fur, April 2009

Jérôme Dedecker, Paul Doukhan, Gabriel Lang, José Rafael León R., Sana Louhichi and Clémentine Prieur, Weak Dependence: With Examples and Applications
As every school-child knows, a sequence of probability measures $D_n$ converges "weakly" or "in distribution" on a limit $D$ if $D_n f \equiv \int{f(x) d D_n(x) }$ converges on $Df$ for every bounded and continuous test function $f$. In particular, if $D_n$ is the joint distribution of two random variables from a time series separated by a time-lag $n$, with marginal distributions $P$ and $Q$ for the past and the future, then the series is asymptotically independent if $D_n$ converges in distribution on the product measure $P \otimes Q$.
(This is "weak" convergence because the $D_n$ can look very different from the limiting $D$. A classic example: $D_n$ puts probability 1/n on the points $\frac{1}{n}, \frac{2}{n}, \ldots \frac{n}{n} =1$. This converges weakly to the uniform distribution on the unit interval [0,1], despite apparent obstacles like the latter giving probability 1 to the irrational numbers, while all the $D_n$ give them probability 0.)
A large literature has grown up since the 1950s which concerns itself with properties of asymptotically-independent stochastic processes, usually measuring the approach to independence by means of various "mixing coefficients", along the lines introduced by Rosenblatt. As the name implies, if these coefficients go to zero asymptotically, then the process is strongly mixing. Having these mixing coefficients go to zero is very nice, since it implies that many of the nice properties of IID sequences (central limit theorems, PAC-learning results, etc.), carry over to the dependent-but-mixing sequences. Unfortunately, strong mixing is a very strong property, which is hard to prove and in many cases is known to fail — even when there is asymptotic independence!
This book, which summarizes and extends a series of recent papers by the authors (in various combinations) is about proving limit theorems — laws of large numbers, central limit theorems, empirical-process convergence, etc. — for asymptotically-independent but non-mixing processes, which the authors call "weakly dependent". Their theorems in chs. 6--10 can, in large part, be seen as instances of the trick that I learned as "blocking", and which they attribute to S. N. Bernstein. To calculate the time average of some function of a (stationary) process, for instance, divide the series up into a series of long blocks, plus padding or filler in between them. The global average is, trivially, equal to the average of the within-block averages, plus a remainder term for contributions from the filler. By stationarity, the blocks all have the same distribution, and by weak dependence the blocks are nearly independent — more nearly with longer fillers. So one can show that the global average will have the same behavior as it would in the IID case, if one can control (1) the corrections due to the remaining dependence among the blocks, and (2) the remainder due to the fillers between the blocks. Controlling (2) is fairly straightforward; the new stuff here comes from controlling (1), the residual dependence.
Following lines laid down in mixing theory, the authors tackle (1) by constructing deviation inequalities, bounds on the probability that sample values differ from expectations by more than a certain amount. Typically, these inequalities (chs. 4 and 5) have a more-or-less combinatorial flavor, though they also use coupling, and a lot of manipulation of cumulants (without making the connection to large deviations theory via cumulant generating functions). — As usual with such inequalities, it's rarely clear whether the various factors of 33 (or whatever) are for real, or just the best one can do with the current method of proof; but they suffice for many purposes, and I suppose it's better to have the constants be explicit than to bury them inside $O()$ or $o()$.
In mixing theory, the analogous bounds are stated in terms of the mixing coefficients. Here the new bounds are stated in terms of a range of new dependence measures (ch. 2), all denoted by utterly un-mnenomic Greek letters (again following the mixing-theory convention). Some of them play directly off the definition of convergence in distribution in terms of expectations of test functions: how big can the difference between $D_n f$ and $\left(P \otimes Q\right) f$ get, when $f$ is restricted to some suitably well-behaved, normalized class of test functions? The other set of measures are "projective". The conditional expectation of a function $f$, given $X_1, X_2,$ etc., is the projection of $f$ on to the space of functions calculable from $X_1, X_2$, etc.; the unconditional expectation of $f$ is the projection of $f$ on to the space of constant functions, which are calculable from nothing. (By "calculable from" I mean "measurable in the minimal sigma-algebra of".) If $f$ is independent of $X$, then its conditional expectation is still its projection on to the constants. The other class of dependence coefficients, therefore, looks at how far conditional expectations depart from constancy — either maximizing over some family of test-functions, or by making future values of the time series itself the function in question.
Applications to non-parametric regression, spectral-density estimation, and various econometric problems occupy chs. 11--13, and ch. 3 shows that many standard and some non-standard time series models are weakly dependent, with estimates of the dependence coefficients.
— The book shows definite signs of its patch-up job origins. The prose is disjointed and slightly repetitive. The notation, too, is not always consistent; sometimes the generic dependence measure is c, sometimes it's \mu, and it can switch back and forth from one line to the next. A bigger annoyance is terminology. What the authors (and, it must be said, many probabilists) call "mixing" is more properly strong mixing, i.e., convergence of joint distributions to product measure in the strong, not the weak, topology. (If the Adversary can pick any pair of events, and you can show that that pair become independent as the separation between them grows, you have proved ordinary mixing. If the Adversary gets to pick a different pair of events with each separation, and you can still show asymptotic independence, that's strong mixing.) In dynamical systems and ergodic theory, "mixing" unmodified refers to the weaker notion of mixing. (See for instance the detailed treatment in Lasota and Mackey.) Thus for someone of my background, the Bernoulli shift is one of the canonical mixing processes, but they give it as their first example of a non-mixing process! (Their proof, incidentally, is wrong, but easily fixed.) In fact, I'd say that the failure to connect what they have done to the work in dynamics on decay of correlations (e.g.) is a real lost opportunity for both fields — optimistically, a direction for future research.
In the end, however, these are all minor complaints. The authors have brought together a great deal of original and important work, as a result of which the classical probabilistic limit theorems can now be rigorously applied to a much wider range of stochastic models. It's cool stuff and I would be very happy to teach it. I strongly recommend the book to statisticians interested in inference for stochastic processes, learning theorists interested in dependent data, probabilists interested in new applications, and theoretical econometricians.
Warren Ellis and Paul Duffield, , Freakangels, vol. 2
Jack Campbell, Relentless
Mind candy, in the process of moving from the Anabasis to De Bello Civili. (Which does not mean it's a historical pastiche.) Earlier morsels: 1--3, 4.
David Freidel, Linda Schele and Joy Parker, Maya Cosmos: Three Thousand Years on the Shaman's Path
A detailed and intelligent, if very conjectural, exploration of ancient Maya cosmology, based on deciphering surviving inscriptions, artifacts, and extrapolation from the modern Maya. (The presentation of the conjectures is remarkably confident.) Marred by irritating and silly stabs at cultural relativism (mostly at the beginning and the end).
Peter D. Harrison, The Lords of Tikal: Rulers of an Ancient Maya City
History of the greatest Maya city from the viewpoint of its rulers (the only view we really have any access to), from the beginning to the end. Clear on the difference between what we actually know and what the Mayanists are merely guessing at. Very good on architecture.
Bruce Kitchens and Selim Tuncel, Finitary Measures for Subshifts of Finite Type and Sofic Systems
As you know, Bob, a shift dynamical system has a state-space consisting of infinite sequences (one or two sided), say $x$ drawn from a finite alphabet. The time-evolution operator $T$, or shift, simply moves the sequence one step to the right, i.e., ${Tx}_n = x_{n+1}$. This moves all the complexity and distinctions between different systems into the set of initial conditions, and space of allowed sequences, rather than in the time-evolution rule. The full shift has all possible sequences among its initial conditions (and so the space never shrinks); sub-shifts restrict the set of initial conditions. Suppose that the allowed values of $x_n$ depend on $x_{n-1}, x_{n-2}, \ldots x_{n-k}$, but not on earlier entries in $x$, and that this $k$ is the same for all $n$ and all sequences. Then we have a subshift of finite type, the type or order of the subshift being $k$. These are the symbolic-dynamical equivalents of Markov chains of order $k$. As with Markov chains, any higher-order subshift of finite type can be converted to a first-order subshift. (Define a new alphabet where each symbol is a block of length $k$ from the original alphabet.) A sofic system can be defined in one of two equivalent ways. One introduces the notion of a follower set, the set of all one-sided infinite sequences which can succeed a given history; a subshift is sofic if it has only finitely many follower sets. Alternately, introduce the notion of factor maps — continuous functions from one sequence space to another which commute with the shift. A sofic system is the image of a subshift of finite type under a factor map. A strictly sofic system is one which is sofic, but not a subshift of finite type.
Said another way, for any sofic system there is a set of states, possibly hidden, where the current state determines what symbols can be seen next, and the current state plus the next symbol determines the next state. Sofic systems are (the languages generated by) deterministic finite automata.
All of this is at the purely topological, possible-or-not level. One can of course also add probability measures on sequences. A Markov measure is one under which the distribution of $X_n$ depends on $X_{n-1}$, but not, given that, on previous symbols. (Similarly for higher-order chains.) In this monograph, Kitchens and Tuncel ask, what measures are to Markov measures as sofic systems are to subshifts of finite type? They call these measures finitary, and characterize them in several ways, including (1) measures where the number of follower distributions is finite, (2) factor-map images of Markov measures, (3) ones where the conditional distribution of the future is always independent of the remote past given a finite segment of the immediate past, though one of possibly variable, context-dependent length. (The last is what motivates the name "finitary".) They develop the theory of finitary measures in considerable detail, and in parallel to the usual theory of Markov measures, as that is found in ergodic theory and the thermodynamic formalism. Statistical aspects (the reconstruction of finitary measures from sample data) are not addressed — fortunately for me. Prior acquaintance with symbolic dynamics and the thermodynamic formalism is absolutely essential, and a lot of familiarity with manipulating semi-groups would help.
(Your best bet for obtaining this is directly from the publisher. However, the community would be better served by putting it on the arxiv.)