## Sufficient Statistics

15 Jun 2022 14:53

In statistical theory, a "statistic" is a well-behaved (i.e., "measurable") function of the data, which is what's actually used in calculations or inferences, rather than the full data set. E.g., the sample mean, the sample median, the sample variance, etc. A statistic is sufficient if it is just as informative as the full data. The concept was introduced by R. A. Fisher in the 1920s, and refined by Jerzy Neyman in the 1930s. Parametric sufficiency means that the statistic contains just as much information about (some) parameter of the model as the full data. More precisely: the actual data has a certain probability distribution conditional on the statistic, which in general will also involve the parameter. The statistic is sufficient if this conditional distribution is the same for all parameter values. (That's actually clearer in algebra but I don't feel up to writing it in HTML now.) Once we've controlled for the sufficient statistic, nothing else --- not even the original data --- can tell us anything more about the parameter. Predictive sufficiency is similar: given the predictively sufficient statistic, future observations can be predicted as well as if the whole past was available. Predictive sufficiency can be expressed concisely in terms of mutual information.
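
A sketch of the algebra behind these definitions, in notation that is not in the text above and is assumed here for illustration: $X$ is the data, $T = T(X)$ the statistic, $\theta$ the parameter, $X_{\mathrm{past}}$ and $X_{\mathrm{future}}$ the past and future observations, and $I[\cdot;\cdot]$ mutual information. The data are taken to be discrete so that the conditional probabilities are elementary, and the mutual informations are assumed finite.

```latex
% Parametric sufficiency: the conditional distribution of the data
% given the statistic is the same for every parameter value.
P_\theta\left( X = x \mid T(X) = t \right)
  = P_{\theta'}\left( X = x \mid T(X) = t \right)
  \quad \text{for all } \theta, \theta', x, t

% Equivalent form (Fisher-Neyman factorization): the parameter
% enters the likelihood only through the statistic.
p_\theta(x) = g_\theta\left( T(x) \right) \, h(x)

% Predictive sufficiency: the statistic of the past carries all the
% information the whole past has about the future.
I\left[ X_{\mathrm{future}} ; T(X_{\mathrm{past}}) \right]
  = I\left[ X_{\mathrm{future}} ; X_{\mathrm{past}} \right]
```

Since $T(X_{\mathrm{past}})$ is a function of the past, the left-hand side of the last line can never exceed the right-hand side; predictive sufficiency is the case of equality.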

A necessary statistic is one which can be computed from any sufficient statistic, without reference to the original data. (It's "necessary" in the sense that any optimal inference implicitly involves knowing the necessary statistic.) Under pretty general conditions, maximum likelihood estimates are necessary statistics, though they are not always sufficient. A minimal sufficient statistic is one which is both necessary and sufficient --- i.e., it's just as informative as the original data, but it can be computed from any other sufficient statistic; no further compression of the data is possible, without losing some information.
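
One standard way to make minimal sufficiency concrete is the likelihood-ratio criterion: two data sets get the same value of a minimal sufficient statistic exactly when their likelihood functions are proportional as functions of the parameter. The following is a minimal sketch of that idea for IID Bernoulli trials, an example chosen here and not taken from the text; the function names and the parameter grid are hypothetical. Comparing likelihoods on a finite grid is only a stand-in for comparing them at all parameter values, though for this model it happens to be enough.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def likelihood(x, p):
    """Probability of the binary sequence x under IID Bernoulli(p)."""
    k = sum(x)
    return p**k * (1 - p) ** (len(x) - k)


def likelihood_shape(x, grid):
    """Likelihood on the grid, rescaled by its value at grid[0].

    Two sequences with proportional likelihood functions get the same shape.
    """
    base = likelihood(x, grid[0])
    return tuple(likelihood(x, p) / base for p in grid)


n = 4
# Exact rational parameter values, so equality tests are not floating-point.
grid = [Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)]

# Partition all 2^n sequences into cells with the same likelihood shape.
cells = defaultdict(list)
for x in product((0, 1), repeat=n):
    cells[likelihood_shape(x, grid)].append(x)

# Within each cell, collect the distinct values of the sum of the sequence.
sums_per_cell = [sorted({sum(x) for x in cell}) for cell in cells.values()]

# Number of successes for each cell, mapped to the size of that cell.
print({sums[0]: len(cell) for sums, cell in zip(sums_per_cell, cells.values())})
```

The cells of the likelihood partition coincide with the level sets of the number of successes, which is the sense in which that count is minimal sufficient here: any coarser summary would merge sequences whose likelihoods differ in shape.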

A lot of my work has involved describing and finding predictively sufficient statistics for time series and spatio-temporal processes. It turns out that the statistical sufficiency property gives rise to a Markov property for the statistics. (Basically, computational mechanics turns out to be about constructing predictively sufficient statistics.) So I'm very interested in sufficiency in general, and especially how it relates to Markovian representations of non-Markovian processes.

Topics of particular interest: Necessary and sufficient conditions for the existence of non-trivial sufficient statistics; dimensionality of sufficient statistics; geometric and probabilistic characterizations; decision-theoretic properties; necessary statistics; minimal sufficient statistics for transducers; connections to causal inference; relationship between sufficiency and ergodic theory; characterization of different classes of stochastic processes in terms of their sufficient statistics; exponential families.

Recommended, big picture:
• Sufficiency is a very important topic in statistical inference, and any good book on theoretical statistics will cover it in depth. I like Mark Schervish's Theory of Statistics, but really any one will do.
• Persi Diaconis, "Sufficiency as Statistical Symmetry", Proceedings of the AMS Centennial Symposium 15--26 [1988; PDF]
• E. B. Dynkin, "Sufficient statistics and extreme points", Annals of Probability 6 (1978): 705--730 ["The connection between ergodic decompositions and sufficient statistics is explored in an elegant paper by DYNKIN" --- Kallenberg, Foundations of Modern Probability, p. 577.]
Recommended, close ups:
• R. R. Bahadur, "Sufficiency and statistical decision functions," Annals of Mathematical Statistics 25 (1954): 423--462
• M. S. Bartlett
  • "Statistical Information and Properties of Sufficiency", Proceedings of the Royal Society of London A 154 (1936): 124--137 [JSTOR]
  • "Properties of Sufficiency and Statistical Tests", Proceedings of the Royal Society of London A 160 (1937): 268--282 [JSTOR]
• David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions [Blackwell was a pioneer in exploring the decision-theoretic properties of sufficiency, and this excellent old book contains many deep theorems in this area]
• Ronald W. Butler, "Predictive Likelihood Inference with Applications", Journal of the Royal Statistical Society B 48 (1986): 1--38 ["in the predictive setting, all parameters are nuisance parameters". JSTOR]
• John W. Fisher III, Alexander T. Ihler and Paul A. Viola, "Learning Informative Statistics: A Nonparametric Approach", pp. 900--906 in NIPS 12 (1999) [PDF reprint. I'd call this more of a semi-parametric approach than a fully non-parametric one; they assume a parametric form for the dependence structure, but are agnostic about the distributions of innovations, and so try to maximize non-parametrically estimated mutual informations. In the limit, this will give them sufficient statistics.]
• R. A. Fisher
  • "A Mathematical Examination of the Methods of Determining the Accuracy of an Observation by the Mean Error, and by the Mean Square Error", Monthly Notices of the Royal Astronomical Society 80 (1920): 758--770 [Apparently the first time the sufficiency property was noted, though Fisher does not use that term here. PDF]
  • "On the Mathematical Foundations of Theoretical Statistics", Philosophical Transactions of the Royal Society A 222 (1922): 309--368 [Formal introduction of the concept, and the name, of sufficiency, along with much else that has proved fundamental to statistics, such as the likelihood function and the method of maximum likelihood. PDF in two parts, 1, 2]
  • "Theory of Statistical Estimation", Proceedings of the Cambridge Philosophical Society 22 (1925): 700--725 [Often, but mistakenly, cited in place of the 1922 paper; admittedly, clearer. PDF]
• Solomon Kullback, Information Theory and Statistics
• Solomon Kullback and R. A. Leibler, "On Information and Sufficiency", Annals of Mathematical Statistics 22 (1951): 79--86
• Rudolf Kulhavy, Recursive Nonlinear Estimation: A Geometric Approach
• Steffen L. Lauritzen
  • Extremal Families and Systems of Sufficient Statistics [Mini-review.]
  • "Extreme Point Models in Statistics", Scandinavian Journal of Statistics 11 (1984): 65--91 [Highlights of the book, without proofs but with decent typography. With useful discussion and a reply. JSTOR]
  • "Sufficiency, Prediction and Extreme Models", Scandinavian Journal of Statistics 1 (1974): 128--134 [JSTOR]
  • "On the Interrelationships among Sufficiency, Total Sufficiency, and Some Related Concepts", Preprint 8, Institute of Mathematical Statistics, University of Copenhagen (July 1974) [PDF scan via Prof. Lauritzen]
• Benoit Mandelbrot, "The Role of Sufficiency and of Estimation in Thermodynamics", Annals of Mathematical Statistics 33 (1962): 1021--1038 [Extensive thermodynamic variables as sufficient statistics for the conjugate intensive variables; Gibbs canonical form arising from natural requirements on finite-dimensional sufficient statistics, which can only be achieved for exponential families of probability distributions. Very clever, and IMHO a real contribution to the foundations of statistical mechanics and thermodynamics.]
• Giorgio Picci, "Some Connections Between the Theory of Sufficient Statistics and the Identifiability Problem", SIAM Journal on Applied Mathematics 33 (1977): 383--398 [Introduces the idea of a "maximal identifiable statistic" --- the coarsest partition of hypothesis space where each equivalence class/cell of the partition gives rise to a distinct distribution of observables. (I would prefer "parameter", rather than "statistic", since it's a function of the distribution, not the observables, but that's a quibble.) It might be interesting to try to define emergence in these terms --- perhaps as a restriction on the observable sigma-field such that the equivalence classes of the maximal identifiable parameter become infinite-dimensional, or something like that. JSTOR. Thanks to Rhiannon Weaver for the pointer.]
• David Pollard, "A note on insufficiency and the preservation of Fisher information", arxiv:1107.3797
• Ge Xu, Biao Chen, "The Sufficiency Principle for Decentralized Data Reduction", arxiv:1207.3265
• Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer, "Information geometry and sufficient statistics", arxiv:1207.6736
• T. Bohlin, "Information pattern for linear discrete-time models with stochastic coefficients," IEEE Transactions on Automatic Control 15 (1970): 104--106 [On recursively-computable sufficient statistics]
• R. Dennis Cook, Liliana Forzani, and Adam J. Rothman, "Estimating sufficient reductions of the predictors in abundant high-dimensional regressions", Annals of Statistics 40 (2012): 353--384
• E. B. Dynkin, "Necessary and sufficient statistics for a family of probability distributions," Uspekhi matem. nauk 6 (1951): 68--90 [Apparently translated in Select. Trans. Math. Statist. Prob. 1 (1951): 23--41. Zacks, below, is supposed to follow closely]
• David Hinkley, "Predictive Likelihood", Annals of Statistics 7 (1979): 718--728
• V. S. Huzurbazar, Sufficient Statistics: Selected Contributions
• Anna Jencova and Denes Petz, "Sufficiency in quantum statistical inference", math-ph/0412093
• Kuang-Yao Lee, Bing Li, and Francesca Chiaromonte, "A general theory for nonlinear sufficient dimension reduction: Formulation and estimation", Annals of Statistics 41 (2013): 221--249, arxiv:1304.0580
• Yanyuan Ma and Liping Zhu, "Efficient estimation in sufficient dimension reduction", Annals of Statistics 41 (2013): 250--268
• W. J. Runggaldier and F. Spizzichino, "Sufficient conditions for finite dimensionality of filters in discrete time: A Laplace transform-based approach," Bernoulli 7 (2001): 211--221
• Morris Skibinsky, "Adequate Subfields and Sufficiency", Annals of Mathematical Statistics 38 (1967): 155--161
• Taiji Suzuki and Masashi Sugiyama, "Sufficient Dimension Reduction via Squared-Loss Mutual Information Estimation", Neural Computation 25 (2013): 725--758
• Andrew Tausz, "Properties of Conditional Expectation Operators and Sufficient Subfields", arxiv:1011.5162
• Brendan van Rooyen, Robert C. Williamson, "Le Cam meets LeCun: Deficiency and Generic Feature Learning", arxiv:1402.4884
• Tao Wang, Xu Guo, Peirong Xu, Lixing Zhu, "Transformed sufficient dimension reduction", arxiv:1401.0267
• Makoto Yamada, Gang Niu, Jun Takagi, Masashi Sugiyama, "Sufficient Component Analysis for Supervised Dimension Reduction", arxiv:1103.4998
• S. Zacks, The Theory of Statistical Inference [For material on necessary and sufficient statistics]