Notebooks

## Statistics

04 May 2023 11:53

An application of probability, with intimate ties to machine learning, non-demonstrative inference and induction.

Since June 2005, I have been a professor of statistics. This made me interested in how to teach it.

Dependent data
Statistical inference for stochastic processes, a.k.a. time-series analysis. Signal processing and filtering. Spatial statistics. Spatio-temporal statistics.
Model selection
Especially: adapting to unknown characteristics of the data, like unknown noise distributions, or unknown smoothness of the regression function.
Model discrimination
That is, designing experiments so as to discriminate between competing classes of model. Issues of adapting to the data arise here, too.
Rates of convergence of estimators to true values
Empirical process theory. (Cf. some questions in ergodic theory).
Estimating distribution functions
And estimating entropies, or other functionals of distributions.
Non-parametric methods
Both those that are genuinely distribution-free, and those that would more accurately be called mega-parametric (even infinitely-parametric) methods, such as neural networks.
Regression
Bootstrapping and other resampling methods
Cross-validation
Sufficient statistics
Exponential families
Information Geometry
Partial identification of parametric statistical models
Causal Inference
Decision theory
Conventional, and the sorts with some connection to how real decisions are made.
Graphical models
Monte Carlo and other simulation methods
"De-Bayesing"
Ways of taking Bayesian procedures and eliminating dependence on priors, either by replacing them by initial point-estimates, or by showing the prior doesn't matter, asymptotically or hopefully sooner. See: Frequentist consistency of Bayesian procedures.
Computational Statistics
Statistics of structured data
Statistics on manifolds
I.e., what to do when the data live in a continuous but non-Euclidean space.
Grammatical Inference
Factor analysis
Mixture models
Multiple testing
Predictive distributions
... especially if they have confidence/coverage properties
Density estimation
especially conditional density estimation; and density estimation on graphical models
Indirect inference
"Missing mass" and species abundance problems
I.e., how much of the distribution have we not yet seen?
Independence Tests, Conditional Independence Tests, Measures of Dependence and Conditional Dependence
Two-Sample Tests
Statistical Emulators for Simulation Models
Hilbert Space Methods for Statistics and Probability
Large Deviations and Information Theory in the Foundations of Statistics
Confidence Sets, Confidence Intervals
Nonparametric Confidence Sets for Functions
Conformal prediction
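The "De-Bayesing" entry above, the point that the prior washes out asymptotically, can be illustrated with a minimal sketch (all numbers are made up for illustration): two conjugate Beta priors for a Bernoulli parameter, one flat and one badly biased, whose posterior means converge to each other, and to the truth, as the sample grows.

```python
import random

random.seed(42)

# Simulate Bernoulli(0.3) data and compare posterior means under two
# different Beta priors.  As n grows, both posteriors concentrate on
# the true parameter, so the choice of prior washes out.
p_true = 0.3
data = [1 if random.random() < p_true else 0 for _ in range(10000)]

def beta_posterior_mean(successes, n, a, b):
    """Posterior mean of p under a Beta(a, b) prior, after n Bernoulli trials."""
    return (a + successes) / (a + b + n)

for n in (10, 100, 10000):
    s = sum(data[:n])
    m1 = beta_posterior_mean(s, n, 1, 1)    # flat prior
    m2 = beta_posterior_mean(s, n, 50, 5)   # strongly biased prior
    print(n, round(m1, 3), round(m2, 3), round(abs(m1 - m2), 3))
```

At n = 10 the two posterior means disagree badly; by n = 10000 the gap is negligible. Of course this is the easy, well-specified parametric case; the frequentist-consistency notebook is about when this does and does not generalize.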
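For the "missing mass" entry above, the classical Good-Turing estimator says: estimate the total probability of the species not yet seen by n1/n, where n1 is the number of species observed exactly once in a sample of size n. A sketch on a made-up abundance distribution:

```python
from collections import Counter
import random

random.seed(0)

# Good-Turing estimate of the "missing mass": the probability that the
# next observation is a species never yet seen is estimated by n1/n,
# where n1 is the number of species observed exactly once so far.
def missing_mass(sample):
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

# Made-up abundance distribution: 1000 species with harmonic weights,
# so a sample of 500 observations will miss much of the tail.
species = range(1000)
weights = [1.0 / (k + 1) for k in species]
sample = random.choices(species, weights=weights, k=500)

seen = set(sample)
est = missing_mass(sample)
truth = sum(w for k, w in zip(species, weights) if k not in seen) / sum(weights)
print(round(est, 3), round(truth, 3))
```

The estimate and the realized missing mass track each other closely here, which is the Good-Turing logic in miniature: the expected missing mass equals the expected fraction of singletons.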
Recommended, non-technical:
• Jordan Ellenberg, How Not to Be Wrong: The Power of Mathematical Thinking
• Francis Galton, "Statistical Inquiries into the Efficacy of Prayer," Fortnightly Review 12 (1872): 125--135 [online]
• Larry Gonick and Woollcott Smith, The Cartoon Guide to Statistics
• Ian Hacking, The Taming of Chance
• D. Huff, How to Lie with Statistics
• Theodore Porter, The Rise of Statistical Thinking, 1820--1900
• Constance Reid, Neyman from Life [Biography of Jerzy Neyman, one of the makers of modern statistical theory, and, I am happy to say, among the brighter lights of my alma mater. Reid does an excellent job of explaining Neyman's work in terms accessible to the general reader. There is a new edition, titled simply Neyman, but otherwise unchanged. Review by Steve Laniel]
• Edward R. Tufte
• The Visual Display of Quantitative Information
• Visual Explanations
Recommended, technical, close-ups:
• A. C. Atkinson and A. N. Donev, Optimum Experimental Design
• F. Bacchus, H. E. Kyburg and M. Thalos, "Against Conditionalization," Synthese 85 (1990): 475--506 [Why "Dutch book" arguments do not, in fact, mean that rational agents must be Bayesian reasoners]
• M. J. Bayarri and James O. Berger, "$P$ Values for Composite Null Models", Journal of the American Statistical Association 95 (2000): 1127--1142 [To be read in conjunction with Robins, van der Vaart and Ventura, below. JSTOR]
• Anil K. Bera and Aurobindo Ghosh, "Neyman's Smooth Test and Its Applications in Econometrics", pp. 177--230 in Aman Ullah, Alan T. K. Wan and Anoop Chaturvedi (eds.), Handbook of Applied Econometrics and Statistical Inference, SSRN/272888
• Julian Besag, "A Candidate's Formula: A Curious Result in Bayesian Prediction", Biometrika 76 (1989): 183 [A wonderful and bizarre expression for the Bayesian predictive density, in terms of how adding a new data point would change the posterior. JSTOR]
• P. J. Bickel, "On Adaptive Estimation", Annals of Statistics 10 (1982): 647--671
• Pier Bissiri, Chris Holmes, Stephen Walker, "A General Framework for Updating Belief Distributions", arxiv:1306.6430
• David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions
• Leo Breiman, "No Bayesians in Foxholes", IEEE Expert: Intelligent Systems and Their Applications 12 (1997): 21--24 [PDF reprint; comments by Andy Gelman]
• Peter Bühlmann and Sara van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications
• Ronald W. Butler, "Predictive Likelihood Inference with Applications", Journal of the Royal Statistical Society B 48 (1986): 1--38 ["in the predictive setting, all parameters are nuisance parameters". JSTOR]
• Venkat Chandrasekaran and Michael I. Jordan, "Computational and Statistical Tradeoffs via Convex Relaxation", Proceedings of the National Academy of Sciences (USA) 110 (2013): E1181--E1190, arxiv:1211.1073
• Hwan-sik Choi and Nicholas M. Kiefer, "Differential Geometry and Bias Correction in Nonnested Hypothesis Testing" [PDF preprint via Kiefer]
• Aurore Delaigle and Peter Hall, "Higher Criticism in the Context of Unknown Distribution, Non-Independence and Classification", pp. 109--138 of Sastry, Rao, Delampady and Rajeev (eds.), Platinum Jubilee Proceedings of the Indian Statistical Institute [PDF reprint via Prof. Delaigle]
• J. Bradford DeLong and Kevin Lang, "Are All Economic Hypotheses False?", Journal of Political Economy 100 (1992): 1257--1272 [PDF preprint. The point is about abuses of hypothesis testing, not economic hypotheses as such.]
• Amir Dembo and Yuval Peres, "A Topological Criterion for Hypothesis Testing", Annals of Statistics 22 (1994): 106--117 ["A simple topological criterion is given for the existence of a sequence of tests for composite hypothesis testing problems, such that almost surely only finitely many errors are made."]
• David Donoho and Jiashun Jin, "Higher criticism for detecting sparse heterogeneous mixtures", Annals of Statistics 32 (2004): 962--994, arxiv:math.ST/0410072
• John Earman, Bayes or Bust? A Critical Account of Bayesian Confirmation Theory
• Mikhail Ermakov, "On Consistent Hypothesis Testing", arxiv:1403.6296
• Michael Evans, "What does the proof of Birnbaum's theorem prove?", arxiv:1302.5468
• S. N. Evans and P. B. Stark, "Inverse Problems as Statistics" [Abstract, PDF]
• Steve Fienberg, The Analysis of Cross-Classified Categorical Data
• Andrew Gelman, Jennifer Hill and Masanao Yajima, "Why we (usually) don't have to worry about multiple comparisons" [PDF preprint]
• Andrew Gelman and Iain Pardoe, "Average predictive comparisons for models with nonlinearity, interactions, and variance components", Sociological Methodology 37 (2007): 23--51 [PDF preprint, Gelman's comments]
• Christopher Genovese, Peter Freeman, Larry Wasserman, Robert C. Nichol and Christopher Miller, "Inference for the Dark Energy Equation of State Using Type IA Supernova Data", Annals of Applied Statistics 3 (2009): 144--178, arxiv:0805.4136 [I am biased, because Genovese and Wasserman are friends, but this seems to me a model of a modern applied statistics paper: use interesting statistical ideas to say something helpful about an important scientific problem on its own terms, rather than distorting the problem until it "looks like a nail".]
• Charles J. Geyer, "Le Cam Made Simple: Asymptotics of Maximum Likelihood without the LLN or CLT or Sample Size Going to Infinity", arxiv:1206.4762 [There are two separable points here. One is that much of the usual asymptotic theory of maximum likelihood follows from the quadratic form of the likelihood alone; whenever and however that is reached, those consequences follow. Approximately quadratic likelihoods imply approximations to the usual asymptotics. This is unquestionably correct. The other is some bashing of results like the law of large numbers and central limit theorem, which seems misguided to me.]
• Tilmann Gneiting, "Making and Evaluating Point Forecasts", Journal of the American Statistical Association 106 (2011): 746--762, arxiv:0912.0902
• Tilmann Gneiting, Fadoua Balabdaoui and Adrian E. Raftery, "Probabilistic Forecasts, Calibration and Sharpness", Journal of the Royal Statistical Society B 69 (2007): 243--268
• Mark S. Handcock and Martina Morris, Relative Distribution Methods in the Social Sciences
• Bruce E. Hansen
• "The Likelihood Ratio Test Under Nonstandard Conditions: Testing the Markov Switching Model of GNP", Journal of Applied Econometrics 7 (1992): S61--S82 [I very much like the approach of treating the likelihood ratio as an empirical process; why haven't I seen it before? (Also, the state-of-the-art in simulating Gaussian processes must be much better now than what Hansen had in '92, which would make this even more practical.) PDF reprint]
• "Inference when a nuisance parameter is not identified under the null hypothesis", Econometrica 64 (1996): 413--430
• Jeffrey D. Hart, Nonparametric Smoothing and Lack-of-Fit Tests
• Christopher C. Heyde, Quasi-Likelihood and Its Applications: A General Approach to Optimal Parameter Estimation
• Kieran Healy, Data Visualization: A Practical Introduction
• Nils Lid Hjort and David Pollard, "Asymptotics for minimisers of convex processes", arxiv:1107.3806 [Very elegant]
• Peter J. Huber
• Wilbert C. M. Kallenberg and Teresa Ledwina, "Data-driven smooth tests when the hypothesis is composite", Journal of the American Statistical Association 92 (1997): 1094--1104 [Abstract, PDF reprint; JSTOR]
• Gary King, A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data
• Gary King and Margaret Roberts, "How Robust Standard Errors Expose Methodological Problems They Do Not Fix" [PDF preprint]
• Evelyn M. Kitagawa, "Components of a Difference Between Two Rates", Journal of the American Statistical Association 50 (1955): 1168--1194
• Solomon W. Kullback, Information Theory and Statistics
• Michael Lavine and Mark J. Schervish, "Bayes Factors: What They Are and What They Are Not" [PS preprint]
• Steffen Lauritzen, Extremal Families and Systems of Sufficient Statistics [See comments under sufficient statistics]
• J. F. Lawless and Marc Fredette, "Frequentist prediction intervals and predictive distributions", Biometrika 92 (2005): 529--542 ["Frequentist predictive distributions are defined as confidence distributions .... A simple pivotal-based approach that produces prediction intervals and predictive distributions with well-calibrated frequentist probability interpretations is introduced, and efficient simulation methods for producing predictive distributions are considered. Properties related to an average Kullback-Leibler measure of goodness for predictive or estimated distributions are given."]
• Lucien Le Cam
• "Neyman and Stochastic Models" [PDF. Some vignettes of Neyman putting together models, and his model-building process.]
• "Maximum Likelihood; An Introduction" [PDF. Not an introduction, but rather a collection of examples of where it just does not work, or at least doesn't work well. That this is presented as "an introduction" is entirely characteristic of the author.]
• Erich L. Lehmann, "On likelihood ratio tests", math.ST/0610835
• Bing Li, "A minimax approach to consistency and efficiency for estimating equations," Annals of Statistics 24 (1996): 1283--1297
• Bruce Lindsay and Liawei Liu, "Model Assessment Tools for a Model False World", Statistical Science 24 (2009): 303--318, arxiv:1010.0304 [Their model-adequacy index is, essentially, the number of samples needed to detect the falsity of the model with some reasonable, pre-set level of power, with fixed size/significance level. This is a very natural quantity. In fact, by results which go back to Kullback's book, the power grows exponentially, with a rate equal to the Kullback-Leibler divergence rate. (More exactly, one minus the power goes to zero exponentially at that rate, but you know what I meant.) Large deviations theory includes generalizations of this result. Many statisticians, I'd guess, would prefer the Lindsay-Liu index because it will feel more natural to them to gauge error in terms of a sample size rather than bits, but to each their own.]
• Brad Luen and Philip B. Stark, "Testing earthquake predictions", pp. 302--315 in Deborah Nolan and Terry Speed (eds.), Probability and Statistics: Essays in Honor of David A. Freedman [The issues arise, however, not just for earthquakes, but for all sorts of clustered events]
• Charles Manski, Identification for Prediction and Decision
• Deborah G. Mayo and D. R. Cox, "Frequentist statistics as a theory of inductive inference", math.ST/0610846
• Karthika Mohan, Judea Pearl and Jin Tian, "Graphical Models for Inference with Missing Data", NIPS 2013 [There was at least one preprint version with the more pointed title "Missing Data as a Causal Inference Problem"]
• M. B. Nevel'son and R. Z. Has'minskii, Stochastic Approximation and Recursive Estimation
• Andrey Novikov, "Optimal sequential multiple hypothesis tests", arxiv:0811.1297
• David Pollard
• "Asymptotics via Empirical Processes", Statistical Science 4 (1989): 341--354
• Empirical Processes: Theory and Applications
• Jeffrey S. Racine, "Nonparametric Econometrics: A Primer", Foundations and Trends in Econometrics 3 (2008): 1--88 [Good primer of nonparametric techniques for regression, density estimation and hypothesis testing; next to no economic content (except for examples). Presumes reasonable familiarity with parametric statistics. PDF reprint]
• J. N. K. Rao, "Some recent advances in model-based small area estimation", Survey Methodology 25 (1999): 175--186
• James M. Robins and Ya'acov Ritov, "Toward a curse of Dimensionality Appropriate (CODA) Asymptotic Theory for Semi-Parametric Models", Statistics in Medicine 16 (1997): 285--319 [PDF reprint via Prof. Robins]
• James M. Robins, Aad van der Vaart and Valérie Ventura, "Asymptotic Distribution of P Values in Composite Null Models", Journal of the American Statistical Association 95 (2000): 1143--1156 [JSTOR. Paired article with Bayarri and Berger, above. The discussions and rejoinders (pp. 1157--1172) are valuable.]
• George G. Roussas, Contiguity of Probability Measures: Some Applications in Statistics
• C. Scott and R. Nowak, "A Neyman-Pearson Approach to Statistical Learning", IEEE Transactions on Information Theory 51 (2005): 3806--3819 [Comments: Learning Your Way to Maximum Power]
• Steven G. Self and Kung-Yee Liang, "Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests Under Nonstandard Conditions", Journal of the American Statistical Association 82 (1987): 605--610 [JSTOR]
• Tom Shively, Stephen Walker, "On the Equivalence between Bayesian and Classical Hypothesis Testing", arxiv:1312.0302
• Jeffrey S. Simonoff, Smoothing Methods in Statistics
• Spyros Skouras, "Decisionmetrics: Towards a Decision-Based Approach to Econometrics," SFI Working Paper 2001-11-064 [Applies far outside econometrics. If what you really want to do is to minimize a known loss function, optimizing a conventional accuracy measure, e.g. least squares, can be highly counterproductive.]
• Aris Spanos
• "The Curve-Fitting Problem, Akaike-type Model Selection, and the Error Statistical Approach" [Or: could your model selection tell you that Kepler is better than Ptolemy? Technical report, economics dept., Virginia Tech, 2006. PDF]
• "Where do statistical models come from? Revisiting the problem of specification", math.ST/0610849
• Yun Ju Sung, Charles J. Geyer, "Monte Carlo likelihood inference for missing data models", Annals of Statistics 35 (2007): 990--1011, arxiv:0708.2184
• F. V. Tkachov [Comments]
• "Approaching the Parameter Estimation Quality of Maximum Likelihood via Generalized Moments", arxiv:physics/0001019
• "Quasi-optimal observables: Attaining the quality of maximal likelihood in parameter estimation when only a MC event generator is available," arxiv:physics/0108030
• Alexandre B. Tsybakov, Introduction to Nonparametric Estimation
• Sara van de Geer, Empirical Process Theory in M-Estimation [Finding non-asymptotic rates of convergence for common estimators]
• Quang H. Vuong, "Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses", Econometrica 57 (1989): 307--333
• Grace Wahba, Spline Models for Observational Data
• Michael E. Wall, Andreas Rechtsteiner and Luis M. Rocha, "Singular Value Decomposition and Principal Component Analysis," physics/0208101
• Michael D. Ward, Brian D. Greenhill and Kristin M. Bakke, "The perils of policy by p-value: Predicting civil conflicts", Journal of Peace Research 47 (2010): 363--375
• Larry Wasserman, "Low Assumptions, High Dimensions", RMM 2 (2011): 201--209
• Halbert White, Estimation, Inference and Specification Analysis
• Achilleas Zapranis and Apostolos-Paul Refenes, Principles of Neural Model Identification, Selection and Adequacy, with Applications to Financial Econometrics
• Sven Zenker, Jonathan Rubin, Gilles Clermont, "From Inverse Problems in Mathematical Physiology to Quantitative Differential Diagnoses", PLoS Computational Biology 3 (2007): e205
• Johanna F. Ziegel and Tilmann Gneiting, "Copula Calibration", arxiv:1307.7650
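The pivotal-based approach in the Lawless and Fredette entry above can be sketched in the simplest case, i.i.d. Gaussian data: (X_next - Xbar)/(s*sqrt(1 + 1/n)) is a pivot, so inverting it gives a prediction interval with frequentist coverage. This is only an illustration of the idea, not their construction in general; and I use the normal quantile in place of the exact t_{n-1} quantile to stay within the standard library, which costs a little coverage at small n.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)

# Pivotal prediction interval for the next observation from an i.i.d.
# Gaussian sample.  Exactly, the pivot is t_{n-1}-distributed; the
# standard normal quantile used here is a large-n approximation.
def prediction_interval(xs, alpha=0.1):
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * s * (1 + 1 / n) ** 0.5
    return m - half, m + half

# Monte Carlo check of coverage: the next draw should land inside the
# nominal 90% interval about 90% of the time.
hits, trials = 0, 2000
for _ in range(trials):
    xs = [random.gauss(5, 2) for _ in range(50)]
    lo, hi = prediction_interval(xs, alpha=0.1)
    hits += lo <= random.gauss(5, 2) <= hi
print(hits / trials)
```

The empirical coverage comes out close to (slightly under, because of the normal-for-t substitution) the nominal 90%, which is the "well-calibrated frequentist probability interpretation" the paper is after.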
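The remark in the Lindsay and Liu annotation above, that power grows exponentially at the Kullback-Leibler rate, translates into a back-of-the-envelope sample-size calculation via the Chernoff-Stein lemma: with size held fixed, the type II error decays like exp(-n D(q||p)), q the null model and p the truth, so roughly n = log(1/beta)/D(q||p) samples suffice for target error beta. The distributions below are made-up toy numbers:

```python
from math import log

# Kullback-Leibler divergence (in nats) between distributions on a
# finite alphabet.
def kl(p, q):
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # the true distribution (toy numbers)
q = [0.4, 0.4, 0.2]   # the (false) null model

# Chernoff-Stein exponent: D(null || alternative).  Rough sample size
# needed before a fixed-size test detects the misfit with probability
# 1 - beta.
d = kl(q, p)
beta = 0.05
n = log(1 / beta) / d
print(round(d, 4), round(n, 1))
```

Here the two distributions differ only modestly, the divergence is a few hundredths of a nat, and so on the order of a hundred samples are needed; this is exactly the sample-size-flavored reading of the divergence that Lindsay and Liu's index packages up.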
Recommended, technical, historical interest:
• Trygve Haavelmo, "The Probability Approach in Econometrics", Econometrica 12 (1944, supplement): iii--115 [JSTOR]
• Jerzy Neyman, "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection", Journal of the Royal Statistical Society 97 (1934): 558--625 [This is an astonishing paper on multiple levels. One is the thoroughness with which it achieves its main objective, of demonstrating the superiority of random sampling over alternatives. Another is that it seems to be the first conscious use of confidence intervals. Yet another is the way it set the pattern for a huge fraction of all subsequent statistics down to the present.]
• Henry Scheffé, "Statistical Inference in the Non-Parametric Case", Annals of Mathematical Statistics 14 (1943): 305--332 [Recommended not as a historical study, but as a historical document]
• Abraham Wald, "Estimation of a Parameter When the Number of Unknown Parameters Increases Indefinitely with the Number of Observations", Annals of Mathematical Statistics 19 (1948): 220--227
Modesty forbids me to recommend:
• Andrew Gelman and CRS, "Philosophy and the practice of Bayesian statistics", submitted to the Journal of the American Statistical Association, arxiv:1006.3868
• CRS
Not unambiguously recommended:
• Peter J. Diggle and Amanda G. Chetwynd, Statistics and Scientific Method: An Introduction for Students and Researchers [A missed opportunity.]
• Peter McCullagh, "What is a statistical model?", Annals of Statistics 30 (2002): 1225--1310 [I'm not sure what to think about this; some of the ideas about requiring invariance (or equivariance) under transformations make sense, but I don't know that they lead to anything positive, or need such arcane category-theoretic expression. We should however have cited this in our paper on projectibility and consistency under sampling. (I blame our referees for not making the connection.) --- The discussion and rejoinder are worth reading. Kalman's contribution is very special.]