Notebooks

## Regression, especially Nonparametric Regression

25 Jul 2022 13:16

"Regression", in statistical jargon, is the problem of guessing the average level of some quantitative response variable from various predictor variables.

Linear regression is perhaps the single most common quantitative tool in economics, sociology, and many other fields; it's certainly the most common use of statistics. (Analysis of variance, arguably more common in psychology and biology, is a disguised form of regression.) While linear regression deserves a place in statistics, that place should be nowhere near as large and prominent as it currently is. There are very few situations where we actually have scientific support for linear models. Fortunately, very flexible nonlinear regression methods now exist, and from the user's point of view are just as easy as linear regression, and at least as insightful. (Regression trees and additive models, in particular, are just as interpretable.) At the very least, if you do have a particular functional form in mind for the regression, linear or otherwise, you should use a non-parametric regression to test the adequacy of that form.
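To make the last point concrete, here is a minimal sketch of using a nonparametric fit as a check on a linear one, with numpy only. The data-generating function, the Gaussian kernel, and the bandwidth of 0.3 are all illustrative choices of mine, not anything canonical; a Nadaraya-Watson smoother stands in for fancier methods like smoothing splines or gam:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data whose true regression function is clearly nonlinear
x = np.linspace(0.0, 6.0, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

# Ordinary least-squares line: y ~ a + b x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
linear_fit = X @ beta

# Nadaraya-Watson kernel smoother with a Gaussian kernel
# (bandwidth fixed by eye here; in practice pick it by cross-validation)
def kernel_smooth(x, y, bandwidth=0.3):
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

smooth_fit = kernel_smooth(x, y)

# If the linear form were adequate, these two should be comparable;
# a large gap is evidence against the linear specification.
mse_linear = np.mean((y - linear_fit) ** 2)
mse_smooth = np.mean((y - smooth_fit) ** 2)
print(mse_linear, mse_smooth)
```

Comparing the two in-sample errors is only the crudest version of the idea; the specification-testing papers below (Hart; Hong and White) do this properly, with reference distributions for the gap.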

From a technical point of view, the main drawback of modern regression methods is that their extra flexibility comes at the price of less "efficiency": estimates converge more slowly, so you have less precision for the same amount of data. There are some situations where you'd prefer to have more precise estimates from a bad model than less precise estimates from a model which doesn't make systematic errors, but I don't think that's what most users of linear regression are choosing to do; they're just taught to type lm rather than gam. In this day and age, though, I don't understand why they aren't taught both.
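The loss of efficiency can be made quantitative. If the linear model is correctly specified, its estimates converge at the usual parametric rate; the classic minimax results (going back to Stone) say that if all you assume is smoothness, you must converge more slowly, with the penalty growing in the dimension of the predictors:

```latex
% Parametric model, correctly specified:
\hat{m}_{\mathrm{lin}}(x) - m(x) = O_P(n^{-1/2})
% Nonparametric, m assumed \beta-times differentiable in d variables:
\hat{m}_{\mathrm{np}}(x) - m(x) = O_P\!\left(n^{-\beta/(2\beta + d)}\right)
% e.g., a twice-differentiable curve of one variable: rate n^{-2/5}
```

So for a twice-differentiable function of one predictor you pay a rate of $n^{-2/5}$ instead of $n^{-1/2}$: a real but modest price, which is the trade-off the main text is describing.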

(Of course, for the statistician, a lot of the more flexible regression methods look more or less like linear regression in some disguised form, because fundamentally all they do is project on to a function basis. So it's not crazy to make linear regression a foundational topic for statisticians. We should not, however, give the rest of the world the impression that the hat matrix is the source of all knowledge.)

The use of regression, linear or otherwise, for causal inference, rather than prediction, is a different, and far more sordid, story.

Recommended, more specialized:
• Azadeh Alimadad and Matias Salibian-Barrera, "An Outlier-Robust Fit for Generalized Additive Models with Applications to Disease Outbreak Detection", Journal of the American Statistical Association 106 (2011): 719--731
• Norman H. Anderson and James Shanteau, "Weak inference with linear models", Psychological Bulletin 84 (1977): 1155--1170 [A demonstration of why you should not rely on $R^2$ to back up your claims]
• Mikhail Belkin, Partha Niyogi, Vikas Sindhwani, "Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples", Journal of Machine Learning Research 7 (2006): 2399--2434
• Peter J. Bickel and Bo Li, "Local polynomial regression on unknown manifolds", pp. 177--186 in Regina Liu, William Strawderman and Cun-Hui Zhang (eds.), Complex Datasets and Inverse Problems: Tomography, Networks and Beyond (2007) ["`naive' multivariate local polynomial regression can adapt to local smooth lower dimensional structure in the sense that it achieves the optimal convergence rate for nonparametric estimation of regression functions ... when the predictor variables live on or close to a lower dimensional manifold"]
• Michael H. Birnbaum, "The Devil Rides Again: Correlation as an Index of Fit", Psychological Bulletin 79 (1973): 239--242
• Lawrence D. Brown and Mark G. Low, "Asymptotic Equivalence of Nonparametric Regression and White Noise", Annals of Statistics 24 (1996): 2384--2398 [JSTOR]
• Peter Bühlmann, M. Kalisch and M. H. Maathuis, "Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm", Biometrika 97 (2010): 261--278
• Peter Bühlmann and Sara van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications [State-of-the art (2011) compendium of what's known about using high-dimensional regression, especially but not just the Lasso.]
• A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, K. Zhan, L. Zhao, "Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression", arxiv:1404.1578
• Andreas Buja, Trevor Hastie and Robert Tibshirani, "Linear smoothers and additive models", Annals of Statistics 17 (1989): 453--510 [A classic additive models paper. The discussions and reply fill pp. 510--555.]
• Raymond J. Carroll, Aurore Delaigle, and Peter Hall, "Nonparametric Prediction in Measurement Error Models", Journal of the American Statistical Association 104 (2009): 993--1003
• Raymond J. Carroll, J. D. Maca and D. Ruppert, "Nonparametric regression in the presence of measurement error", Biometrika 86 (1999): 541--554
• Kevin A. Clarke, "The Phantom Menace: Omitted Variables Bias in Econometric Research" [PDF. Or: Kitchen-sink regressions considered harmful. Including extra variables in your linear regression may or may not reduce the bias in your estimate of any particular coefficients of interest, depending on the correlations between the added variables, the predictors of interest, the response, and omitted relevant variables. Adding more variables always increases the variance of your estimates.]
• Eduardo Corona, Terran Lane, Curtis Storlie, Joshua Neil, "Using Laplacian Methods, RKHS Smoothing Splines and Bayesian Estimation as a framework for Regression on Graph and Graph Related Domains" [Technical report, University of New Mexico Computer Science, 2008-06, PDF]
• William H. DuMouchel and Greg J. Duncan, "Using Sample Survey Weights in Multiple Regression Analysis of Stratified Samples", Proceedings of the Survey Research Methods Section, American Statistical Association (1981), pp. 629--637 [PDF reprint; presumably very similar to "Using Sample Survey Weights to Compare Various Linear Regression Models", Journal of the American Statistical Association 78 (1983): 535--543, but I have not looked at the latter]
• Andrew Gelman and Iain Pardoe, "Average predictive comparisons for models with nonlinearity, interactions, and variance components", Sociological Methodology forthcoming (2007) [PDF preprint, Gelman's comments]
• Lee-Ad Gottlieb, Aryeh Kontorovich, Robert Krauthgamer, "Efficient Regression in Metric Spaces via Approximate Lipschitz Extension", arxiv:1111.4470
• László Györfi, Michael Kohler, Adam Krzyzak and Harro Walk, A Distribution-Free Theory of Nonparametric Regression
• Berthold R. Haag, "Non-parametric Regression Tests Using Dimension Reduction Techniques", Scandinavian Journal of Statistics 35 (2008): 719--738
• Peter Hall, "On Bootstrap Confidence Intervals in Nonparametric Regression", Annals of Statistics 20 (1992): 695--711
• Peter Hall and Joel Horowitz, "A simple bootstrap method for constructing nonparametric confidence bands for functions", Annals of Statistics 41 (2013): 1892--1921, arxiv:1309.4864
• Jeffrey D. Hart, Nonparametric Smoothing and Lack-of-Fit Tests
• Yongmiao Hong and Halbert White, "Consistent Specification Testing Via Nonparametric Series Regression", Econometrica 63 (1995): 1133--1159 [JSTOR]
• Adel Javanmard, Andrea Montanari, "Confidence Intervals and Hypothesis Testing for High-Dimensional Regression", arxiv:1306.3171
• M. Kohler, A. Krzyzak and D. Schafer, "Application of structural risk minimization to multivariate smoothing spline regression estimates", Bernoulli 8 (2002): 475--490
• Alexander Korostelev, "A minimaxity criterion in nonparametric regression based on large-deviations probabilities", Annals of Statistics 24 (1996): 1075--1083
• John Lafferty and Larry Wasserman [To be honest, I haven't checked to see how different these two papers actually are...]
  • "Rodeo: Sparse Nonparametric Regression in High Dimensions", math.ST/0506342
  • "Rodeo: Sparse, greedy nonparametric regression", Annals of Statistics 36 (2008): 27--63, arxiv:0803.1709
• Diane Lambert and Kathryn Roeder, "Overdispersion Diagnostics for Generalized Linear Models", Journal of the American Statistical Association 90 (1995): 1225--1236 [JSTOR]
• Lukas Meier, Sara van de Geer and Peter Bühlmann, "High-Dimensional Additive Modeling", Annals of Statistics 37 (2009): 3779--3821, arxiv:0806.4115
• Abdelkader Mokkadem, Mariane Pelletier, Yousri Slaoui, "Revisiting Révész's stochastic approximation method for the estimation of a regression function", arxiv:0812.3973
• Patrick O. Perry, "Fast Moment-Based Estimation for Hierarchical Models", arxiv:1504.04941
• Garvesh Raskutti, Martin J. Wainwright, and Bin Yu, "Early stopping and non-parametric regression: An optimal and data-dependent stopping rule", arxiv:1306.3574
• Pradeep Ravikumar, John Lafferty, Han Liu, Larry Wasserman, "Sparse Additive Models", arxiv:0711.4555 [a.k.a. "SpAM"]
• B. W. Silverman, "Spline Smoothing: The Equivalent Variable Kernel Method", Annals of Statistics 12 (1984): 898--916
• Ryan J. Tibshirani, "Degrees of Freedom and Model Search", arxiv:1402.1920
• Gerhard Tutz, Regression for Categorical Data
• Sara van de Geer, Empirical Process Theory in M-Estimation
• Grace Wahba, Spline Models for Observational Data
• Jianming Ye, "On Measuring and Correcting the Effects of Data Mining and Model Selection", Journal of the American Statistical Association 93 (1998): 120--131