January 31, 2013

Lecture: The Bootstrap (Advanced Data Analysis from an Elementary Point of View)

The sampling distribution is the source of all knowledge regarding statistical uncertainty. Unfortunately, the true sampling distribution is inaccessible, since it is a function of exactly the quantities we are trying to infer. One exit from this vicious circle is the bootstrap principle: approximate the true sampling distribution by simulating from a good model of the process, and treating the simulated data just like the real data. The simplest form of this is parametric bootstrapping, i.e., simulating from the fitted model. Nonparametric bootstrapping means simulating by re-sampling, i.e., by treating the observed sample as a complete population and drawing new samples from it with replacement. Bootstrapped standard errors, biases, confidence intervals, p-values. Tricks for making the simulated distribution closer to the true sampling distribution (pivotal intervals, studentized intervals, the double bootstrap). Bootstrapping regression models: by parametric bootstrapping; by resampling residuals; by resampling cases. Many, many examples. When does the bootstrap fail?
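To make the two schemes concrete, here is a minimal sketch in Python (the lecture's own examples are in R; this is not that code). It bootstraps the sample mean of some made-up data in two ways: by resampling the observations with replacement, and by simulating from a fitted exponential model. The function names (`resample`, `simulate_exponential`, `bootstrap`), the exponential model, the synthetic data, and the choice of a basic pivotal interval are all assumptions made for illustration.

```python
import random
import statistics


def resample(data, rng):
    """Nonparametric scheme: draw n values with replacement from the data."""
    return rng.choices(data, k=len(data))


def simulate_exponential(data, rng):
    """Parametric scheme (assumed model): fit an exponential by maximum
    likelihood (rate = 1 / sample mean) and simulate n new values from it."""
    rate = 1.0 / statistics.fmean(data)
    return [rng.expovariate(rate) for _ in data]


def bootstrap(data, estimator, simulator, n_boot=2000, alpha=0.05, seed=0):
    """Approximate the sampling distribution of `estimator` by re-running it
    on simulated data sets, then summarize that distribution."""
    rng = random.Random(seed)
    theta_hat = estimator(data)
    reps = sorted(estimator(simulator(data, rng)) for _ in range(n_boot))

    se = statistics.stdev(reps)               # bootstrap standard error
    bias = statistics.fmean(reps) - theta_hat  # bootstrap bias estimate

    # Crude empirical quantiles of the bootstrap replicates.
    q_lo = reps[int(n_boot * alpha / 2)]
    q_hi = reps[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]

    # Basic pivotal interval: reflect the quantiles around the estimate.
    ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
    return {"estimate": theta_hat, "se": se, "bias": bias, "ci": ci}


# Synthetic data for illustration only (not the wealth data from the notes).
data_rng = random.Random(42)
data = [data_rng.expovariate(0.5) for _ in range(200)]

nonparam = bootstrap(data, statistics.fmean, resample)
param = bootstrap(data, statistics.fmean, simulate_exponential)
print("resampling:", nonparam)
print("parametric:", param)
```

Because the simulator is passed in as an argument, the rest of the procedure is the same for both schemes, which mirrors the principle above: only the model being simulated from changes. When the exponential model is a good description of the data, the two sets of standard errors should be close; a large gap is a hint that the parametric model is misspecified.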

Note: Thanks to Prof. Christopher Genovese for delivering this lecture while I was enjoying the hospitality of the fen-folk.

Reading: Notes, chapter 6 (R for figures and examples; pareto.R; wealth.dat);
Lecture slides; R for in-class examples
Cox and Donnelly, chapter 8

Advanced Data Analysis from an Elementary Point of View

Posted at January 31, 2013 10:30

Three-Toed Sloth