### Simulation IV: Quantifying Uncertainty with Simulations (Introduction to Statistical Computing)

(My notes for this lecture are too fragmentary to write up properly; here's
the sketch.)

Two forms of statistical uncertainty: (I) How much would our answers change
if the data were different? (II) How diverse are the answers which don't make
us hate ourselves or our data?

For (I), the main issue is the sampling distribution: what distribution of
answers would our procedures deliver if we re-ran the experiment many times?
Since we can't actually re-run the experiment, we simulate re-running it; a
simulation method for approximating the sampling distribution is the
bootstrap. This relies on probabilistic assumptions about how the data were
generated, which in turn determine which simulation to run.
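
As a concrete sketch of the bootstrap idea (not from the lecture notes, and with
made-up data), here is a minimal nonparametric bootstrap in Python, taking the
median as the statistic of interest:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up "observed" data standing in for the one sample we actually have.
data = rng.exponential(scale=2.0, size=100)

def bootstrap_distribution(data, statistic, n_boot=2000, rng=rng):
    """Approximate the sampling distribution of `statistic` by recomputing it
    on many resamples of the observed data, drawn with replacement."""
    n = len(data)
    return np.array([statistic(rng.choice(data, size=n, replace=True))
                     for _ in range(n_boot)])

boot_medians = bootstrap_distribution(data, np.median)

# The spread of the resampled medians stands in for the spread we would see
# if we could genuinely repeat the experiment.
print("observed median:", np.median(data))
print("bootstrap standard error:", boot_medians.std())
print("95% percentile interval:", np.percentile(boot_medians, [2.5, 97.5]))
```

Resampling with replacement is the simplest, model-free version; if we trusted a
parametric model, we would simulate from the fitted model instead, which is why
the assumptions about how the data were generated matter.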

For (II), the main issue is that if we get really weird answers, we're
reluctant to accept them, but sometimes the data force us to give up our
preconceptions. Bayesian inference is a way of trying to formalize this, by
introducing a prior distribution which favors some parts of the parameter space
over others. In effect, we try lots and lots of different parameter values, but
give them different weights. Bayesian updating is reinforcement
learning/evolutionary search with a fitness function proportional to the
likelihood. The Bayesian posterior is the population of parameter values
which have survived our selective breeding. Rather than actually calculating
the posterior, though, we usually use Markov chain Monte
Carlo and get a (dependent) sample from the posterior distribution. N.B.,
the Markov chain for the MCMC is *not* a model of the original process,
which we're generally not simulating.
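
To make "get a (dependent) sample from the posterior" concrete, here is a
minimal random-walk Metropolis sketch in Python (my own toy example, not from
the lecture): Normal data with unknown mean, the standard deviation assumed
known and equal to 1, and a Normal(0, 10) prior on the mean.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up data: Normal with unknown mean mu; the model takes sigma = 1 as known.
data = rng.normal(loc=3.0, scale=1.0, size=50)

def log_posterior(mu):
    log_prior = -0.5 * (mu / 10.0) ** 2        # Normal(0, 10) prior, up to a constant
    log_lik = -0.5 * np.sum((data - mu) ** 2)  # Normal(mu, 1) likelihood, up to a constant
    return log_prior + log_lik

def metropolis(log_post, start, n_steps=5000, step_sd=0.5):
    """Random-walk Metropolis: propose a nearby parameter value, accept it with
    probability min(1, posterior ratio), otherwise stay where we are."""
    chain = np.empty(n_steps)
    current, current_lp = start, log_post(start)
    for i in range(n_steps):
        proposal = current + rng.normal(scale=step_sd)
        proposal_lp = log_post(proposal)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

chain = metropolis(log_posterior, start=0.0)
posterior_sample = chain[1000:]  # drop burn-in; what remains is a dependent sample
print("posterior mean:", posterior_sample.mean())
print("95% posterior interval:", np.percentile(posterior_sample, [2.5, 97.5]))
```

Note that the chain wanders over parameter values, not over simulated data sets,
which is the point of the N.B. above.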

WARNING: Bayesian uncertainty will generally *not* match the
uncertainties we'd get from repeated sampling. Intervals that hold 95% of the
posterior weight might include the true parameter value only 5% of the time, or
even only 0% of the time. (See, among
others: Wasserman,
Fraser.)
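
As one toy illustration of how the mismatch can arise (my own example, not from
the lecture, with made-up numbers): suppose the analyst's model takes the noise
level to be 1 when the data really have noise level 3. The nominal 95% posterior
intervals for the mean are then far too narrow:

```python
import numpy as np

rng = np.random.default_rng(99)

# Deliberate misspecification: the model assumes sigma = 1, reality uses sigma = 3.
true_mu, true_sigma, model_sigma = 3.0, 3.0, 1.0
n, prior_var, n_reps = 50, 100.0, 2000

covered = 0
for _ in range(n_reps):
    x = rng.normal(true_mu, true_sigma, size=n)
    # Conjugate Normal-Normal posterior for the mean under the (wrong) model.
    post_prec = 1.0 / prior_var + n / model_sigma**2
    post_mean = (n * x.mean() / model_sigma**2) / post_prec
    post_sd = post_prec ** -0.5
    lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
    covered += (lo <= true_mu <= hi)

# How often do the nominal 95% credible intervals cover the true mean?
print("frequentist coverage of 95% credible intervals:", covered / n_reps)
```

With this particular misspecification the coverage comes out around 50% rather
than 95%; nastier misspecifications can push it much lower.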


Posted at October 23, 2013 10:30 | permanent link