Indirect Inference

17 Nov 2021 21:33

A technique of parameter estimation for simulation models. You go and build a stochastic generative model of your favorite process or assemblage, and, being a careful scientist, you do a conscientious job of trying to include what you guess are all the most important mechanisms. The result is something you can step through to produce a simulation of the process of interest. But your model contains some unknown parameters, let's say generically \( \theta \), and you would like to tune those to match the data — or see if, despite your best efforts, there are aspects of the data which your model just can't match.

Very often, you will find that your model is too complicated for you to appeal to any of the usual estimation methods of statistics. Because you've been aiming for scientific adequacy rather than statistical tractability, it will often happen that there is no way to even calculate the likelihood of a given data set \( x_1, x_2, \ldots x_t \equiv x_1^t \) under parameters \( \theta \) in closed form, which would rule out even numerical likelihood maximization, to say nothing of Bayesian methods, should you be into them. (For concreteness, I am writing as though the data were just a time series, possibly vector-valued, but the ideas adapt in the obvious way to spatial processes or more complicated formats.) Yet you can simulate; it seems like there should be some way of saying whether the simulations look like the data.

This is where indirect inference comes in, with what I think is a really brilliant idea. Introduce a new model, called the "auxiliary model", which is mis-specified and typically not even generative, but is easily fit to the data, and to the data alone. (By that last I mean that you don't have to impute values for latent variables, etc., etc., even though you might know those variables exist and are causally important.) The auxiliary model has its own parameter vector \( \beta \), with an estimator \( \hat{\beta} \). These parameters describe aspects of the distribution of observables, and the idea of indirect inference is that we can estimate the generative parameters \( \theta \) by trying to match those aspects of observations, by trying to match the auxiliary parameters.

On the one side, start with the data \( x_1^t \) and get auxiliary parameter estimates \( \hat{\beta}(x_1^t) \equiv \hat{\beta}_t \). On the other side, for each \( \theta \) we can generate a simulated realization \( \tilde{X}_1^t(\theta) \) of the same size (and shape, if applicable) as the data, leading to auxiliary estimates \( \hat{\beta}(\tilde{X}_1^t(\theta)) \equiv \tilde{\beta}_t(\theta) \). The indirect inference estimate \( \hat{\theta} \) is the value of \( \theta \) where \( \tilde{\beta}_t(\theta) \) comes closest to \( \hat{\beta}_t \). More generally, we can introduce a (symmetric, positive-definite) matrix \( \mathbf{W} \) and minimize the quadratic form \[ \left(\hat{\beta}_t - \tilde{\beta}_t(\theta)\right) \cdot \mathbf{W} \left(\hat{\beta}_t - \tilde{\beta}_t(\theta)\right) \] with the entries in the matrix chosen to give more or less relative weight to the different auxiliary parameters.
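The recipe above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions of my choosing, not a canonical implementation: the generative model is taken to be an MA(1) process, the auxiliary model is an AR(2) fit by least squares, \( \mathbf{W} \) defaults to the identity, and the minimization is a crude grid search. The function names `simulate_ma1`, `fit_ar` and `indirect_inference` are hypothetical. Re-using the same random seed for every \( \theta \) (common random numbers) is a standard trick to keep the simulated objective from jumping around between nearby parameter values.

```python
import numpy as np

def simulate_ma1(theta, n, rng):
    """Hypothetical generative model: x_t = e_t + theta * e_{t-1}."""
    e = rng.standard_normal(n + 1)
    return e[1:] + theta * e[:-1]

def fit_ar(x, p=2):
    """Auxiliary estimator beta-hat: least-squares AR(p) coefficients."""
    # Column k holds x lagged by k+1 steps, aligned with the targets x[p:].
    lags = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    beta, *_ = np.linalg.lstsq(lags, x[p:], rcond=None)
    return beta

def indirect_inference(x, thetas, W=None, p=2, seed=0):
    """Grid-search minimizer of the quadratic form in the auxiliary parameters."""
    beta_hat = fit_ar(x, p)
    W = np.eye(p) if W is None else W
    losses = []
    for theta in thetas:
        # Re-seeding gives every theta the same noise draws (common random
        # numbers), so the objective varies smoothly in theta.
        rng = np.random.default_rng(seed)
        beta_sim = fit_ar(simulate_ma1(theta, len(x), rng), p)
        d = beta_hat - beta_sim
        losses.append(d @ W @ d)
    losses = np.array(losses)
    return thetas[np.argmin(losses)], losses

# Usage: pretend the "data" came from theta_0 = 0.5, then try to recover it.
data = simulate_ma1(0.5, 5000, np.random.default_rng(42))
grid = np.linspace(-0.9, 0.9, 181)
theta_hat, losses = indirect_inference(data, grid)
print(theta_hat)
```

In practice one would swap the grid search for a proper optimizer and average over several simulated realizations per \( \theta \), but the structure (fit the auxiliary model to the data, fit it to simulations, minimize the distance) stays the same.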

The remarkable thing about this is that it works, in the sense of giving consistent parameter estimates, under not too strong conditions. Suppose that the data really are generated under some parameter value \( \theta_0 \); we'd like to see \( \hat{\theta} \rightarrow \theta_0 \). (Estimating the pseudo-truth in a mis-specified model works similarly but is more complicated than I feel like going into right now.) Sufficient conditions for this are that

  1. the auxiliary estimates converge to a non-random "binding function" \[ \tilde{\beta}_t(\theta) \rightarrow b(\theta) \] uniformly in \( \theta \), and
  2. the binding function \( b(\theta) \) is invertible.
(Really, both properties just need to hold in some suitable domain \( \Theta \) which includes \( \theta_0 \).)
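
A toy case, chosen here purely for illustration, makes the two conditions concrete. Suppose the generative model is a first-order moving average, \[ X_t = \epsilon_t + \theta \epsilon_{t-1} \] with the \( \epsilon_t \) independent, mean zero and variance \( \sigma^2 \), and the auxiliary model is a first-order autoregression whose single coefficient is estimated by least squares. By the law of large numbers, for each \( \theta \) the auxiliary estimate on simulated data converges to the lag-one autocorrelation, \[ \tilde{\beta}_t(\theta) \rightarrow b(\theta) = \frac{\mathrm{Cov}(X_t, X_{t+1})}{\mathrm{Var}(X_t)} = \frac{\theta \sigma^2}{(1+\theta^2)\sigma^2} = \frac{\theta}{1+\theta^2} \] (uniformity over a compact subset of the domain is assumed here rather than proved). Since \[ b^{\prime}(\theta) = \frac{1-\theta^2}{(1+\theta^2)^2} > 0 \quad \text{for } |\theta| < 1, \] the binding function is strictly increasing, hence invertible, on \( \Theta = (-1, 1) \), with inverse \( \theta = \left(1 - \sqrt{1-4b^2}\right)/(2b) \) for \( b \neq 0 \) and \( \theta = 0 \) for \( b = 0 \). It is not invertible on the whole real line, because \( b(\theta) = b(1/\theta) \), which is why the restriction to a suitable domain matters.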

Basically, these mean that the set of auxiliary parameters has to be rich enough to characterize or distinguish the different values of the generative parameters, and we need to be able to consistently estimate the former. This means we need at least as many auxiliary parameters as generative ones, so auxiliary models tend to be ones where it's easy to keep loading on parameters. (Adding too many auxiliary parameters does lead to loss of efficiency, however.) If \( b(\theta) \) is also differentiable in \( \theta \), and some additional regularity conditions hold, then we even get asymptotic Gaussian errors, with the matrix of partial derivatives \( \partial b_i/\partial \theta_j \) playing a role like the Fisher information matrix. I can't resist adding that the usual conditions quoted for the consistency of indirect inference are stronger, and that the ones given here come from a chapter in the dissertation of my student Linqiao Zhao.

I think this is a really, really powerful idea, and one which should be much more widely adopted by people working with simulation models. In particular, one of my Cunning Plans is to make it work for agent-based modeling, and especially for models of social network formation.

A topic of particular interest to me is how to use non-parametric estimators, of regression or density curves say, as the auxiliary models, since then there is never any problem of having too few auxiliary parameters (though they might still be insensitive to the generative parameters, if one is looking at the wrong curves). Nickl and Pötscher, below, have some initial results in this direction.

("Approximate Bayesian computation" is a very similar idea, but where the plain truth of the evidence is corrupted by prejudice (that is, a prior distribution is used to stabilize estimates), at some cost in sensitivity. I need to learn more about it.)

(I wrote the first version of this sometime before 19 September 2010...)

Previous versions: 2010-09-19 21:17