Partial Identification of Parametric Statistical Models

Last update: 30 Mar 2023 10:35
A parametric statistical model is said to be "identifiable" if no two parameter settings give rise to the same distribution of observations. This means that there is always some way to test whether the parameters take one value rather than another. If this is not the case, then the model is said to be "unidentifiable". Sometimes models are unidentifiable because they are bad models, specified in a stupid way which leads to redundancies. Sometimes, however, models are unidentifiable because the data are bad --- if you could measure certain variables, or measure them more precisely, the model would be identifiable, but in fact you have to put up with noisy, missing, aggregated, etc. data. (Technically: the information we get from observations is represented by a sigma algebra, or, over time, a filtration. If two distributions differ on the full filtration, their restrictions to some smaller filtration might coincide.) Presumably then you could still partially identify the model, up to, say, some notion of observational equivalence. Query: how to make this precise?
If the distribution predicted by the model depends, in a reasonably smooth way, on the parameters, then we can form the Fisher information matrix, which is basically the negative of the matrix of expected second derivatives of the log-likelihood with respect to all the parameters. (I realize that's not a very helpful statement if you haven't at least forgotten the real definition of the Fisher information.) Suppose one or more of the eigenvalues of the Fisher information matrix is zero. Any vector orthogonal to the span of the eigenvectors corresponding to the non-zero eigenvalues then gives a linear combination of the original parameters which is unidentifiable, at least in the vicinity of the point at which you're taking derivatives. This suggests at least two avenues of approach here.
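To make the Fisher-information story concrete before taking up those avenues, here is a minimal numerical sketch. The model is my own toy, not one from the text: observe X ~ N(a + b, 1), so only the sum a + b matters and the difference a - b should come out unidentifiable. For a unit-variance Gaussian mean model the Fisher information is J^T J, with J the Jacobian of the mean in the parameters.

```python
import numpy as np

# Toy model (my own illustration): observe X ~ N(a + b, 1).
# Only the sum a + b affects the distribution, so a - b is unidentifiable.
# For a unit-variance Gaussian mean model, the Fisher information is
# J^T J, where J is the Jacobian of the mean with respect to the parameters.
J = np.array([[1.0, 1.0]])      # d(a + b)/d(a, b)
F = J.T @ J                     # [[1, 1], [1, 1]]

eigvals, eigvecs = np.linalg.eigh(F)
print(eigvals)                  # [0, 2]: one zero eigenvalue

# The null eigenvector is proportional to (1, -1): the combination a - b
# is (locally) unidentifiable.  Rotating coordinates by the eigenvector
# matrix is the re-parameterization idea in miniature: the new coordinates
# are +/-(a - b)/sqrt(2) (unidentifiable) and +/-(a + b)/sqrt(2) (identifiable).
theta = np.array([3.0, -1.0])   # (a, b)
phi = eigvecs.T @ theta
print(np.abs(phi))              # [|a - b|, |a + b|] / sqrt(2)
```

The zero eigenvalue flags the flat direction in the likelihood surface, and the corresponding eigenvector names the unidentifiable parameter combination.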
- Re-parameterization. Perform some kind of rotation of the coordinate system in parameter space so that the parameters separate into two orthogonal groups, identifiable and unidentifiable, and inference for the former can proceed in total ignorance of the values of the latter.
- Equivalent models. Say (as I implied above) that two parameter settings are observationally equivalent if they yield the same distribution over observations. The set of all parameter values observationally equivalent to a given value should, plausibly, form a sub-manifold of parameter space. The zero eigenvectors of the Fisher information matrix give the directions in which we could move in parameter space and (locally) stay on this sub-manifold. Is this enough to actually define that sub-manifold? It sounds plausible. (I am thinking of how, in dynamical systems, we can go from knowing the stable/unstable/neutral directions in the neighborhood of a fixed point to the stable/unstable/neutral manifolds, extending, potentially, arbitrarily far away.) Could this actually be used, in an exercise in computational differential geometry, to calculate the sub-manifold?
--- Since writing the first version of this, I've run across the work of Charles Manski, which is centrally concerned with "partial identification", but not quite in the sense I had in mind. Rather than reparameterizing to get some totally identifiable parameters and ignore the rest, he wants to take the parameters as given, and put bounds on the ones which can't be totally identified. This is only natural for the kinds of parameters he has in mind, like the effects of policy interventions.
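The flavor of Manski's bounding approach can be seen in the standard bounded-outcome, missing-data example (my illustration, not one from the text): if outcome Y lies in [0, 1] and some observations are missing, then without any assumption about why they are missing, E[Y] = P(obs) E[Y|obs] + P(miss) E[Y|miss], and the unknown E[Y|miss] can only be bracketed by 0 and 1.

```python
import numpy as np

# Worst-case (Manski-style) bounds on the mean of a bounded outcome
# when some outcomes are missing and nothing is assumed about missingness.
def worst_case_bounds(y_observed, n_missing, y_lo=0.0, y_hi=1.0):
    n_obs = len(y_observed)
    p_obs = n_obs / (n_obs + n_missing)
    ybar = np.mean(y_observed)
    # E[Y] = P(obs) E[Y|obs] + P(miss) E[Y|miss], with E[Y|miss] known
    # only to lie in [y_lo, y_hi]:
    return (p_obs * ybar + (1 - p_obs) * y_lo,
            p_obs * ybar + (1 - p_obs) * y_hi)

lo, hi = worst_case_bounds(np.array([0.2, 0.6, 0.7, 0.9]), n_missing=1)
print(lo, hi)   # 0.48 0.68: the mean is only partially identified
```

The parameter of interest keeps its original meaning; the data simply confine it to an interval rather than a point, which is the sense of "partial identification" Manski has in mind.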
(Thanks to Gustavo Lacerda for corrections.)
- Recommended, big picture:
- Charles Manski, Identification for Prediction and Decision [Review: Better Roughly Right Than Exactly Wrong]
- Giorgio Picci, "Some Connections Between the Theory of Sufficient Statistics and the Identifiability Problem", SIAM Journal on Applied Mathematics 33 (1977): 383--398 [Introduces the idea of a "maximal identifiable statistic" --- the coarsest partition of hypothesis space where each equivalence class/cell of the partition gives rise to a distinct distribution of observables. (I would prefer "parameter" or "functional", rather than "statistic", since it's a function of the distribution, not the observables, but that's a quibble.) See more under sufficiency. JSTOR. Thanks to Rhiannon Weaver for the pointer.]
- Recommended, close-ups:
- Omar Melikechi, Alexander L. Young, Tao Tang, Trevor Bowman, David Dunson, James Johndrow, "Limits of epidemic prediction using SIR models", arxiv:2112.07039 [My comments]
- Sven Zenker, Jonathan Rubin, Gilles Clermont, "From Inverse Problems in Mathematical Physiology to Quantitative Differential Diagnoses", PLoS Computational Biology 3 (2007): e205 [When your model is unidentified, do an experiment]
- To read:
- Elizabeth S. Allman, Catherine Matias, John A. Rhodes, "Identifiability of parameters in latent structure models with many observed variables", Annals of Statistics 37 (2009): 3099--3132, arxiv:0809.5032
- David Campbell, Subhash Lele, "An ANOVA Test for Parameter Estimability using Data Cloning with Application to Statistical Inference for Dynamic Systems", arxiv:1305.3299
- Marisa C. Eisenberg, Michael A. L. Hayashi, "Determining Structurally Identifiable Parameter Combinations Using Subset Profiling", arxiv:1307.2298
- Paul Gustafson
- "On Model Expansion, Model Contraction, Identifiability and Prior Information: Two Illustrative Scenarios Involving Mismeasured Variables", Statistical Science 20 (2005): 111--140 [Thanks to Gustavo Lacerda for the pointer]
- "On the behaviour of Bayesian credible intervals in partially identified models", Electronic Journal of Statistics 6 (2012): 2107--2124
- Changsung Kang, Jin Tian, "Inequality Constraints in Causal Models with Hidden Variables", arxiv:1206.6829
- Subhash R. Lele, Khurram Nadeem and Byron Schmuland, "Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning", Journal of the American Statistical Association 105 (2010): 1617--1625
- Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, James P. Sethna, "Parameter Space Compression Underlies Emergent Theories and Predictive Models", arxiv:1303.6738
- Charles Manski, Partial Identification of Probability Distributions
- Robert Nishihara, Thomas Minka, Daniel Tarlow, "Detecting Parameter Symmetries in Probabilistic Models", arxiv:1312.5386