Notebooks

## Statistical Inference for Markov and Hidden Markov Models

28 Jun 2021 00:17

I am concerned here with inferring the parameters and/or the structure of the model, not with the estimation of the hidden state in a hidden Markov model with known parameters; that problem falls under filtering.

Parameter inference (what machine learning types would call "learning") given a known, fixed structure. Determining the structure ("discovery"). Order estimation as a particular case of discovery, and of model selection. (Discovery may need its own notebook soon.)

Markov models naturally form exponential families; what classes of partially-observable Markov processes also do?

See also: Inference for Stochastic Differential Equations (strictly speaking a sub-topic, but one with enough special features to get separate treatment); Markov models; Mixture Models; Prediction Processes; Markovian (and Conceivably Causal) Representations of Stochastic Processes; Statistics; Time Series; Universal Prediction Algorithms

Recommended (big picture):
• Patrick Billingsley, Statistical Inference for Markov Chains [Foundational.]
• Olivier Cappé Eric Moulines and Tobias Ryden, Inference in Hidden Markov Models [This is a superb book, treating all the main statistical problems connected with HMMs with both clarity and rigor. There is a chapter/appendix which reminds the reader who has forgotten about Markov chains on general state spaces, but remembers measure-theoretic probability very well, about their properties. (The idea that the reader of a book on HMMs may not know what the Viterbi algorithm is, but will of course recall the Hahn-Jordan decomposition, strikes me as very much a product of the French school of probability theory — from which I have learned much! But it's not so far gone as to make the whole thing an exercise in the "general theory of processes".)]
• Andrew Fraser, Hidden Markov Models and Dynamical Systems [Review: The Statistics of Moving Shadows]
• Peter Guttorp, Stochastic Modelling of Scientific Data
Recommended (close-ups):
• Animashree Anandkumar, Daniel Hsu, Sham M. Kakade, "A Method of Moments for Mixture Models and Hidden Markov Models", arxiv:1203.0683
• David Blackwell and Lambert Koopmans, "On the Identifiability Problem for Functions of Finite Markov Chains", The Annals of Mathematical Statistics 28 (1957): 1011--1015 [An old, but very clear, paper on the problems presented by what we now call hidden Markov models]
• Peter Bühlmann and Abraham J. Wyner, "Variable Length Markov Chains", The Annals of Statistics 27 (1999): 480--513 [Preprint available as Berkeley statistics department technical report 479]
• Olivier Cappé "Online EM Algorithm for Hidden Markov Models", Journal of Computational and Graphical Statistics 20 (2011): 728--749, arxiv:0908.2359
• George Cybenko and Valentino Crespi, "Learning Hidden Markov Models Using Nonnegative Matrix Factorization", IEEE Transactions on Information Theory 57 (2011): 3963--3970, arxiv:0809.4086 [Though it contains an error, at least in the preprint version, about the capacities of our CSSR algorithm --- we can get model structures right with much less data than they think, though we presented examples using more data than was strictly needed.]
• Randal Douc, Eric Moulines, Jimmy Olsson, and Ramon van Handel, "Consistency of the maximum likelihood estimator for general hidden Markov models", Annals of Statistics 39 (2011): 474--513
• P. Dupont, F. Denis and Y. Esposito, "Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms", Pattern Recognition 38 (2005): 1349--1371
• Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky, "A sticky HDP-HMM with application to speaker diarization", Annals of Applied Statistics 5 (2011): 1020--1056, arxiv:0905.2592
• Subhashis Ghosal and Yongqiang Tang, "Bayesian Consistency for Markov Processes", Sankhya 68 (2006): 227--239
• Subhashis Ghosal and Aad van der Vaart, "Convergence Rates of Posterior Distributions for Non-IID Observations", Annals of Statistics 35 (2007): 192--223 = arxiv:0708.0491
• J. D. Kalbfleisch and J. F. Lawless, "Least-Squares Estimation of Transition Probabilities from Aggregate Data", The Canadian Journal of Statistics / La Revue Canadienne de Statistique 12 (1984): 169--182 [JSTOR]
• Mladen Kolar, Le Song, Amr Ahmed, Eric P. Xing, "Estimating time-varying networks", Annals of Applied Statistics 4 (2010): 94--123, arxiv:0812.5087
• Gusztav Morvai and Benjamin Weiss
• F. Papangelou, "Large Deviations and the Bayesian Estimation of Higher-Order Markov Transition Functions ", Journal of Applied Probability 33 (1996): 18--27 [JSTOR]
• Yuval Peres and Paul Shields, "Two new Markov order estimators", math.ST/0506080 [Very nice.]
• David Pfau, Nicholas Bartlett and Frank Wood, "Probabilistic Deterministic Infinite Automata", NIPS 23 (2010) [PDF. Really, mixtures over finite automata with arbitrarily many states.]
• George G. Roussas, Contiguity of Probability Measures: Some Applications in Statistics [Asymptotic theory of approximation, estimation and testing, for discrete-time Markov processes on fairly general state-spaces. Mini-review]
• Christopher C. Strelioff, James P. Crutchfield, Alfred W. Hubler, "Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling", math.ST/0703715
• Daniel R. Upper, Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models [Ph.D. thesis, math dept., Berkeley, 1997; online]
Not quite recommended:
• E. Racca, F. Laio, D. Poggi and L. Ridolfi, "Test to determine the Markov order of a time series", Physical Review E 75 (2007): 011126 [The test is to linearly regress x(t+1) on x(t), x(t-1), etc., out to some finite order, and see how far back you have to go before the regression coefficients are insignificantly different from zero. This is not crazy as a first cut idea, but it's not generally valid, and in fact fails for the logistic map.]
Definitely not recommended:
• S. S. Melnyk, O. V. Usatenko, V. A. Yampol'skii and V. A. Golick, "Competition between Two Kinds of Correlations in Literary Texts", physics/0402042 [This has surely got to be one of the ugliest ways of parameterizing a Markov chain I have seen; it's a miracle they don't get probabilities greater than 1, if indeed they don't.]
Modesty forbids me to recommend:
• CRS, "Dynamics of Bayesian Updating with Dependent Data and Misspecified Models", arxiv:0901.1342
• CRS and Kristina Lisa Klinkner, "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", cs.LG/0406011 [CSSR]
• Roberto C. Alamino, Nestor Caticha, "Online Learning in Discrete Hidden Markov Models", arxiv:0708.2377
• Armen Allahverdyan, Aram Galstyan, "On Maximum a Posteriori Estimation of Hidden Markov Processes", UAI 2009, arxiv:0906.1980
• Enrique E. Alvarez, "Estimation in Stationary Markov Renewal Processes, with Application to Earthquake Forecasting in Turkey", Methodology and Computing in Applied Probability 7 (2005): 119--130
• Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283
• Sofia Andersson and Tobias Rydén, "Subspace estimation and prediction methods for hidden Markov models", Annals of Statistics 37 (2009): 4131--4152
• Ana Arribas-Gil, Elisabeth Gassiat and Catherine Matias, "Parameter estimation in pair hidden Markov models", math.ST/0509280
• Yves F. Atchade, "Estimation of Network structures from partially observed Markov random fields", arxiv:1108.2835
• A. R. Baigorri, C. R. Goncalves, P. A. A. Resende, "Markov Chain Order Estimation and Relative Entropy", arxiv:0910.0264
• Alexandre Belloni, Roberto I. Oliveira, "Approximate group context tree: applications to dynamic programming and dynamic choice models", arxiv:1107.0312
• Patrice Bertail and Stéphan Clémençon, "Edgeworth expansions of suitably normalized sample mean statistics for atomic Markov chains", Probability Theory and Related Fields 130 (2004): 388--414 [I need to learn more about Edgeworth expansions anyway]
• P. J. Bickel and Y. Ritov, "Inference in Hidden Markov Models I: Local Asymptotic Normality in the Stationary Case", UCB Statistics Technical Report 383 [link]
• Byron Boots, "Spectral Approaches to Learning Predictive Representations" [Thesis proposal, CMU, 2011]
• Byron Boots, Sajid M. Siddiqi, Geoffrey J. Gordon, "Closing the Learning-Planning Loop with Predictive State Representations", arxiv:0912.2385
• Jose Borges and Mark Levene, "Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions", cs.AI/0606115
• J. Borwanker, G. Kallianpur and B. L. S. Prakasa Rao, "The Bernstein-von Mises Theorem for Markov Processes", The Annals of Mathematical Statistics 42 (1971): 1241--1253 [JSTOR]
• Carles Bretó, Daihai He, Edward L. Ionides, Aaron A. King, "Time series analysis via mechanistic models", Annals of Applied Statistics 3 (2009): 319--348, arxiv:0802.0021
• Zhiyi Chi, "Effects of statistical dependence on multiple testing under a hidden Markov model", Annals of Statistics 39 (2011): 439--473
• Pavel Chigansky, "Maximum Likelihood Estimator for Hidden Markov Models in continuous time", Statistical Inference for Stochastic Processes 12 (2009): 139--163, arxiv:0707.0271
• Christos Dimitrakakis, "Contex models on sequences of covers", arxiv:1005.2263
• C. C. Y. Dorea and L. C. Zhao, "Nonparametric Density Estimation in Hidden Markov Models", Statistical Inference for Stochastic Processes 5 (2002): 55--64
• Michael DelRose, Christian Wagner, Philip Frederick, "Evidence Feed Forward Hidden Markov Model: A New Type of Hidden Markov Model", arxiv:1102.0899 [Looks like a "to be shot after a fair trial" paper from the abstract, but then, it deserves a fair trial]
• Randal Douc, "Non singularity of the asymptotic Fisher information matrix in hidden Markov models", math.ST/0511631
• Randal Douc and Eric Moulines, "Asymptotic properties of the maximum likelihood estimation in misspecified Hidden Markov models", arxiv:1110.0356
• Farzad Eskandari and Mohammad R. Meshkani, "Empirical Bayes analysis of log-linear models for a generalized finite stationary Markov chain", Metrika 59 (2004): 173--191 [abstract]
• Florence Forbes and Nathalie Peyrard, "Hidden Markov Random Field Model Selection Criteria Based on Mean Field-Like Approximations", IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003): 1089--1101 [PostScript preprint]
• Scott D. Foster and Mark V. Bravington, "Graphical Diagnostics for Markov Models for Categorical Data", Journal of Computational and Graphical Statistics forthcoming (2011)
• Halina Frydman and Peter Lakner, "Maximum likelihood estimation of hidden Markov processes", Annals of Applied Probability 13 (2003): 1296--1312
• Cheng-Der Fuh, "Asymptotic operating characteristics of an optimal change point detection in hidden Markov models", Annals of Statistics 32 (2004): 2305--2339 = math.ST/0503682
• Cheng-Der Fuh and Inchi Hu, "Estimation in hidden Markov models via efficient importance sampling", Bernoulli 13 (2007): 492--513, arxiv:0708.4152
• Antonio Galves, Charlotte Galves, Nancy L. Garcia, Florencia Leonardi, "Context tree selection and linguistic rhythm retrieval from written texts", arxiv:0902.3619
• Antonio Galves, Aurélien Garivier, Elisabeth Gassiat, "Joint estimation of intersecting context tree models", arxiv:1102.0673
• Antonio Galves, Florencia Leonardi, "Exponential inequalities for empirical unbounded context trees", arxiv:0710.5900
• Steven T. Garren and Richard L. Smith, "Estimating the second largest eigenvalue of a Markov transition matrix", Bernoulli 6 (2000): 215--242
• Hugo Harari-Kermadec, "Regenerative block empirical likelihood for Markov chains", arxiv:1102.3107
• Daniel Hsu, Sham M. Kakade, Tong Zhang, "A Spectral Algorithm for Learning Hidden Markov Models", arxiv:0811.4413
• Masashi Inoue and Naonori Ueda, "Exploitation of Unlabeled Sequences in Hidden Markov Models", IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003): 1570--1581
• H. Ito, S.-I. Amari and K. Kobayashi, "Identifiability of Hidden Markov Information Sources and Their Minimum Degrees of Freedom", IEEE Transactions on Information Theory 38 (1992): 324--333
• Dezhe Z. Jin, Alexay A. Kozhevnikov, "A compact statistical model of the song syntax in Bengalese finch", arxiv:1011.2998
• Hans R. Künsch, "State Space and Hidden Markov Models", pp. 109--173 in Ole E. Barndorff-Nielsen, David R. Cox and Claudia Klüppelberg (eds.), Complex Stochastic Systems
• J. Lember, A. Koloydenko, "Adjusted Viterbi training for hidden Markov models", arxiv:0709.2317
• Florencia Leonardi, "Some upper bounds for the rate of convergence of penalized likelihood context tree estimators", arxiv:0701810
• E. Locherbach, "Likelihood Ratio Processes for Markovian Particle Systems with Killing and Jumps", Statistical Inference for Stochastic Processes 5 (2002): 153--177
• Claudio Macci, "Large Deviations for Empirical Estimators of the Stationary Distribution of a Semi-Markov Process with Finite State Space", Communications in Statistics: Theory and Methods 37 (2008): 3077--3089
• Philipp Metzner, Frank Noe and Christof Schutte, "Estimating the sampling error: Distribution of transition matrices and functions of transition matrices for given trajectory data", Physical Review E 80 (2009): 021106 [I presume they have a good reason for not just using the delta method, and/or bootstrapping, but I'll have to read it to see what that is]
• Elchanan Mossel, Sébastien Roch, "Learning nonsingular phylogenies and hidden Markov models", Annals of Applied Probability 16 (2006): 583--614, arxiv:cs.LG/0502076
• Michael H. Neumann, Efstathios Paparoditis, "Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations", Bernoulli 14 (2008): 14--46, arxiv:0803.0835
• Enza Orlandi, Eva Loecherbach, "On the neighborhood radius estimation in Variable-neighborhood Markov Random Fields", arxiv:1002.4850
• Adam Paszkiewicz, "When transition count for a Markov chains is a complete sufficient statistic", Statistics and Probability Letters 76 (2006): 757--763 [When the initial and final states are fixed, for example]
• Spiridon Penev, Hanxiang Peng, Atnon Schick and Wolfgang Wefelmeyer, "Efficient estimators for functionals of Markov chains with parametric marginals", Statistics and Probability Letters 66 (2004): 335--345
• Ricardo Ríos, Luis-Angel Rodríguez, "Penalized estimate of the number of states in Gaussian linear AR with Markov regime", Electronic Journal of Statistics 2 (2008): 1111--1128, arxiv:0807.2726
• Amr Sadek and Nikolaos Limnios, "Nonparametric estimation of reliability and survival function for continuous-time finite Markov processes", Journal of Statistical Planning and Inference 133 (2005): 1--21
• Manuel S. Santos, "Consistency properties of a simulation-based estimator for dynamic processes", Annals of Applied Probability 20 (2010): 196--213
• Anton Schick and Wolfgang Wefelmeyer, "Estimating Joint Distributions of Markov Chains", Statistical Inference for Stochastic Processes 5 (2002): 1--22
• Peter Sunehag, Marcus Hutter, "Consistency of Feature Markov Processes", arxiv:1007.2075
• V. B. Tadic, "Analyticity, Convergence, and Convergence Rate of Recursive Maximum-Likelihood Estimation in Hidden Markov Models", IEEE Transactions on Information Theory 56 (2010): 6406--6432, arxiv:0904.4264
• Iuliana Teodorescu, "Maximum Likelihood Estimation for Markov Chains", arxiv:0905.4131
• M. J. van der Heyden et al., "Testing the Order of Discrete Markov Chains Using Surrogate Data", Physica D 117 (1998): 299--313
• Ramon van Hanel, "On the minimal penalty for Markov order estimation", Probability Theory and Related Fields 150 (2011): 709--738, arxiv:0908.3666
• Elodie Vernet, "Posterior consistency for nonparametric Hidden Markov Models with finite state space", arxiv:1311.3092
• Martin J. Wainwright, "Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting", cs.LG/0602092
• Christian H. Weiss, "Rule generation for categorical time series with Markov assumptions", Statistics and Computing 21 (2011): 1--16 [VLMMs]
• James Zhao, "Hidden Markov Models with Multiple Observation Processes", arxiv:1010.1042
• L. C. Zhao, C. C. Y. Dorea and C. R. Gonçalves, "On Determination of the Order of a Markov Chain", Statistical Inference for Stochastic Processes 4 (2001): 273--282
• Ming-Jie Zhao and Herbert Jaeger, "Making the Error-Controlling Algorithm of Observable Operator Models Constructive", Neural Computation 21 (2009): 3460--3486