Notebooks

Ergodic Theory of Markov and Related Processes

Last update: 14 Aug 2025 21:54
First version: 20 August 2007, actual content added 1 May 2025

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \]

Yet Another Inadequate Placeholder of references.

--- Oh, OK, I'll take a stab at this. Suppose we have a $k$ state Markov chain, with current state \( X_t \), and with transition matrix $\mathbf{p}$. By an ancient convention, \( \mathbf{p}_{ij} \) is the probability of going from state $i$ to state $j$, \( \mathbf{p}_{ij} = \Prob{X_{t+1}=j|X_t=i} \), so if $f$ is a length-$k$ row vector of state probabilities at one time, \( f \mathbf{p} \) is the vector of state probabilities at the next time step, and \( f \mathbf{p}^t \) is the vector of state probabilities $t$ time steps later. Let's assume that $\mathbf{p}$ is irreducible and aperiodic, so we can go from every state to every other state and back again, and there are no periodicities in the possible return times. Then we know that (i) the largest eigenvalue of $\mathbf{p}$, \( \lambda_1 \), is 1, and (ii) there is a unique all-positive left eigenvector going along with this eigenvalue, say \( v_1 \). (That's the Frobenius-Perron theorem, or at any rate its application to this situation.) We can scale this vector until its entries sum to 1. Since they're non-negative and add up to 1, they're a possible probability distribution on the states of the chain. Since \( v_1 \mathbf{p} = v_1 \), this distribution is invariant.
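A quick numerical sketch of this, using numpy on a made-up 3-state chain (the transition probabilities are arbitrary, chosen only so that the chain is irreducible and aperiodic):

```python
import numpy as np

# A hypothetical 3-state transition matrix; rows sum to 1, and all
# entries are positive, so the chain is irreducible and aperiodic.
p = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Left eigenvectors of p are right eigenvectors of p's transpose.
eigvals, eigvecs = np.linalg.eig(p.T)

# Pick out the eigenvalue closest to 1 (the Frobenius-Perron
# eigenvalue) and normalize its eigenvector to sum to 1, giving a
# probability distribution over the states.
v1 = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v1 = v1 / v1.sum()

print(v1)                          # the invariant distribution
print(np.abs(v1 @ p - v1).max())   # ~0, since v1 p = v1
```

(Dividing by the sum also fixes the arbitrary sign that `eig` may attach to the eigenvector, since all its entries have the same sign.)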

Now further assume that the linear-algebra gods are kind and the left eigenvectors of $\mathbf{p}$, \( v_1, v_2, \ldots v_k \), form a basis, so we can write \[ f=\sum_{i=1}^{k}{c_i v_i} \] Now the dynamics under the Markov chain are simple: \[ f \mathbf{p} = \sum_{i=1}^{k}{c_i v_i \mathbf{p}} = \sum_{i=1}^{k}{c_i v_i \lambda_i} \] and indeed \[ f \mathbf{p}^t = \sum_{i=1}^{k}{c_i v_i \lambda_i^t} \] Since the chain is irreducible and aperiodic, 1 is the unique eigenvalue of largest modulus, so \( |\lambda_i| < 1 \) for \( i > 1 \) (ordering the eigenvalues by decreasing modulus). Thus every component of \( f \mathbf{p}^t \) which comes from the other eigenvectors is shrinking exponentially fast, and \[ f \mathbf{p}^t \rightarrow c_1 v_1 \] Since \( f \mathbf{p}^t \) must be a probability distribution, and so add up to 1, and since \( v_1 \) adds up to 1, we can conclude that \( c_1=1 \). Thus \[ f \mathbf{p}^t \rightarrow v_1 \]
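One can watch this convergence happen numerically; a sketch, re-using the same made-up 3-state chain as above:

```python
import numpy as np

# Same hypothetical 3-state chain as before.
p = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
eigvals, eigvecs = np.linalg.eig(p.T)
v1 = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v1 = v1 / v1.sum()

# Start from a point mass on state 0 and iterate f -> f p, tracking
# the distance from the invariant distribution.
f = np.array([1.0, 0.0, 0.0])
errors = []
for t in range(30):
    errors.append(np.abs(f - v1).max())
    f = f @ p

# The error shrinks geometrically, at a rate set by the
# second-largest eigenvalue modulus.
lam2 = np.sort(np.abs(eigvals))[-2]
print(errors[1], errors[5], errors[10], lam2)
```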

To sum up: starting from any initial distribution, the distribution of the Markov chain approaches the invariant distribution. This holds in particular even if $f$ puts probability 1 on a particular state $x$: \[ X_{t}|X_0=x \rightsquigarrow v_1 \] This means that \( X_t \) becomes asymptotically independent of \( X_0 \) as $t\rightarrow\infty$. But since there's nothing special about starting at time 0, that means that \( X_t \) and \( X_s \) become asymptotically independent as $|t-s| \rightarrow \infty$.

Now consider some real-valued function of the state \( h(X_t) \). By the argument above, \( h(X_t) \) and \( h(X_s) \) will be asymptotically independent as $|t-s| \rightarrow \infty$. In fact, one can show that the correlation between \( h(X_t) \) and \( h(X_s) \) will decay exponentially fast, with the exponent depending on the second-largest (in modulus) eigenvalue $\lambda_2$. (I'll write this out some other time; it's more linear algebra than this nap of my son's will allow.) But now we invoke the world's simplest ergodic theorem and conclude that $\frac{1}{n}\sum_{t=1}^{n}{h(X_t)}$ converges (in mean square) to the expected value of $h(X)$ under the distribution \( v_1 \).
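A simulation sketch of the ergodic theorem, on the same made-up chain, with an arbitrary $h$ (the specific values of $h$ and the seed are of course just for illustration): simulate a long sample path and compare the time average of $h$ to its expectation under \( v_1 \).

```python
import numpy as np

p = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
eigvals, eigvecs = np.linalg.eig(p.T)
v1 = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v1 = v1 / v1.sum()

h = np.array([1.0, -2.0, 5.0])   # an arbitrary function of the state

# Simulate one long sample path of the chain.
rng = np.random.default_rng(42)
n = 100_000
x = 0
running_sum = 0.0
for _ in range(n):
    running_sum += h[x]
    x = rng.choice(3, p=p[x])

time_avg = running_sum / n
print(time_avg, h @ v1)   # time average vs. expectation under v1
```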

The promised argument about exponential decay of correlations, done during another of my son's naps (14 August 2025): Let's assume, for simplicity, that the chain starts in its invariant distribution, \( X_0 \sim v_1 \). We're interested in the covariance, at lag $t$, between \( h(X_0) \) and \( h(X_t) \), let us say $\gamma(t)$: \[ \gamma(t) \equiv \Expect{h(X_0) h(X_t)} - \Expect{h(X_0)} \Expect{h(X_t)} \] By stationarity, \( \Expect{h(X_0)} = \Expect{h(X_t)} = h(v_1) \), abusing (or extending) notation to denote the expected value of a function under a distribution over states. So let's tackle the product term in the covariance, using the law of total expectation: \[ \begin{eqnarray} \Expect{h(X_0) h(X_t)} & = & \Expect{ \Expect{h(X_0) h(X_t)|X_0} }\\ & = & \Expect{ h(X_0) \Expect{h(X_t)|X_0} } \end{eqnarray} \] As above, the distribution which puts probability 1 on the state \( x_0 \) can be expressed as a linear combination of the \( v_i \), with coefficients \( c_i(x_0) \), with \( c_1(x_0)=1 \) always. So \[ \begin{eqnarray} \Expect{h(X_t)|X_0 = x_0} & = & \sum_{j=1}^{k}{h(j) \Prob{X_t=j|X_0=x_0}}\\ & = & \sum_{j=1}^{k}{h(j) \sum_{i=1}^{k}{c_i(x_0) \lambda_i^t v_{ij}}}\\ & = & \sum_{i=1}^{k}{ c_i(x_0) \lambda_i^t \sum_{j=1}^{k}{h(j) v_{ij}}}\\ & = & \sum_{i=1}^{k}{ h(v_i) c_i(x_0) \lambda_i^t}\\ & = & h(v_1) + \sum_{i=2}^{k}{h(v_i) c_i(x_0) \lambda_i^t} \end{eqnarray} \] exchanging the order of summation in the middle. Substituting in, \[ \begin{eqnarray} \Expect{h(X_0) h(X_t)} & = & h(v_1)\Expect{h(X_0)} + \Expect{h(X_0) \sum_{i=2}^{k}{h(v_i) c_i(X_0) \lambda_i^t}}\\ & = & h(v_1)^2 + \sum_{i=2}^{k}{h(v_i) \Expect{h(X_0) c_i(X_0)} \lambda_i^t}\\ \gamma(t) & = & \sum_{i=2}^{k}{h(v_i) \Expect{h(X_0) c_i(X_0)} \lambda_i^t} \end{eqnarray} \] remembering the definition of \( h(v_1) \), and interchanging (finite) sums and expectations.
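This decay can be checked exactly, with no simulation, since for a chain started from \( v_1 \) we have \( \Expect{h(X_0) h(X_t)} = \sum_{x}{v_1(x) h(x) (\mathbf{p}^t h)(x)} \), the $x$-th entry of \( \mathbf{p}^t h \) being \( \Expect{h(X_t)|X_0=x} \). A sketch, again on the same made-up chain and the same arbitrary $h$:

```python
import numpy as np

p = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
eigvals, eigvecs = np.linalg.eig(p.T)
v1 = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v1 = v1 / v1.sum()

h = np.array([1.0, -2.0, 5.0])   # same arbitrary h as before
mean_h = h @ v1

def gamma(t):
    """Exact lag-t autocovariance of h(X_t), chain started from v1.

    E[h(X_0) h(X_t)] = sum_x v1(x) h(x) E[h(X_t)|X_0=x], and
    E[h(X_t)|X_0=x] is the x-th entry of p^t h.
    """
    cond_mean = np.linalg.matrix_power(p, t) @ h
    return v1 @ (h * cond_mean) - mean_h**2

lam2 = np.sort(np.abs(eigvals))[-2]   # second-largest eigenvalue modulus
for t in [1, 5, 10, 20]:
    print(t, gamma(t), lam2**t)   # gamma(t) shrinks like lam2**t
```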

Since \( |\lambda_i| < 1 \) for \( i > 1 \), each of the (finitely many) terms in the sum is going to zero exponentially in $t$. In fact, for large $t$, the covariance must be at most \( O(|\lambda_2|^t) \), because all the other exponentials are that small or smaller. Thus, as promised, the covariance of any function of the state decays to zero exponentially fast, with the exponent being controlled by the modulus of the second-largest eigenvalue \( \lambda_2 \).

(What if we didn't start the chain in its invariant distribution, but said \( X_0 \sim \rho \neq v_1 \) instead? Basically the same thing, but with more annoying terms to keep track of, and to reassure ourselves are exponentially small. Express \( \rho = \sum_{i=1}^{k}{d_i v_i} \), where again \( d_1 = 1 \) to ensure proper normalization. Re-doing the algebra above, we get \( \Expect{h(X_0)} = h(\rho) \), \( \Expect{h(X_t)} = h(v_1) + O(|\lambda_2|^t) \), and \( \Expect{h(X_0) h(X_t)} = h(\rho) h(v_1) + O(|\lambda_2|^t) \), so, once again, everything's as promised.)
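The non-invariant-start claim about \( \Expect{h(X_t)} \) is also easy to verify exactly: with \( X_0 \sim \rho \), \( \Expect{h(X_t)} = \rho \mathbf{p}^t h \), and the gap from \( h(v_1) \) should shrink like \( |\lambda_2|^t \). A sketch on the same made-up chain, taking $\rho$ to be a point mass:

```python
import numpy as np

p = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
eigvals, eigvecs = np.linalg.eig(p.T)
v1 = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v1 = v1 / v1.sum()

h = np.array([1.0, -2.0, 5.0])
rho = np.array([1.0, 0.0, 0.0])   # decidedly not the invariant distribution

# E[h(X_t)] when X_0 ~ rho is rho p^t h; track its gap from h(v1).
gaps = [abs(rho @ np.linalg.matrix_power(p, t) @ h - h @ v1)
        for t in range(25)]
print(gaps[0], gaps[5], gaps[20])   # exponentially shrinking
```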

(I am pretty sure I had a different, more linear-algebraic argument in mind in May, but I forget what it was.)

