Notebooks

Algorithmic Information Theory

14 Jul 2022 14:33

Fix your favorite universal computer $U$. Now consider any string of symbols $x$, or anything which we can describe by means of a string of symbols. Among all the programs which can be run on $U$, there will be some which will produce $x$ and then stop. There will therefore be a minimum length for such programs. Call this length \( K_U(x) \). This is the "algorithmic information content" of $x$ (relative to $U$), or its Kolmogorov (-Chaitin-Solomonoff) complexity.

Note that one program which could produce $x$ would be
print($x$); stop;
(or its equivalent in the language of $U$), so \( K_U(x) \) can't be much bigger than $|x|$, the length of $x$. Sometimes, clearly, \( K_U(x) \) can be much smaller than $|x|$: if $x$ just consists of the same symbol repeated over and over, say 0, we could use the program
for (i in 1:n) { print("0"); }; stop;
and give it n=$|x|$, for a total length of a constant (the loop-and-print bit) plus $\log{|x|}$ (the number of symbols needed to write the length of $x$). While precise values will, irritatingly, be relative to the universal computer $U$, because the computer is universal, it can emulate any other computer $V$ with a finite-length program, whose length we might call \( K_U(V) \). Hence \[ K_U(x) \leq K_V(x) + K_U(V), \] and algorithmic information content differs by at most an additive, string-independent constant across computers.
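
To make the upper-bound side of this concrete, here is a minimal sketch, taking Python (arbitrarily) as the reference machine $U$. It writes down an explicit program string that would print a highly repetitive $x$ and then halt, and compares that program's length to $|x|$; the particular program string is an assumption of the sketch, and of course this only exhibits an upper bound on \( K_U(x) \), not its value.

# A sketch of the upper bound K_U(x) <= constant + log|x| for repetitive x,
# with Python standing in for the reference machine U.

n = 1_000_000
x = "0" * n                          # the string to be described, |x| = n

# An explicit program (as a string) which would output x and then halt:
program = 'print("0" * ' + str(n) + ')'

print(len(x))         # |x| = 1000000
print(len(program))   # a small constant plus the digits of n
print(len(str(n)))    # digits needed to write n, about log10(|x|)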

It turns out that the quantity \( K_U(x) \) shares many formal properties with the (Shannon) entropy of a random variable, typically written $H[X]$, so that one can define analogs of all the usual information-theoretic quantities in purely algorithmic terms. Along these lines, we say that $x$ is "incompressible" if \( K_U(x) \approx |x| \). What makes this mathematically interesting is that incompressible sequences turn out to provide a model of sequences of independent and identically distributed random variables, and, vice versa, the typical sample path of an IID stochastic process is incompressible. This generalizes: almost every trajectory of an ergodic stochastic process has a Kolmogorov complexity whose growth rate equals its entropy rate (Brudno's theorem). This, by the way, is why I don't think Kolmogorov complexity makes a very useful complexity measure: it's maximal for totally random things! But it would be a good measure of (effective) stochasticity, if we could actually calculate it.
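
The claim that most strings are incompressible rests on a counting argument not spelled out above; here is a minimal sketch of it in Python, under the assumption that programs are binary strings (the function name is mine, for illustration): there are fewer than \( 2^{n-k} \) programs shorter than $n-k$ bits, so at most a fraction \( 2^{-k} \) of the \( 2^n \) strings of length $n$ can be compressed by $k$ or more bits.

def fraction_compressible(n, k):
    # Upper bound on the fraction of length-n binary strings with K_U(x) < n - k:
    # each such string needs its own program of length < n - k, and there are
    # at most 2^(n-k) - 1 binary programs that short.
    short_programs = 2 ** (n - k) - 1
    return short_programs / 2 ** n

for k in (1, 10, 20):
    print(k, fraction_compressible(100, k))   # roughly 2**(-k): most strings are incompressible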

Unfortunately, we cannot calculate it. There is a lovely little proof of this, due to Nohre, which I learned of from a paper by Rissanen, and can't resist rehearsing. Suppose there were a program $P$ which could read in an object $x$ and return its algorithmic information content relative to some universal computer, so $P(x) = K_U(x)$. We now use $P$ to construct a new program $Q$ which compresses incompressible objects.

  1. Sort all sequences by length, and then alphabetically.
  2. For the \( i^{\mathrm{th}} \) sequence $x^i$, use $P$ to find \( K_U(x^i) \).
  3. If \( K_U(x^i) \leq |Q| \), the length of $Q$ itself, then keep going.
  4. Otherwise, set $z$ to $x^i$, return $z$, and stop.
$Q$ outputs $z$ and stops, so \( K_U(z) \leq |Q| \), but, by construction, \( K_U(z) > |Q| \), a contradiction. (The loop in step 3 must terminate: only finitely many programs have length at most $|Q|$, so only finitely many strings have \( K_U \leq |Q| \), and the enumeration eventually gets past them all.) The only way out of this contradiction is to deny the premise that $P$ exists: there is no algorithm which calculates the Kolmogorov complexity of an arbitrary string. You can in fact strengthen this to say that there is no algorithm which approximates \( K_U \). (In particular, you can't approximate it with gzip.)
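
For concreteness, here is a minimal sketch of what $Q$ would look like in Python, assuming (falsely) that the oracle $P$ existed; the constant Q_LENGTH is a stand-in for $|Q|$, the length of the program itself including $P$, so the sketch only makes the shape of the contradiction explicit, and is not something that could ever be run to completion.

from itertools import count, product

def P(x):
    # Hypothetical oracle with P(x) = K_U(x); the whole point of the argument
    # is that no such algorithm can exist.
    raise NotImplementedError("no such algorithm exists")

Q_LENGTH = 10_000   # stand-in for |Q|, the length of this very program (including P)

def Q():
    # Enumerate binary strings by length, then alphabetically within each length.
    for length in count(1):
        for bits in product("01", repeat=length):
            x = "".join(bits)
            if P(x) <= Q_LENGTH:   # compressible enough: keep going
                continue
            return x               # first string with K_U(x) > |Q|: output it and stop

# If Q could run, its output z would satisfy K_U(z) <= |Q| (since Q produces z
# and halts) and K_U(z) > |Q| (by the test above) -- the contradiction above.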

See also: Complexity Measures; Ergodic Theory; Information Theory; the Minimum Description Length Principle; Probability; "Occam"-style Bounds for Long Programs
