Notebooks

Algorithmic Information Theory

14 Jul 2022 14:33

Fix your favorite universal computer $U$. Now consider any string of symbols $x$, or anything which we can describe by means of a string of symbols. Among all the programs which can be run on $U$, there will be some which will produce $x$ and then stop. There will therefore be a minimum length for such programs. Call this length \( K_U(x) \). This is the "algorithmic information content" of $x$ (relative to $U$), or its Kolmogorov (-Chaitin-Solomonoff) complexity.

Note that one program which could produce $x$ would be
print($x$); stop;
(or its equivalent in the language of $U$), so \( K_U(x) \) can't be much bigger than $|x|$, the length of $x$. Sometimes, clearly, \( K_U(x) \) can be much smaller than $|x|$: if $x$ just consists of the same symbol repeated over and over, say 0, we could use the program
for (i in 1:n) { print("0"); }; stop;
and give it n=$|x|$, for a total length of a constant (the loop-and-print bit) plus $\log{|x|}$ (the number of symbols needed to write the length of $x$). While precise values will, irritatingly, be relative to the universal computer $U$, because the computer is universal, it can emulate any other computer $V$ with a finite-length program, whose length we might call \( K_U(V) \). Hence \[ K_U(x) \leq K_V(x) + K_U(V), \] and algorithmic information content differs by at most an additive, string-independent constant across computers.
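
To make the upper-bound side of this concrete, here is a minimal sketch, taking Python (arbitrarily) as the reference machine $U$. It writes down an explicit program string that would print a highly repetitive $x$ and then halt, and compares that program's length to $|x|$; the particular program string is an assumption of the sketch, and of course this only exhibits an upper bound on \( K_U(x) \), not its value.

# A sketch of the upper bound K_U(x) <= constant + log|x| for repetitive x,
# with Python standing in for the reference machine U.

n = 1_000_000
x = "0" * n                          # the string to be described, |x| = n

# An explicit program (as a string) which would output x and then halt:
program = 'print("0" * ' + str(n) + ')'

print(len(x))         # |x| = 1000000
print(len(program))   # a small constant plus the digits of n
print(len(str(n)))    # digits needed to write n, about log10(|x|)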

It turns out that the quantity \( K_U(x) \) shares many formal properties with the (Shannon) entropy of a random variable, typically written $H[X]$, so that one can define analogs of all the usual information-theoretic quantities in purely algorithmic terms. Along these lines, we say that $x$ is "incompressible" if \( K_U(x) \approx |x| \). What makes this mathematically interesting is that incompressible sequences turn out to provide a model of sequences of independent and identically distributed random variables, and, vice versa, the typical sample path of an IID stochastic process is incompressible. This generalizes: almost every trajectory of an ergodic stochastic process has a Kolmogorov complexity whose growth rate equals its entropy rate (Brudno's theorem). This, by the way, is why I don't think Kolmogorov complexity makes a very useful complexity measure: it's maximal for totally random things! But it would be a good measure of (effective) stochasticity, if we could actually calculate it.
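
The claim that most strings are incompressible rests on a counting argument not spelled out above; here is a minimal sketch of it in Python, under the assumption that programs are binary strings (the function name is mine, for illustration): there are fewer than \( 2^{n-k} \) programs shorter than $n-k$ bits, so at most a fraction \( 2^{-k} \) of the \( 2^n \) strings of length $n$ can be compressed by $k$ or more bits.

def fraction_compressible(n, k):
    # Upper bound on the fraction of length-n binary strings with K_U(x) < n - k:
    # each such string needs its own program of length < n - k, and there are
    # at most 2^(n-k) - 1 binary programs that short.
    short_programs = 2 ** (n - k) - 1
    return short_programs / 2 ** n

for k in (1, 10, 20):
    print(k, fraction_compressible(100, k))   # roughly 2**(-k): most strings are incompressible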

Unfortunately, we cannot calculate it. There is a lovely little proof of this, due to Nohre, which I learned of from a paper by Rissanen, and can't resist rehearsing. Suppose there were a program $P$ which could read in an object $x$ and return its algorithmic information content relative to some universal computer, so $P(x) = K_U(x)$. We now use $P$ to construct a new program $Q$ which compresses incompressible objects.

  1. Sort all sequences by length, and then alphabetically.
  2. For the \( i^{\mathrm{th}} \) sequence $x^i$, use $P$ to find \( K_U(x^i) \).
  3. If \( K_U(x^i) \leq |Q| \), the length of $Q$ itself, then keep going.
  4. Otherwise, set $z$ to $x^i$, return $z$, and stop.
$Q$ outputs $z$ and stops, so \( K_U(z) \leq |Q| \), but, by construction, \( K_U(z) > |Q| \), a contradiction. (The loop in step 3 must terminate: only finitely many programs have length at most $|Q|$, so only finitely many strings have \( K_U \leq |Q| \), and the enumeration eventually gets past them all.) The only way out of this contradiction is to deny the premise that $P$ exists: there is no algorithm which calculates the Kolmogorov complexity of an arbitrary string. You can in fact strengthen this to say that there is no algorithm which approximates \( K_U \). (In particular, you can't approximate it with gzip.)
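
For concreteness, here is a minimal sketch of what $Q$ would look like in Python, assuming (falsely) that the oracle $P$ existed; the constant Q_LENGTH is a stand-in for $|Q|$, the length of the program itself including $P$, so the sketch only makes the shape of the contradiction explicit, and is not something that could ever be run to completion.

from itertools import count, product

def P(x):
    # Hypothetical oracle with P(x) = K_U(x); the whole point of the argument
    # is that no such algorithm can exist.
    raise NotImplementedError("no such algorithm exists")

Q_LENGTH = 10_000   # stand-in for |Q|, the length of this very program (including P)

def Q():
    # Enumerate binary strings by length, then alphabetically within each length.
    for length in count(1):
        for bits in product("01", repeat=length):
            x = "".join(bits)
            if P(x) <= Q_LENGTH:   # compressible enough: keep going
                continue
            return x               # first string with K_U(x) > |Q|: output it and stop

# If Q could run, its output z would satisfy K_U(z) <= |Q| (since Q produces z
# and halts) and K_U(z) > |Q| (by the test above) -- the contradiction above.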

See also: Complexity Measures; Ergodic Theory; Information Theory; the Minimum Description Length Principle; Probability; "Occam"-style Bounds for Long Programs
