## Koopman Operators for Modeling Dynamical Systems and Time Series

*22 Jul 2022 23:16*

Start with your favorite deterministic dynamical system, say (in discrete time to make things easy) \( x_{t+1} = f(x_t) \), where \( x_t \) is the state. Ordinarily we think of time evolving by repeatedly applying the mapping \( f \), so \( x_{t+2} = f(f(x_t)) = f^{(2)}(x_t) \), and so forth; in general \( x_t = f^{(t)}(x_0) \). Typically this mapping is extremely nonlinear, and it only becomes harder to parse out after multiple time steps.

Now consider any nice function \( h \) of the state, which gives us an
observable \( y_t = h(x_t) \). How does this observable evolve? Well,
obviously \( y_{t+1} = h(f(x_t)) \), and in general \( y_t = h(f^{(t)}(x_0)) \).
So far so trivial. The trick comes from realizing that what we have actually
done is define an operator on the space of observables, canonically called \( K
\) or \( \mathcal{K} \), where \( h(f^{(t)}(x_0)) = (K^t h)(x_0) \). (I'm being
pedantic about parentheses to make the order of operations very clear. Also,
this is a good place to say that "nice function" means "measurable function,
plus any other regularity properties we might happen to find useful", e.g.,
sometimes we just care about square-integrable observables.) Instead of
thinking about time evolution in the state space, we can just leave the state
alone, and have the \( K \) operator transform the observables. The advantage
of doing this is that \( K \) is a linear operator on the space of observables;
it's really easy to convince yourself that \(K (h_1+h_2) = K h_1 + K h_2 \).
And linear operators are easy! It's true we've gone from a finite-dimensional
state space to an infinite-dimensional function space, but linearity is still
a really powerful simplification.
If the gods were very kind, \( K \) would
have a countable basis in eigenfunctions, \( K \phi_i = \lambda_i \phi_i \),
and \( h(x) = \sum_{i=1}^{\infty}{c_i \phi_i(x)} \) for some coefficients \( c_i \). Then the dynamics of any observable would be really simple:
\[
(K^t h)(x_0) = \sum_{i=1}^{\infty}{c_i (K^t \phi_i)(x_0)} = \sum_{i=1}^{\infty}{\lambda_i^t c_i \phi_i(x_0)}
\]
If we want the dynamics of a *different* observable, \( z_t = g(x_t)
\), then we'd have \( g(x) = \sum_{i=1}^{\infty}{d_i \phi_i(x)} \), and the
dynamics would be \( g(x_t) = (K^t g)(x_0) = \sum_{i=1}^{\infty}{\lambda_i^t d_i
\phi_i(x_0)} \). That is, only the coefficients going into the observable
would change. We could think of \( \phi_i(x_0) \) as an (infinite) set of
coordinates in which the dynamics are linear (and governed by the eigenvalues
\( \lambda_i \)), and the observations are also linear (and given by the
coefficients defining the observables). Even if the gods are not quite so
kind, and we have to learn some actual spectral theory, linear dynamics, even
on an infinite-dimensional space, is still a lot nicer to have to deal with
than nonlinearity...
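
To make the "gods are kind" case concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration, not something from the discussion above: the toy map `f`, the parameters `a` and `b`, and the three observables in `dictionary`. The map is chosen so that the span of those observables is mapped into itself by \( K \), which means \( K \) restricted to that span is just a 3-by-3 matrix. The sketch then compares three ways of getting \( h(x_t) \) for \( h(x) = x_2 \): iterating the nonlinear map, applying the matrix \( t \) times, and summing \( \lambda_i^t c_i \phi_i(x_0) \).

```python
import numpy as np

# Hypothetical toy map (an assumption for illustration):
#   x1' = a*x1,   x2' = b*x2 + (a**2 - b)*x1**2
a, b = 0.9, 0.5

def f(x):
    """One step of the nonlinear state map."""
    x1, x2 = x
    return np.array([a * x1, b * x2 + (a**2 - b) * x1**2])

def dictionary(x):
    """Observables (x1, x2, x1**2); their span is invariant under K."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])

# Matrix of K on that span: dictionary(f(x)) == K @ dictionary(x) for every x.
K = np.array([
    [a,   0.0, 0.0],
    [0.0, b,   a**2 - b],
    [0.0, 0.0, a**2],
])

x0 = np.array([1.5, -0.7])
t = 10

# State-space view: iterate the nonlinear map, then observe.
x = x0.copy()
for _ in range(t):
    x = f(x)
nonlinear_view = dictionary(x)

# Observable-space view: leave the state at x0, apply the linear operator t times.
linear_view = np.linalg.matrix_power(K, t) @ dictionary(x0)

# Spectral view for h(x) = x2. Eigenfunctions are phi_i(x) = W[:, i] @ dictionary(x),
# where the columns of W are left eigenvectors of K (eigenvectors of K.T).
lams, W = np.linalg.eig(K.T)
phi0 = W.T @ dictionary(x0)                          # phi_i(x0)
c = np.linalg.solve(W, np.array([0.0, 1.0, 0.0]))    # h = sum_i c_i phi_i
spectral_view = np.sum(lams**t * c * phi0)

print(nonlinear_view[1], linear_view[1], spectral_view)
```

For this particular map the eigenfunctions work out by hand to \( x_1 \), \( x_1^2 \) and \( x_2 - x_1^2 \), with eigenvalues \( a \), \( a^2 \) and \( b \). Most maps will not hand you a finite invariant span like this, which is where truncation comes in.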

(For a stochastic process, or at least
a Markov process, we'd have the contrast between the
"transition kernel" which gives the conditional distribution for the next
state, \( \kappa(x, A) \equiv \mathbb{P}\left( X_{t+1} \in A| X_t =x \right)
\), versus the conditional expectation of an observable, \( \mathbb{E}\left(
h(X_{t+1}) | X_t =x \right) \). Now, rather than looking at the conditional
distribution directly through the kernel, we can define the **Markov
operator** which takes probability measures to probability measures, \(
M\nu(A) = \int{\kappa(x, A) d\nu(x)} \). (If we're dealing with a
deterministic dynamical system, which is after all a special case of a Markov
process, the equivalent of the Markov operator is called a Perron-Frobenius or
Frobenius-Perron operator.) This is a linear operator on probability measures
(and every linear operator on probability measures likewise defines a kernel).
The adjoint operator to the Markov operator, which acts on observables, is
called, in the literature, the transition operator. So lots of Koopman
operator theory generalizes very directly to the theory of Markov operators.)
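
On a finite state space all of this reduces to matrix algebra, which makes the adjoint relationship easy to check. The following is a minimal sketch; the three-state transition matrix `P`, the distribution `nu` and the observable `h` are made-up values, and the function names are just labels for this illustration. The Markov operator multiplies a distribution (a row vector) by `P` from the right, the transition operator multiplies an observable (a column vector) by `P` from the left, and adjointness says the two routes to the expected value of \( h \) one step ahead agree.

```python
import numpy as np

# Hypothetical 3-state chain: P[i, j] = Pr(X_{t+1} = j | X_t = i).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
])
nu = np.array([0.5, 0.3, 0.2])    # distribution of the current state
h = np.array([1.0, -2.0, 4.0])    # observable: one value per state

def markov_operator(nu, P):
    """Push a distribution forward one step: (M nu)(j) = sum_i nu(i) P[i, j]."""
    return nu @ P

def transition_operator(h, P):
    """Conditional expectation: (T h)(i) = E[h(X_{t+1}) | X_t = i]."""
    return P @ h

# Evolve the distribution, then take the expectation of h ...
evolve_measure = markov_operator(nu, P) @ h
# ... or evolve the observable, then take the expectation under the original nu.
evolve_observable = nu @ transition_operator(h, P)

print(evolve_measure, evolve_observable)
```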

The first person to clearly realize all this was, indeed, Koopman in the
1930s. (I haven't read his original papers so I won't cite them, but the
references I do give below agree on this history and I trust them.*) For a long
time this was just a bit of a neat technical trick. (That's certainly how I
learned it in graduate school, as part of ergodic
theory, and how I used it in a 2004 paper on the arrow of time.) What's
intriguing to me, and why I have begun this notebook, is that since then, and
especially over the last decade, people have begun trying
to *practically* use this idea, by learning or estimating \( K \) from
observations. In particular,
control theorists seem to be very taken with this.
Of course this involves some sort of finite-dimensional truncation of the
infinite-dimensional operator, sometimes, it seems to me, an extremely crude
one**.
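
As a rough illustration of what estimating \( K \) from observations can look like, here is a minimal sketch of a least-squares fit on a fixed dictionary of observables, in the spirit of what the literature calls extended dynamic mode decomposition. The toy map, its parameters `a` and `b`, the dictionary and the sample size are all assumptions chosen for the example. The map is deliberately picked so that the dictionary spans a subspace that \( K \) maps into itself, so the finite truncation is exact here; in general it will only be an approximation, possibly a crude one.

```python
import numpy as np

# Hypothetical toy map (an assumption for illustration):
#   x1' = a*x1,   x2' = b*x2 + (a**2 - b)*x1**2
a, b = 0.9, 0.5

def f(X):
    """Apply the map to each row of X (shape (n, 2))."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([a * x1, b * x2 + (a**2 - b) * x1**2])

def dictionary(X):
    """Evaluate the observables (x1, x2, x1**2) on each row of X."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2])

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))   # sampled states
Y = f(X)                                     # their successors

Psi_X = dictionary(X)
Psi_Y = dictionary(Y)

# Least squares: find A minimizing ||Psi_X @ A - Psi_Y||; then K_hat = A.T
# satisfies dictionary(f(x)) ~= K_hat @ dictionary(x).
A, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)
K_hat = A.T

# Estimated Koopman eigenvalues on the span of the dictionary.
eigvals = np.sort(np.linalg.eigvals(K_hat).real)
print(K_hat)
print(eigvals)
```

Restricting `dictionary` to the raw coordinates turns this into an ordinary linear autoregression of the next state on the current one.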

I don't have any immediate plans to do anything with or in this literature,
but I do want to keep track of it. In particular, at some point I want to
really wrap my head around whether learning the infinite-dimensional but linear
operator \( K \) is really any easier
than learning the
finite-dimensional but nonlinear map \( f \) directly. Also, the
truncations involved in work with "data-driven" Koopman operators make me
wonder about using random features somehow.
In particular, I offer a conjecture, with the disclaimer that I have thought
about it for, literally, five minutes: Say the underlying state \( x_0 \) lives
on a \( d \)-dimensional manifold. Pick \( m \geq 2d+1 \) real-valued
observables \( h_1, \ldots, h_m \) from a probability distribution supported
on some set of nice functions \( H \). (E.g., \( H \) might consist of
finite-frequency sine waves with random phase offsets.) *Conjecture*:
An operator which linearly evolves \( h_1, \ldots, h_m \) can be extended
(somehow) to an operator which linearly evolves any function in the span of \(
H \). (E.g., the span of random sine waves is all functions with nice Fourier
transforms.)

*: The difference between the state-evolution viewpoint and the observable-evolution viewpoint corresponds to the difference between the Schrödinger and the Heisenberg "pictures" of quantum mechanics. ("Recall" that in time-independent QM, if the system has wave-function \( \psi \), the expected value of an observable, represented by an operator \( A \), is \( \langle \psi | A | \psi \rangle \equiv \int{\psi^*(x) A \psi(x) dx} \). In the Schrödinger picture, we time-evolve the wave function, so \( \psi \) at time 0 evolves to \( e^{iHt} \psi \) at time \( t \), \( H \) being the Hamiltonian operator. In the Heisenberg picture, wave functions are static, but the operators representing observables evolve, so \( A(t) = e^{-iHt} A e^{iHt} \). Either way we get the same expression for the expectation of the observable at time \( t \), namely \( \int{\psi^*(x) e^{-iHt} A e^{iHt} \psi(x) dx} \).) This distinction goes back to the 1920s, so I imagine if we were to read back into the history of operator theory before 1926 we'd find someone (Hilbert?) stating the idea as what Terence Tao would call a "trick" (or however you said that in German a century ago). --- Incidentally, before you start wondering whether quantum mechanics, which is linear and infinite-dimensional, mightn't just be the result of looking at the Koopman (or Frobenius-Perron) operators of a finite-dimensional nonlinear dynamical system, I remember hearing my nonlinear dynamics teachers idly batting around the same notion back in the 1990s. They didn't think it was worth pursuing, for a whole host of reasons (starting with Bell's inequalities), and I do not presume to be wiser than them.
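
The agreement of the two pictures can be checked numerically on a two-level system. This is only a sketch: the Hamiltonian `H`, the observable `A`, the state `psi` and the time `t` are arbitrary made-up values, and the sign convention for the evolution operator follows the one used in this footnote.

```python
import numpy as np

# Hypothetical two-level system (all values are assumptions for illustration).
H = np.array([[1.0, 0.5], [0.5, -1.0]])        # Hermitian Hamiltonian
A = np.array([[0.0, 1.0], [1.0, 0.0]])         # Hermitian observable
psi = np.array([1.0, 1.0j]) / np.sqrt(2.0)     # normalized initial state
t = 0.7

def evolution(H, t):
    """Return exp(i H t) for Hermitian H, built from its eigendecomposition."""
    E, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(1j * E * t)) @ V.conj().T

U = evolution(H, t)

# Schrodinger picture: evolve the state, keep the operator fixed.
psi_t = U @ psi
schrodinger = np.vdot(psi_t, A @ psi_t)    # vdot conjugates its first argument

# Heisenberg picture: keep the state fixed, evolve the operator.
A_t = U.conj().T @ A @ U
heisenberg = np.vdot(psi, A_t @ psi)

print(schrodinger.real, heisenberg.real)
```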

**: In particular, "dynamic mode decomposition" seems to mean just "fit a VAR(1) to successive observations and then prophesy upon the eigenvectors", but perhaps I am missing some subtleties.

See also: Equations of Motion from a Time Series

- Recommended:
- Steven L. Brunton, Marko Budišić, Eurika Kaiser and J. Nathan Kutz, "Modern Koopman Theory for Dynamical Systems", SIAM Review **64**(2022): 229--340 [With apologies to the second author for my ignorant inability to reproduce the accent symbols in his family name in HTML; and with thanks to one of my neighbors for leaving this copy of SIAM Review in the local little free library!]
- Andrzej Lasota and Michael C. Mackey, Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics [Though this focuses more on the Frobenius-Perron operator that linearly evolves distributions over states than on the Koopman operator that linearly evolves observables]

- Modesty forbids me to recommend:
- CRS, Almost None of the Theory of Stochastic Processes [Where I tried to explain what I knew about Markov and transition operators, when I knew something about Markov and transition operators]
- CRS, "The Backwards Arrow of Time of the Consistently Bayesian Statistical Mechanic", cond-mat/0410063 [Self-exposition]

- To read:
- Craig Bakker, Steven Rosenthal, Kathleen E. Nowak, "Koopman Representations of Dynamic Systems with Control", arxiv:1908.02233
- Marko Budišić, Ryan M. Mohr and Igor Mezić, "Applied Koopmanism", Chaos **22**(2012): 047510, arxiv:1206.3164
- Stefan Klus, Ingmar Schuster, Krikamol Muandet, "Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces", Journal of Nonlinear Science **30**(2020): 283--315, arxiv:1712.01572
- Samuel E. Otto and Clarence W. Rowley, "Koopman Operators for Estimation and Control of Dynamical Systems", Annual Review of Control, Robotics, and Autonomous Systems **4**(2021): 59--87
- Yen Ting Lin, Yifeng Tian, Daniel Livescu, Marian Anghel, "Data-driven learning for the Mori--Zwanzig formalism: a generalization of the Koopman learning framework", arxiv:2101.05873
- Manuel Santos Gutiérrez, Valerio Lucarini, Mickaël D. Chekroun, Michael Ghil, "Reduced-Order Models for Coupled Dynamical Systems: Data-driven Methods and the Koopman Operator", Chaos **31**(2021): 053116, arxiv:2012.01068
- Ali Tavasoli, Teague Henry, Heman Shakeri, "A purely data-driven framework for prediction, optimization, and control of networked processes: application to networked SIS epidemic model", arxiv:2108.02005