Low-Regret Learning
Last update: 07 Dec 2024 23:42; First version: 12 May 2012
Yet Another Inadequate Placeholder.
A.k.a. "prediction with expert advice", "individual-sequence prediction", "on-line learning". (That last name bugs me. Strictly speaking, one could have an on-line learning algorithm which didn't aim at low regret but at, say, low risk, and indeed the old uses of stochastic optimization for regression aimed implicitly at just that.)
- See also:
- Ensemble Methods in Machine Learning
- Forecasting Non-Stationary Processes
- Learning Theory
- Universal Prediction
- Recommended, big picture:
- Nicolò Cesa-Bianchi and Gábor Lugosi, Prediction, Learning, and Games [Mini-review]
- J. Michael Steele, The Scary Sequences Project [Notes and references on individual-sequence prediction]
- Recommended, details:
- Alekh Agarwal and John C. Duchi, "The Generalization Ability of Online Algorithms for Dependent Data", arxiv:1110.2529 [If you have a learning algorithm which achieves low regret against some class of alternatives, and the data source is suitably strongly-mixing, your algorithm will generalize almost as well as the best rival.]
- Nicolò Cesa-Bianchi, A. Conconi and C. Gentile, "On the Generalization Ability of On-Line Learning Algorithms", IEEE Transactions on Information Theory 50 (2004): 2050--2057
- Nicolò Cesa-Bianchi, Pierre Gaillard, Gábor Lugosi and Gilles Stoltz, "A New Look at Shifting Regret", arxiv:1202.3323
- Nicolò Cesa-Bianchi and Gábor Lugosi, "On prediction of individual sequences", Annals of Statistics 27 (1999): 1865--1895
- Marie Devaine, Pierre Gaillard, Yannig Goude, Gilles Stoltz, "Forecasting electricity consumption by aggregating specialized experts", arxiv:1207.1965
- Eric C. Hall, Rebecca M. Willett
- "Online Optimization in Dynamic Environments", arxiv:1307.5944
- "Dynamical Models and Tracking Regret in Online Convex Programming", arxiv:1301.1254
- Joseph Y. Halpern, Rafael Pass, "Iterated Regret Minimization: A More Realistic Solution Concept", arxiv:0810.3023
- Mark Herbster and Manfred K. Warmuth, "Tracking the Best Expert", Machine Learning 32 (1998): 151--178 [PS version via Dr. Herbster]
- Elad Hazan and Satyen Kale, "Extracting certainty from uncertainty: regret bounded by variation in costs", Machine Learning 80 (2010): 165--188
- Claire Monteleoni and Tommi S. Jaakkola, "Online Learning of Non-stationary Sequences", pp. 1093--1100 in NIPS 2003 (vol. 16) [Figuring out at what rate to switch between experts]
- Claire Monteleoni, Gavin Schmidt, Shailesh Saroha and Eva Asplund, "Tracking Climate Models", Statistical Analysis and Data Mining 4 (2011): 372--392 [PDF reprint via Prof. Monteleoni]
- Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M. Willett, "Sequential Probability Assignment via Online Convex Programming Using Exponential Families" [ISIT 2009; PDF]
- Maxim Raginsky, Rebecca M. Willett, C. Horn, Jorge Silva and Roummel F. Marcia, "Sequential anomaly detection in the presence of noise and limited feedback", IEEE Transactions on Information Theory 58 (2012): 5544--5562, arxiv:0911.2904
- Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
- "Online Learning: Beyond Regret", arxiv:1011.3168
- "Online Learning: Random Averages, Combinatorial Parameters, and Learnability", arxiv:1006.1138
- Tom F. Sterkenburg, "Solomonoff Prediction and Occam's Razor", Philosophy of Science 83 (2016): 459--479, phil-sci/12429
- Modesty forbids me to recommend:
- CRS, the lectures on low-regret learning in my course on Conceptual Foundations of Statistical Learning (i.e., learning theory for undergraduate statisticians), which were lectures 23 and 24 in the latest iteration (2021). (The lecture notes refer to homework problems which you can find in the same place.)
- CRS, Abigail Z. Jacobs, Kristina Lisa Klinkner and Aaron Clauset, "Adapting to Non-stationarity with Growing Expert Ensembles", arxiv:1103.0949
- To read:
- Jacob Abernethy, Alekh Agarwal, Peter L. Bartlett, Alexander Rakhlin, "A Stochastic View of Optimal Regret through Minimax Duality", arxiv:0903.5328
- Oren Anava, Elad Hazan, Shie Mannor, Ohad Shamir, "Online Learning for Time Series Prediction", arxiv:1302.6927
- O. Besbes, Y. Gur, A. Zeevi, "Non-stationary Stochastic Optimization", arxiv:1307.5449
- Sébastien Bubeck, Nicolò Cesa-Bianchi, Sham M. Kakade, "Towards minimax policies for online linear optimization with bandit feedback", arxiv:1202.3079
- Nicolò Cesa-Bianchi, Shai Shalev-Shwartz and Ohad Shamir, "Online Learning of Noisy Data with Kernels", arxiv:1005.2296
- Alexey Chernov and Fedor Zhdanov, "Prediction with Expert Advice under Discounted Loss", arxiv:1005.1918
- Alexey Chernov, Vladimir Vovk, "Prediction with Advice of Unknown Number of Experts", arxiv:1006.0475
- Steven de Rooij, Tim van Erven, Peter D. Grünwald, Wouter M. Koolen, "Follow the Leader If You Can, Hedge If You Must", arxiv:1301.0534
- Luc Devroye, Gábor Lugosi, Gergely Neu, "Prediction by Random-Walk Perturbation", arxiv:1302.5797
- Elad Eban, Aharon Birnbaum, Shai Shalev-Shwartz, Amir Globerson, "Learning the Experts for Online Sequence Prediction", arxiv:1206.4604
- Sébastien Gerchinovitz, "Sparsity Regret Bounds for Individual Sequences in Online Linear Regression", Journal of Machine Learning Research 14 (2013): 729--769
- Eyal Gofer, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, "Regret Minimization for Branching Experts", COLT 2013 / Journal of Machine Learning Research Workshops and Conference Proceedings 30 (2013): 618--638
- Nick Gravin, Yuval Peres, Balasubramanian Sivan, "Towards Optimal Algorithms for Prediction with Expert Advice", arxiv:1409.3040
- Shantanu Gupta, Zachary C. Lipton, David Childers, "Efficient Online Estimation of Causal Effects by Deciding What to Observe", arxiv:2108.09265
- András György, Tamás Linder, Gábor Lugosi, "Efficient Tracking of Large Classes of Experts", arxiv:1110.2755
- Joseph Y. Halpern, Samantha Leung, "Minimizing regret in dynamic decision problems", Theory and Decision 81 (2016): 123--151
- Wei Han, Alexander Rakhlin, Karthik Sridharan, "Competing With Strategies", arxiv:1302.2672
- Elad Hazan, Satyen Kale, "Projection-free Online Learning", arxiv:1206.4657
- Joao P. Hespanha, Daniel Liberzon and A. Stephen Morse, "Hysteresis-based switching algorithms for supervisory control of uncertain systems", Automatica 39 (2003): 263--272
- Yu-Guan Hsieh, Franck Iutzeler, Jerome Malick, Panayotis Mertikopoulos, "Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism", Journal of Machine Learning Research 23 (2022): 78
- Purushottam Kar, Bharath K Sriperumbudur, Prateek Jain, Harish C Karnick, "On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions", arxiv:1305.2505
- Haipeng Luo, Robert E. Schapire, "Towards Minimax Online Learning with Unknown Time Horizon", arxiv:1307.8187
- Shie Mannor, Ohad Shamir, "From Bandits to Experts: On the Value of Side-Observations", arxiv:1106.2436
- E. Meron and M. Feder, "Finite-Memory Universal Prediction of Individual Sequences", IEEE Transactions on Information Theory 50 (2004): 1506--1523
- Georgy Noarov, Ramya Ramalingam, Aaron Roth, Stephan Xie, "High-Dimensional Prediction for Sequential Decision Making", arxiv:2310.17651 [Framed as an alternative to conformal prediction, rather than an instance of it...]
- Francesco Orabona, Nicolò Cesa-Bianchi and Claudio Gentile, "Beyond Logarithmic Bounds in Online Learning" [PDF]
- Maxim Raginsky, Angelia Nedić, "Online discrete optimization in social networks in the presence of Knightian uncertainty", arxiv:1307.0473
- Alexander Rakhlin and Karthik Sridharan
- "Statistical Learning Theory and Sequential Prediction" [PDF lecture notes]
- "Online Learning with Predictable Sequences", arxiv:1208.3728
- Stephane Ross, J. Andrew Bagnell, "Stability Conditions for Online Learnability", arxiv:1108.3154
- Ankan Saha and Ambuj Tewari, "Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback", AI Stats 2011
- Yevgeny Seldin, Peter Bartlett, Koby Crammer, "Advice-Efficient Prediction with Expert Advice", arxiv:1304.3708
- Shai Shalev-Shwartz, "Online Learning and Online Convex Optimization", Foundations and Trends in Machine Learning 4 (2012): 107--194, PDF preprint
- Shahin Shahrampour, Alexander Rakhlin, Ali Jadbabaie, "Online Learning of Dynamic Parameters in Social Networks", arxiv:1310.0432
- Nathan Srebro, Karthik Sridharan, Ambuj Tewari, "On the Universality of Online Mirror Descent", arxiv:1107.4080
- Nina Vaits, Edward Moroshko, Koby Crammer, "Second-Order Non-Stationary Online Learning for Regression", arxiv:1303.0140
- Tim van Erven, Peter Grünwald, Wouter M. Koolen, Steven de Rooij, "Adaptive Hedge", arxiv:1110.6416
- Vladimir V. V'yugin, "Online Learning in Case of Unbounded Losses Using the Follow Perturbed Leader Algorithm", Journal of Machine Learning Research 12 (2011): 241--266, arxiv:1008.4232
- T. Weissman, "How to Filter an 'Individual Sequence with Feedback'", IEEE Transactions on Information Theory 54 (2008): 3831--3841
- Tianbao Yang, Rong Jin, Mehrdad Mahdavi, "Regret Bound by Variation for Online Convex Optimization", Machine Learning 95 (2014): 183--223, arxiv:1111.6337
- Jacob Ziv, Neri Merhav, "On context-tree prediction of individual sequences", arxiv:cs/0508127
- To write:
- Co-conspirators to be named later + CRS, "Regret Minimization and Knightian Uncertainty"