Low-Regret Learning
Last update: 07 Dec 2024 23:42; First version: 12 May 2012
Yet Another Inadequate Placeholder.
A.k.a. "prediction with expert advice", "individual-sequence prediction", "on-line learning". (That last name bugs me. Strictly speaking, one could have an on-line learning algorithm which didn't aim at low regret but at, say, low risk, and indeed the old uses of stochastic optimization for regression aimed implicitly at just that.)
- See also:
- Ensemble Methods in Machine Learning
- Forecasting Non-Stationary Processes
- Learning Theory
- Universal Prediction
- Recommended, big picture:
- Nicolò Cesa-Bianchi and Gábor Lugosi, Prediction, Learning, and Games [Mini-review]
- J. Michael Steele, The Scary Sequences Project [Notes and references on individual-sequence prediction]
- Recommended, details:
- Alekh Agarwal and John C. Duchi, "The Generalization Ability of Online Algorithms for Dependent Data", arxiv:1110.2529 [If you have a learning algorithm which achieves low regret against some class of alternatives, and the data source is suitably strongly-mixing, your algorithm will generalize almost as well as the best rival.]
- Nicolò Cesa-Bianchi, A. Conconi and C. Gentile, "On the Generalization Ability of On-Line Learning Algorithms", IEEE Transactions on Information Theory 50 (2004): 2050--2057
- Nicolò Cesa-Bianchi, Pierre Gaillard, Gábor Lugosi and Gilles Stoltz, "A New Look at Shifting Regret", arxiv:1202.3323
- Nicolò Cesa-Bianchi and Gábor Lugosi, "On prediction of individual sequences", Annals of Statistics 27 (1999): 1865--1895
- Marie Devaine, Pierre Gaillard, Yannig Goude, Gilles Stoltz, "Forecasting electricity consumption by aggregating specialized experts", arxiv:1207.1965
- Eric C. Hall, Rebecca M. Willett
- "Online Optimization in Dynamic Environments", arxiv:1307.5944
- "Dynamical Models and Tracking Regret in Online Convex Programming", arxiv:1301.1254
- Joseph Y. Halpern, Rafael Pass, "Iterated Regret Minimization: A More Realistic Solution Concept", arxiv:0810.3023
- Mark Herbster and Manfred K. Warmuth, "Tracking the Best Expert", Machine Learning 32 (1998): 151--178 [PS version via Dr. Herbster]
- Elad Hazan and Satyen Kale, "Extracting certainty from uncertainty: regret bounded by variation in costs", Machine Learning 80 (2010): 165--188
- Claire Monteleoni and Tommi S. Jaakkola, "Online Learning of Non-stationary Sequences", pp. 1093--1100 in NIPS 2003 (vol. 16) [Figuring out at what rate to switch between experts]
- Claire Monteleoni, Gavin Schmidt, Shailesh Saroha and Eva Asplund, "Tracking Climate Models", Statistical Analysis and Data Mining 4 (2011): 372--392 [PDF reprint via Prof. Monteleoni]
- Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M. Willett, "Sequential Probability Assignment via Online Convex Programming Using Exponential Families" [ISIT 2009; PDF]
- Maxim Raginsky, Rebecca M. Willett, C. Horn, Jorge Silva and Roummel F. Marcia, "Sequential anomaly detection in the presence of noise and limited feedback", IEEE Transactions on Information Theory 58 (2012): 5544--5562, arxiv:0911.2904
- Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
- "Online Learning: Beyond Regret", arxiv:1011.3168
- "Online Learning: Random Averages, Combinatorial Parameters, and Learnability", arxiv:1006.1138
- Tom F. Sterkenburg, "Solomonoff Prediction and Occam's Razor", Philosophy of Science 83 (2016): 459--479, phil-sci/12429
- Modesty forbids me to recommend:
- CRS, the lectures on low-regret learning in my course on Conceptual Foundations of Statistical Learning (i.e., learning theory for undergraduate statisticians), which were lectures 23 and 24 in the latest iteration (2021). (The lecture notes refer to homework problems which you can find in the same place.)
- CRS, Abigail Z. Jacobs, Kristina Lisa Klinkner and Aaron Clauset, "Adapting to Non-stationarity with Growing Expert Ensembles", arxiv:1103.0949
- To read:
- Jacob Abernethy, Alekh Agarwal, Peter L. Bartlett, Alexander Rakhlin, "A Stochastic View of Optimal Regret through Minimax Duality", arxiv:0903.5328
- Oren Anava, Elad Hazan, Shie Mannor, Ohad Shamir, "Online Learning for Time Series Prediction", arxiv:1302.6927
- O. Besbes, Y. Gur, A. Zeevi, "Non-stationary Stochastic Optimization", arxiv:1307.5449
- Sébastien Bubeck, Nicolò Cesa-Bianchi, Sham M. Kakade, "Towards minimax policies for online linear optimization with bandit feedback", arxiv:1202.3079
- Nicolò Cesa-Bianchi, Shai Shalev-Shwartz and Ohad Shamir, "Online Learning of Noisy Data with Kernels", arxiv:1005.2296
- Alexey Chernov and Fedor Zhdanov, "Prediction with Expert Advice under Discounted Loss", arxiv:1005.1918
- Alexey Chernov, Vladimir Vovk, "Prediction with Advice of Unknown Number of Experts", arxiv:1006.0475
- Steven de Rooij, Tim van Erven, Peter D. Grünwald, Wouter M. Koolen, "Follow the Leader If You Can, Hedge If You Must", arxiv:1301.0534
- Luc Devroye, Gábor Lugosi, Gergely Neu, "Prediction by Random-Walk Perturbation", arxiv:1302.5797
- Elad Eban, Aharon Birnbaum, Shai Shalev-Shwartz, Amir Globerson, "Learning the Experts for Online Sequence Prediction", arxiv:1206.4604
- Sébastien Gerchinovitz, "Sparsity Regret Bounds for Individual Sequences in Online Linear Regression", Journal of Machine Learning Research 14 (2013): 729--769
- Eyal Gofer, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, "Regret Minimization for Branching Experts", COLT 2013 / Journal of Machine Learning Research Workshops and Conference Proceedings 30 (2013): 618--638
- Nick Gravin, Yuval Peres, Balasubramanian Sivan, "Towards Optimal Algorithms for Prediction with Expert Advice", arxiv:1409.3040
- Shantanu Gupta, Zachary C. Lipton, David Childers, "Efficient Online Estimation of Causal Effects by Deciding What to Observe", arxiv:2108.09265
- András György, Tamás Linder, Gábor Lugosi, "Efficient Tracking of Large Classes of Experts", arxiv:1110.2755
- Joseph Y. Halpern, Samantha Leung, "Minimizing regret in dynamic decision problems", Theory and Decision 81 (2016): 123--151
- Wei Han, Alexander Rakhlin, Karthik Sridharan, "Competing With Strategies", arxiv:1302.2672
- Elad Hazan, Satyen Kale, "Projection-free Online Learning", arxiv:1206.4657
- Joao P. Hespanha, Daniel Liberzon and A. Stephen Morse, "Hysteresis-based switching algorithms for supervisory control of uncertain systems", Automatica 39 (2003): 263--272
- Yu-Guan Hsieh, Franck Iutzeler, Jerome Malick, Panayotis Mertikopoulos, "Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism", Journal of Machine Learning Research 23 (2022): 78
- Purushottam Kar, Bharath K Sriperumbudur, Prateek Jain, Harish C Karnick, "On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions", arxiv:1305.2505
- Haipeng Luo, Robert E. Schapire, "Towards Minimax Online Learning with Unknown Time Horizon", arxiv:1307.8187
- Shie Mannor, Ohad Shamir, "From Bandits to Experts: On the Value of Side-Observations", arxiv:1106.2436
- E. Meron and M. Feder, "Finite-Memory Universal Prediction of Individual Sequences", IEEE Transactions on Information Theory 50 (2004): 1506--1523
- Georgy Noarov, Ramya Ramalingam, Aaron Roth, Stephan Xie, "High-Dimensional Prediction for Sequential Decision Making", arxiv:2310.17651 [Framed as an alternative to conformal prediction, rather than an instance of it...]
- Francesco Orabona, Nicolò Cesa-Bianchi and Claudio Gentile, "Beyond Logarithmic Bounds in Online Learning" [PDF]
- Maxim Raginsky, Angelia Nedić, "Online discrete optimization in social networks in the presence of Knightian uncertainty", arxiv:1307.0473
- Alexander Rakhlin and Karthik Sridharan
- "Statistical Learning Theory and Sequential Prediction" [PDF lecture notes]
- "Online Learning with Predictable Sequences", arxiv:1208.3728
- Stephane Ross, J. Andrew Bagnell, "Stability Conditions for Online Learnability", arxiv:1108.3154
- Ankan Saha and Ambuj Tewari, "Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback", AI Stats 2011
- Yevgeny Seldin, Peter Bartlett, Koby Crammer, "Advice-Efficient Prediction with Expert Advice", arxiv:1304.3708
- Shai Shalev-Shwartz, "Online Learning and Online Convex Optimization", Foundations and Trends in Machine Learning 4 (2012): 107--194, PDF preprint
- Shahin Shahrampour, Alexander Rakhlin, Ali Jadbabaie, "Online Learning of Dynamic Parameters in Social Networks", arxiv:1310.0432
- Nathan Srebro, Karthik Sridharan, Ambuj Tewari, "On the Universality of Online Mirror Descent", arxiv:1107.4080
- Nina Vaits, Edward Moroshko, Koby Crammer, "Second-Order Non-Stationary Online Learning for Regression", arxiv:1303.0140
- Tim van Erven, Peter Grünwald, Wouter M. Koolen, Steven de Rooij, "Adaptive Hedge", arxiv:1110.6416
- Vladimir V. V'yugin, "Online Learning in Case of Unbounded Losses Using the Follow Perturbed Leader Algorithm", Journal of Machine Learning Research 12 (2011): 241--266, arxiv:1008.4232
- T. Weissman, "How to Filter an 'Individual Sequence with Feedback'", IEEE Transactions on Information Theory 54 (2008): 3831--3841
- Tianbao Yang, Rong Jin, Mehrdad Mahdavi, "Regret Bound by Variation for Online Convex Optimization", Machine Learning 95 (2014): 183--223, arxiv:1111.6337
- Jacob Ziv, Neri Merhav, "On context-tree prediction of individual sequences", arxiv:cs/0508127
- To write:
- Co-conspirators to be named later + CRS, "Regret Minimization and Knightian Uncertainty"