Notebooks

## Sequential Decision-Making Under Stochastic Uncertainty

24 Sep 2019 00:39

That said... I'm interested in the theory of optimal decision-making, when you need to make multiple decisions over time, and there is non-trivial stochastic uncertainty, either because the effects of your actions are somewhat random, or because you can only coarsely and noisily measure the state of the system you're acting on. I am particularly interested in the extent to which optimal strategies can be learned, in the usual "probably approximately correct" sense of computational learning theory. Here it seems that there is a potentially very important difference between trying to learn an optimal strategy on the basis of merely historical, haphazard data, versus actually performing experiments. In fact, in some sense the best way to learn about the optimal policy may be to experiment with a totally random policy, because the data you gather from such an experiment is totally free of outside, confounding factors. (Similarly, one way to learn about the properties of a nonlinear system is to measure its response to white noise; this is the Wiener method for transducers.)

Finding the optimal strategy turns out to be a very hard problem, both computationally and statistically, and it seems staggeringly unlikely that most human beings, when faced with such situations, respond in anything like the optimal manner. (This is part of the reason things like DSGE models in macroeconomics are crazy.) Or, rather, if we do act optimally, it's with respect to a non-obvious criterion.

Related or subsidiary topics which will also show up here: Partially-observable Markov decision processes, reinforcement learning, etc., etc.

People sometimes distinguish between "risk", which can be represented stochastically, i.e., as a probability distribution, and "uncertainty", where there is simply no basis for assessing frequencies or the like. Decision-making under such strong or genuine uncertainty is a very different problem...

Recommended, big picture:
• David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions
• Bent Jesper Christensen and Nicholas M. Kiefer, Economic Modeling and Inference [Review: An Optimal Path to a Dead End.]
• Rich Sutton, Reinforcement Learning FAQ
• Richard S. Sutton and Andrew G. Barto, Reinforcement Learning [Book website, with draft text online]
Recommended, close-ups:
• Paul H. Algoet [Algoet did some brilliant stuff in the general area of information-theoretic approaches to stochastic processes in the early and mid 1990s, and then just stopped publishing. Whatever happened to him?]
• "Universal Schemes for Prediction, Gambling, and Portfolio Selection," Annals of Probability 20 (1992): 901--941 and an important Correction, 23 (1995): 474--478
• "The Strong Law of Large Numbers for Sequential Decisions Under Uncertainty," IEEE Transactions on Information Theory 40 (1994): 609--633
• Doron Blatt, Susan A. Murphy and Ji Zhu, "A-Learning for Approximate Planning", NIPS 2004 [PDF]
• Robert Kleinberg, Alexandru Niculescu-Mizil, Yogeshwer Sharma, "Regret Bounds for Sleeping Experts and Bandits", Machine Learning 80 (2010): 245--272
• Susan A. Murphy, "Optimal Dynamic Treatment Regimes", Journal of the Royal Statistical Society B 65 (2003): 331--366 [PDF]
• D. I. Simester, P. Sun and J. Tsitsiklis, "Dynamic Catalog Mailing Policies" [PDF]
• Dimitri P. Bertsekas and John Tsitsiklis, Neuro-Dynamic Programming
• Byron Boots, Geoffrey J. Gordon, "Predictive State Temporal Difference Learning", arxiv:1011.0041, NIPS 23 (2010)
• Sébastien Bubeck, Nicolò Cesa-Bianchi, Sham M. Kakade, "Towards minimax policies for online linear optimization with bandit feedback", arxiv:1202.3079
• Bibhas Chakraborty and Susan A. Murphy, "Dynamic Treatment Regimes", Annual Review of Statistics and Its Application 1 (2014): 447--464
• Nathaniel D. Daw, John P. O'Doherty, Peter Dayan, Ben Seymour and Raymond J. Dolan, "Cortical substrates for exploratory decisions in humans", Nature 441 (2006): 876--879 [Hey, sometimes people do act like reinforcement-learners!]
• A. Philip Dawid and Vanessa Didelez, "Identifying the consequences of dynamic treatment strategies: A decision-theoretic overview", Statistics Surveys 4 (2010): 184--231
• Eyal Even-Dar, Shie Mansour and Yishay Mansour, "Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems", Journal of Machine Learning Research 7 (2006): 1079--1105 [Yay for confidence intervals!]
• Milos Hauskrecht, "Value-Function Approximation for Partially Observable Markov Decision Processes", Journal of Artificial Intelligence Research 13 (2000): 33--94 [JAIR publishes the full text of articles online for free, but I feel insufficiently motivated to track down the link right now]
• F. Y. Hunt, "Sample path optimality for a Markov optimization problem", Stochastic Processes and their Applications 115 (2005): 769--779
• Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore, "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research 4 (1996): 237--285
• Stephen Kelly and Malcolm I. Heywood, "Emergent Solutions to High-Dimensional Multitask Reinforcement Learning", Evolutionary Computation 26 (2018): 347--380
• Ivo Kwee, Marcus Hutter and Juergen Schmidhuber, "Market-Based Reinforcement Learning in Partially Observable Worlds," cs.AI/0105025
• Felix Leibfried, Sergio Pascual-Diaz, Jordi Grau-Moya, "A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment", arxiv:1907.12392
• Susan A. Murphy
• "A Generalization Error for Q-Learning" [PDF]
• "An Experimental Design for the Development of Adaptive Treatment Strategies", Statistics and Medcine forthcoming [PDF]
• Misha Perepelitsa, "A model of discrete choice based on reinforcement learning under short-term memory", arxiv:1908.06133
• Leonid Peshkin and Sayan Mukherjee, "Bounds on sample size for policy evaluation in Markov environments," cs.LG/0105027
• A. Potapov and M. K. Ali, "Convergence of reinforcement learning algorithms and acceleration of learning," Physical Review E 67 (2003): 026706
• Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming
• Benjamin Recht, "A Tour of Reinforcement Learning: The View from Continuous Control", Annual Review of Control, Robotics, and Autonomous Systems 2 (2019): 253--279
• Philippe Rigollet and Assaf Zeevi, "Nonparametric Bandits with Covariates", arxiv:1003.1630
• Sayanti Roy, Emily Kieson, Charles Abramson, Christopher Crick, "Mutual Reinforcement Learning", arxiv:1907.06725
• Brian Sallans, Reinforcement Learning for Factored Markov Decision Processes [Abstract, download]
• R. L. Stratonovich, Conditional Markov Processes and Their Application to the Theory of Optimal Control
• Istvan Szita and Andras Lorincz, "The many faces of optimism", arxiv:0810.3451 ["exploration-exploitation dilemma ... of reinforcement learning. 'Optimism in the face of uncertainty' and model building play central roles in advanced exploration methods. ... a fast and simple algorithm ... finds a near-optimal policy in polynomial time... experimental evidence that it is robust and efficient..."]
• Alexander G. Tartakovsky, "Asymptotic Optimality of Certain Multihypothesis Sequential Tests: Non-i.i.d. Case", Statistical Inference for Stochastic Processes 1 (1998): 265--295
• John Tsitsiklis and collaborators, Papers on Neuro-Dynamic Programming
• Pascal Van Hentenryck and Russell Bent, Online Stochastic Combinatorial Optimization