Stochastic Approximation Algorithms
Last update: 05 Sep 2025 09:29First version: 2 September 2007
Logically, "stochastic approximation" could refer to a great range of things, but in practice it has become something of a technical term for procedures that approximate the solution of an equation, observed through noise, or which try to minimize a function, again observed through noise. The former --- root-finding --- is sometimes known as the Robbins-Monro problem, and the latter --- minimization --- as the Kiefer-Wolfowitz problem. (Tangentially, that Wolfowitz is the father of the Wolfowitz who helped drag us into the invasion of Iraq.) This turns out to be a subject with deep connections to the theory of on-line learning algorithms and recursive estimation, which is really how I became interested in it, but I also like it nowadays because it provides some very cute, yet powerful, probability examples...
- Recommended, big picture:
- Michel Benaïm, "Dynamics of stochastic approximation algorithms", Séminaire de probabilités (Strasbourg) 33 (1999): 1--68 [Link to full text, bibliography, etc.]
- Tze Leung Lai, "Stochastic Approximation", Annals of Statistics 31 (2003): 391--406
- M. B. Nevel'son and R. Z. Hasminskii, Stochastic Approximation and Recursive Estimation [A truly excellent book; I don't suppose anyone has a copy they'd like to sell?]
- Robin Pemantle, "A Survey of Random Processes with Reinforcement", math.PR/0610076
- Recommended, the origins:
- H. Robbins and S. Monro, "A Stochastic Approximation Method", Annals of Mathematical Statistics 22 (1951): 400--407
- J. Kiefer and J. Wolfowitz, "Stochastic Estimation of the Maximum of a Regression Function", Annals of Mathematical Statistics 23 (1952): 462--466
- Recommended, close-ups:
- Arthur E. Albert and Leland A. Gardner, Jr., Stochastic Approximation and Nonlinear Regression
- Léon Bottou and Olivier Bosquet, "The Tradeoffs of Large Scale Learning", pp. 351--368 of Suvrit Sra, Sebastian Nowozin and Stephen J. Wright (eds.), Optimization for Machine Learning [PDF reprint via Dr. Bottou]
- Abdelkader Mokkadem, Mariane Pelletier, Yousri Slaoui
- "The stochastic approximation method for the estimation of a multivariate probability density", arxiv:0807.2960
- "Revisiting Révész's stochastic approximation method for the estimation of a regression function", arxiv:0812.3973
- To read:
- Francis Bach and Eric Moulines, "Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)", arxiv:1306.2119
- Johanna Bertl, Gregory Ewing, Carolin Kosiol, Andreas Futschik, "Approximate Maximum Likelihood Estimation", arxiv:1507.04553
- Vivek S. Borkar, Stochastic Approxmation: A Dynamical Systems Viewpoint
- D. L. Burkholder, "On a Class of Stochastic Approximation Processes", Annals of Mathematical Statistics 27 (1956): 1044--1059
- Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang, "Asymptotic normality and optimality in nonsmooth stochastic approximation", arxiv:2301.06632
- Sebastien Gadat and Laurent Younnes, "A Stochastic Algorithm for Feature Selection in Pattern Recognition", Journal of Machine Learning Research 8 (2007): 509--547
- Guillaume Garrigos, Robert M. Gower, "Handbook of Convergence Theorems for (Stochastic) Gradient Methods", arxiv:2301.11235
- Tamir Hazan, George Papandreou, and Daniel Tarlow (eds.), Perturbations, Optimization, and Statistics
- Sameer Kamal, "On the convergence, lock-in probability and sample complexity of stochastic approximation", arxiv:1007.4684
- H. J. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications
- Sophie Laruelle and Gilles Pages, "Stochastic Approximation with Averaging Innovation", arxiv:10067.3578
- Jonas Latz, "Analysis of Stochastic Gradient Descent in Continuous Time", Statistics and Computing 31 (2021): 39, arxiv:2004.07177
- Abdelkader Mokkadem and Mariane Pelletier
- "Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms", math.PR/0610329 = Annals of Applied Probability 16 (2006): 1671--1702
- "A companion for the Kiefer-Wolfowitz-Blum stochastic approximation algorithm", math.ST/0610487 [Simultaneously estimating both the location and the magnitude of the maximum]
- Deanna Needell, Nathan Srebro, Rachel Ward, "Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm", arxiv:1310.5715
- Mariane Pelletier, "Weak Convergence Rates for Stochastic Approximation with Applications to Multiple Targets and Simulated Annealing", Annals of Applied Probability 8 (1998): 10--44
- David Saad (ed.), Online Learning of Neural Networks
- Panos Toulis, Thibaut Horel and Edoardo M. Airoldi, "The proximal Robbins–Monro method", Journal of the Royal Statistical Society B 83 (2021): 188--212