Stochastic Approximation Algorithms

Last update: 05 Sep 2025 09:29
First version: 2 September 2007

Logically, "stochastic approximation" could refer to a great range of things, but in practice it has become something of a technical term for procedures that approximate the solution of an equation, observed through noise, or which try to minimize a function, again observed through noise. The former --- root-finding --- is sometimes known as the Robbins-Monro problem, and the latter --- minimization --- as the Kiefer-Wolfowitz problem. (Tangentially, that Wolfowitz is the father of the Wolfowitz who helped drag us into the invasion of Iraq.) This turns out to be a subject with deep connections to the theory of on-line learning algorithms and recursive estimation, which is really how I became interested in it, but I also like it nowadays because it provides some very cute, yet powerful, probability examples...

Michel Benaïm, "Dynamics of stochastic approximation algorithms", Séminaire de probabilités (Strasbourg) 33 (1999): 1--68 [Link to full text, bibliography, etc.]
Tze Leung Lai, "Stochastic Approximation", Annals of Statistics 31 (2003): 391--406
M. B. Nevel'son and R. Z. Hasminskii, Stochastic Approximation and Recursive Estimation [A truly excellent book; I don't suppose anyone has a copy they'd like to sell?]
Robin Pemantle, "A Survey of Random Processes with Reinforcement", math.PR/0610076

H. Robbins and S. Monro, "A Stochastic Approximation Method", Annals of Mathematical Statistics 22 (1951): 400--407
J. Kiefer and J. Wolfowitz, "Stochastic Estimation of the Maximum of a Regression Function", Annals of Mathematical Statistics 23 (1952): 462--466

Arthur E. Albert and Leland A. Gardner, Jr., Stochastic Approximation and Nonlinear Regression
Léon Bottou and Olivier Bosquet, "The Tradeoffs of Large Scale Learning", pp. 351--368 of Suvrit Sra, Sebastian Nowozin and Stephen J. Wright (eds.), Optimization for Machine Learning [PDF reprint via Dr. Bottou]
Abdelkader Mokkadem, Mariane Pelletier, Yousri Slaoui
- "The stochastic approximation method for the estimation of a multivariate probability density", arxiv:0807.2960
- "Revisiting Révész's stochastic approximation method for the estimation of a regression function", arxiv:0812.3973

Francis Bach and Eric Moulines, "Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)", arxiv:1306.2119
Johanna Bertl, Gregory Ewing, Carolin Kosiol, Andreas Futschik, "Approximate Maximum Likelihood Estimation", arxiv:1507.04553
Vivek S. Borkar, Stochastic Approxmation: A Dynamical Systems Viewpoint
D. L. Burkholder, "On a Class of Stochastic Approximation Processes", Annals of Mathematical Statistics 27 (1956): 1044--1059
Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang, "Asymptotic normality and optimality in nonsmooth stochastic approximation", arxiv:2301.06632
Sebastien Gadat and Laurent Younnes, "A Stochastic Algorithm for Feature Selection in Pattern Recognition", Journal of Machine Learning Research 8 (2007): 509--547
Guillaume Garrigos, Robert M. Gower, "Handbook of Convergence Theorems for (Stochastic) Gradient Methods", arxiv:2301.11235
Tamir Hazan, George Papandreou, and Daniel Tarlow (eds.), Perturbations, Optimization, and Statistics
Sameer Kamal, "On the convergence, lock-in probability and sample complexity of stochastic approximation", arxiv:1007.4684
H. J. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications
Sophie Laruelle and Gilles Pages, "Stochastic Approximation with Averaging Innovation", arxiv:10067.3578
Jonas Latz, "Analysis of Stochastic Gradient Descent in Continuous Time", Statistics and Computing 31 (2021): 39, arxiv:2004.07177
Abdelkader Mokkadem and Mariane Pelletier
- "Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms", math.PR/0610329 = Annals of Applied Probability 16 (2006): 1671--1702
- "A companion for the Kiefer-Wolfowitz-Blum stochastic approximation algorithm", math.ST/0610487 [Simultaneously estimating both the location and the magnitude of the maximum]
Deanna Needell, Nathan Srebro, Rachel Ward, "Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm", arxiv:1310.5715
Mariane Pelletier, "Weak Convergence Rates for Stochastic Approximation with Applications to Multiple Targets and Simulated Annealing", Annals of Applied Probability 8 (1998): 10--44
David Saad (ed.), Online Learning of Neural Networks
Panos Toulis, Thibaut Horel and Edoardo M. Airoldi, "The proximal Robbins–Monro method", Journal of the Royal Statistical Society B 83 (2021): 188--212