Gradient Descent Exponent
Description of constant
Let $f$ be a convex function with $1$-Lipschitz gradient. We assume black-box access to the function and its gradient. Gradient descent will converge to a global minimum with an appropriate choice of step size $s$: $x_{k+1} := x_k - s\cdot \nabla f(x_k)$. In general, $s$ can be chosen to vary with the step $k$. The function gap $f(x_k) - \inf f$, will then converge like $O(k^{-c})$ for some exponent $c$ where $k$ is the iteration counter. While fixed step sizes achieve $c=1$, time-varying patterns of increasing and decreasing step sizes can lead to improved rates of convergence, increasing the exponent.
The constant $C_{35}$ is the largest exponent $c$ such that there exists a step schedule $s : \mathbb{N} \to \mathbb{R}$ so that vanilla gradient descent has a worst-case convergence of $O(k^{-c})$.
Known upper bounds
| Bound | Reference | Comments |
|---|---|---|
| 2 | Folklore | See, e.g., [N2014] |
Known lower bounds
| Bound | Reference | Comments |
|---|---|---|
| 1 | Folklore | Achieved by constant step sizes. See, e.g., [B2015]. |
| 1.0564 | [GSW23] | Obtained by nonconstant, nonperiodic stepsize schedule. Appeared on arXiv concurrently with [AP24] |
| $\log_2(1+\sqrt{2}) \approx 1.271$ | [AP24] | Obtained by the “silver stepsize schedule”. Conjecturally optimal. Appeared on arXiv concurrently with [GSW23]. |
References
- [AP24] Jason M Altschuler and Pablo A Parrilo. Acceleration by stepsize hedging: Silver stepsize schedule for smooth convex optimization. Mathematical Programming, pages 1–14, 2024.
- [B2015] Dimitri P. Bertsekas. Convex optimization algorithms. 2015.
- [GSW23] Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Accelerated Gradient Descent via Long Steps. arXiv:2309.09961
- [N2014] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Publishing Company, Incorporated, 1 edition, 2014.