Gradient Descent Exponent

Description of constant

Let $f$ be a $C^1$ convex function with $L$-Lipschitz gradient. We assume black-box access to the function and its gradient. Gradient descent will converge to a global minimum with an appropriate choice of step size $s$: $x_{k+1} := x_k - s\cdot \nabla f(x_k)$. In general, $s$ can be chosen to vary with the step $k$. The function gap $f(x_k) - \inf f$, will then converge like $O(k^{-c})$ for some exponent $c$ where $k$ is the iteration counter. While fixed step sizes achieve $c=1$, complicated patterns of increasing and decreasing step sizes can lead to improved rates of convergence, increasing the exponent.

The constant $C_{35}$ is the supremum of exponents $c$ such that there exists a step schedule $s : \mathbb{N} \to \mathbb{R}$ so that vanilla gradient descent has a worst-case convergence of $O(k^{-c})$.

Known upper bounds

Bound	Reference	Comments
2	Folklore	See, e.g., [N2014]

Known lower bounds

Bound	Reference	Comments
1	Folklore	Achieved by constant step sizes. See, e.g., [B2015].
1.0564	[GSW23]	Nonconstant, fractal pattern
1.178	[GPR23]	Found by computer search on schedules of size 50
1.271	[AP24]	$\log_2(1+\sqrt{2})$, the “Silver schedule”

References

[AP24] Jason M Altschuler and Pablo A Parrilo. Acceleration by stepsize hedging: Silver stepsize schedule for smooth convex optimization. Mathematical Programming, pages 1–14, 2024.
[B2015] Dimitri P. Bertsekas. Convex optimization algorithms. 2015.
[GPR23] Shuvomoy Das Gupta, Bart P.G. Van Parys, and Ernest Ryu. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Mathematical Programming, 2023
[GSW23] Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Accelerated Gradient Descent via Long Steps. arXiv:2309.09961
[N2014] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Publishing Company, Incorporated, 1 edition, 2014.