Gradient Descent Exponent

Description of constant

Let $f$ be a $C^1$ convex function with $L$-Lipschitz gradient. We assume black-box access to the function and its gradient. With an appropriate choice of step sizes $s_k$ (which may vary with the iteration $k$), gradient descent, $x_{k+1} := x_k - s_k \cdot \nabla f(x_k)$, converges to a global minimum. The function gap $f(x_k) - \inf f$ then converges like $O(k^{-c})$ for some exponent $c$, where $k$ is the iteration counter. While fixed step sizes achieve $c = 1$, more intricate patterns of increasing and decreasing step sizes can improve the rate of convergence, i.e., increase the exponent.
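
As a concrete illustration of the setup, here is a minimal Python sketch of vanilla gradient descent driven by a step-size schedule, run on a hypothetical smooth convex quadratic (the test function, matrix, and helper names below are illustrative choices, not part of the definition); the constant schedule $s_k = 1/L$ is the baseline that attains exponent $c = 1$.

```python
import numpy as np

def gradient_descent(grad, x0, schedule, num_iters):
    """Run x_{k+1} = x_k - s_k * grad(x_k), where s_k = schedule(k)."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for k in range(num_iters):
        x = x - schedule(k) * grad(x)
        iterates.append(x.copy())
    return iterates

# Hypothetical test problem: f(x) = 0.5 * x^T A x, smooth and convex.
A = np.diag([1.0, 0.1, 0.01])         # Hessian eigenvalues
L = float(np.max(np.diag(A)))         # gradient of f is L-Lipschitz with L = lambda_max(A)
f = lambda x: 0.5 * x @ A @ x         # inf f = 0, attained at x = 0
grad = lambda x: A @ x

constant_schedule = lambda k: 1.0 / L  # fixed step size; worst-case gap is O(1/k)

xs = gradient_descent(grad, x0=np.ones(3), schedule=constant_schedule, num_iters=1000)
for k in (10, 100, 1000):
    print(k, f(xs[k]))  # on this easy quadratic the gap shrinks much faster than the worst-case 1/k
```

Any of the nonconstant schedules discussed below plugs into `schedule` in exactly the same way.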

The constant $C_{35}$ is the supremum of exponents $c$ for which there exists a step schedule $s : \mathbb{N} \to \mathbb{R}$ such that vanilla gradient descent has a worst-case convergence rate of $O(k^{-c})$.
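
Written out, and under the standard normalization for this problem class (the worst case is taken over all convex $f$ with $L$-Lipschitz gradient and over initializations $x_0$ at bounded distance from a minimizer, a convention assumed here rather than stated above), the definition reads

$$
C_{35} \;=\; \sup\left\{\, c \ge 0 \;:\; \exists\, s : \mathbb{N} \to \mathbb{R} \ \text{ such that } \ \sup_{f,\, x_0} \bigl( f(x_k) - \inf f \bigr) = O\!\bigl( k^{-c} \bigr) \,\right\},
$$

where $x_k$ denotes the $k$-th iterate of gradient descent run with steps $x_{k+1} = x_k - s(k)\, \nabla f(x_k)$.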

Known upper bounds

Bound Reference Comments
2 Folklore See, e.g., [N2014]

Known lower bounds

Bound Reference Comments
1 Folklore Achieved by constant step sizes. See, e.g., [B2015].
1.0564 [GSW23] Nonconstant, fractal pattern
1.178 [GPR23] Found by a computer search over step schedules of length 50
1.271 [AP24] $\log_2(1+\sqrt{2})$, the “Silver schedule”
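
For a feel of what a nonconstant schedule looks like, here is a rough Python sketch of the silver stepsize pattern as it is usually described for the smooth convex setting in [AP24]: the step at iteration $t$, in units of $1/L$, is $1 + \rho^{\nu(t)-1}$, where $\rho = 1 + \sqrt{2}$ is the silver ratio and $\nu(t)$ is the 2-adic valuation of $t$. This is a paraphrase that omits the finite-horizon corrections in the paper; treat it as illustrative rather than authoritative.

```python
import math

RHO = 1.0 + math.sqrt(2.0)  # the silver ratio, rho = 1 + sqrt(2)

def two_adic_valuation(t: int) -> int:
    """Exponent of the largest power of 2 dividing t (for t >= 1)."""
    v = 0
    while t % 2 == 0:
        t //= 2
        v += 1
    return v

def silver_step(t: int, L: float = 1.0) -> float:
    """Step size at iteration t (1-indexed), paraphrasing the pattern in [AP24].

    The exact finite-horizon schedule in the paper includes corrections that
    are omitted here; this is an illustrative sketch only.
    """
    return (1.0 + RHO ** (two_adic_valuation(t) - 1)) / L

# First few steps of the fractal pattern:
# sqrt(2), 2, sqrt(2), 2 + sqrt(2), sqrt(2), 2, sqrt(2), 4 + 2*sqrt(2), ...
print([round(silver_step(t), 3) for t in range(1, 9)])
```

With such a schedule the worst-case function gap improves from the $O(k^{-1})$ of constant steps to $O(k^{-\log_2(1+\sqrt{2})}) \approx O(k^{-1.2716})$, which is the exponent quoted in the table above.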

References