Gradient Descent Exponent

Description of constant

Let $f$ be a convex function with $1$-Lipschitz gradient. We assume black-box access to the function and its gradient. Gradient descent will converge to a global minimum with an appropriate choice of step size $s$: $x_{k+1} := x_k - s\cdot \nabla f(x_k)$. In general, $s$ can be chosen to vary with the step $k$. The function gap $f(x_k) - \inf f$, will then converge like $O(k^{-c})$ for some exponent $c$ where $k$ is the iteration counter. While fixed step sizes achieve $c=1$, time-varying patterns of increasing and decreasing step sizes can lead to improved rates of convergence, increasing the exponent.

The constant $C_{35}$ is the largest exponent $c$ such that there exists a step schedule $s : \mathbb{N} \to \mathbb{R}$ so that vanilla gradient descent has a worst-case convergence of $O(k^{-c})$.

Known upper bounds

Bound Reference Comments
2 Folklore See, e.g., [N2014]

Known lower bounds

Bound Reference Comments
1 Folklore Achieved by constant step sizes. See, e.g., [B2015].
1.0564 [GSW23] Obtained by nonconstant, nonperiodic stepsize schedule. Appeared on arXiv concurrently with [AP24]
$\log_2(1+\sqrt{2}) \approx 1.271$ [AP24] Obtained by the “silver stepsize schedule”. Conjecturally optimal. Appeared on arXiv concurrently with [GSW23].

References