Journal articles

Second-order step-size tuning of SGD for non-convex optimization

Abstract: In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. To do so, one estimates curvature from a local quadratic model, using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results guarantee almost sure convergence to the critical set, and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM.
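For intuition, the Barzilai-Borwein idea referred to in the abstract builds a step size from the displacement between two iterates and the corresponding change in (mini-batch) gradients, which together give a finite-difference estimate of local curvature. The sketch below is an illustrative NumPy example of that classical rule applied with noisy gradients on a toy quadratic; the helper bb_step_size, the toy loss, and all constants are assumptions made for demonstration and do not reproduce the additional tuning and safeguards of the paper's Step-Tuned SGD.

import numpy as np

def bb_step_size(s, y, eps=1e-12):
    # Barzilai-Borwein (BB1) step size: |s.y| / (y.y), where s is the
    # displacement between two iterates and y the change in gradient.
    # Illustrative sketch only, using possibly noisy gradient estimates.
    denom = float(np.dot(y, y))
    if denom < eps:
        return None                      # curvature signal too weak; keep the old step
    return abs(float(np.dot(s, y))) / denom

# Toy problem: quadratic loss f(x) = 0.5 * x^T A x, with Gaussian noise added
# to the gradient to mimic mini-batch gradient estimates (purely for illustration).
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x + 0.01 * rng.standard_normal(2)

x = np.array([5.0, 5.0])
step = 0.05                              # initial, vanilla-SGD step size
g_prev, x_prev = grad(x), x.copy()
x = x - step * g_prev                    # one plain step to build a history
for _ in range(50):
    g = grad(x)
    bb = bb_step_size(x - x_prev, g - g_prev)
    if bb is not None:
        step = bb                        # curvature-informed step size
    x_prev, g_prev = x.copy(), g
    x = x - step * g
print(x)                                 # settles near the origin, up to gradient noise

In the mini-batch setting studied in the paper, the displacement and gradient change come from consecutive stochastic gradient evaluations, which is where the paper's specific step-size tuning and convergence analysis apply; the snippet above only illustrates the underlying curvature-estimation idea.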
Complete list of metadata

https://hal.archives-ouvertes.fr/hal-03161775
Contributor: Camille Castera
Submitted on: Tuesday, November 23, 2021 - 9:09:21 AM
Last modification on: Wednesday, March 23, 2022 - 3:47:31 AM

File

2103.03570v2.pdf
Files produced by the author(s)

Identifiers

HAL Id: hal-03161775, version 2
DOI: 10.1007/s11063-021-10705-5

Citation

Camille Castera, Cédric Févotte, Jérôme Bolte, Edouard Pauwels. Second-order step-size tuning of SGD for non-convex optimization. Neural Processing Letters, Springer Verlag, 2022, pp.1--26. ⟨10.1007/s11063-021-10705-5⟩. ⟨hal-03161775v2⟩
