SurpriseOpt
We introduce SurpriseOpt, an optimization framework that replaces the constant exponential decay of the moment estimates with state-dependent adaptive interpolation. The algorithm detects “surprises” as ratios of gradient magnitude to second-moment magnitude and modulates the effective inertia of the first and second moments via adaptive gating functions. SurpriseOpt also features a mechanism to escape plateaus in the loss landscape: it accumulates information about recent low surprises as “boredom” and adapts the learning rate accordingly. This boredom feature can be added to any first-order optimizer. We demonstrate that SurpriseOpt can converge several times faster than Adam across various tasks.
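To make the mechanism described above more concrete, here is a minimal scalar sketch in plain Python. It is not the reference implementation from the paper or the repository. The gate `s / (1 + s)`, the `beta_min` and `beta_max` interpolation range, the boredom update, and every function name and default value are assumptions chosen only for illustration, as is the choice that boredom raises the learning rate.

```python
import math


def surprise_step(param, grad, state, lr=0.1, beta_min=0.5, beta_max=0.99,
                  boredom_rate=0.1, boredom_gain=1.0, eps=1e-8):
    """One update of a hypothetical surprise-gated optimizer on a scalar.

    state is a tuple (m, v, boredom): first moment, second moment, and the
    accumulated boredom. All formulas are illustrative assumptions.
    """
    m, v, boredom = state

    # Surprise: gradient magnitude relative to the running second-moment scale.
    surprise = abs(grad) / (math.sqrt(v) + eps)

    # Gate in [0, 1): near 0 for unsurprising gradients, near 1 for large ones.
    gate = surprise / (1.0 + surprise)

    # State-dependent interpolation: high surprise lowers the inertia (beta),
    # so both moments follow the new gradient more quickly.
    beta = beta_max - (beta_max - beta_min) * gate
    m = beta * m + (1.0 - beta) * grad
    v = beta * v + (1.0 - beta) * grad * grad

    # Boredom: running average of "low surprise" (1 - gate), kept in [0, 1].
    boredom = (1.0 - boredom_rate) * boredom + boredom_rate * (1.0 - gate)

    # Assumption: boredom scales the learning rate up to help leave plateaus.
    step_lr = lr * (1.0 + boredom_gain * boredom)

    param = param - step_lr * m / (math.sqrt(v) + eps)
    return param, (m, v, boredom)


# Usage: minimize f(x) = x^2, whose gradient is 2x.
x, state = 5.0, (0.0, 0.0, 0.0)
for _ in range(200):
    x, state = surprise_step(x, 2.0 * x, state)
```

Because both moments share the same `beta` at every step in this sketch, the ratio `m / sqrt(v)` never exceeds 1 in magnitude, so each update moves the parameter by at most `lr * (1 + boredom_gain)`. The actual gating functions and boredom dynamics are defined in the paper and the repository.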
Description
This repository provides the Hugging Face entry for the research paper "SurpriseOpt: An Adaptive First-Order Optimizer Driven by Boredom".
The paper is available as a preprint under DOI 10.5281/zenodo.20060806.
The source code for both Julia and PyTorch is hosted on Codeberg: https://codeberg.org/Soloof/SurpriseOpt
The repository contains a reference implementation of the algorithm that can be used directly in Julia via Flux or Lux, or in Python via PyTorch.
How to test the optimizer
Please follow the instructions in the project repository on Codeberg.