We revisit dynamic evaluation, the idea of adapting the parameters of a language model online, via gradient descent, on a given sequence of test tokens. While it is generally known that adapting the parameters at test time improves overall predictive performance, we pay particular attention to the speed of adaptation (in terms of sample efficiency) and to the computational overhead of computing gradients and updating parameters.
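To make the setup concrete, the following is a minimal sketch of dynamic evaluation in PyTorch with Hugging Face Transformers. The model name, chunk size, optimizer, and learning rate are illustrative assumptions, not values from this work: the loss on each chunk of test tokens is recorded under the current parameters, and a gradient step is then taken on that same chunk before the next one is scored.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choices: any causal LM, plain SGD, lr = 1e-4.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

text = "Dynamic evaluation adapts model parameters online at test time."
ids = tokenizer(text, return_tensors="pt").input_ids

model.eval()  # keep dropout off; gradients still flow without torch.no_grad()
chunk = 8  # adapt after every `chunk` tokens (assumed hyperparameter)
total_nll, total_tokens = 0.0, 0
for start in range(0, ids.size(1) - 1, chunk):
    # One extra token of left context so every token is predicted exactly once.
    window = ids[:, start : start + chunk + 1]
    # Score the chunk with the *current* parameters; the model shifts the
    # labels internally, so `loss` is the mean NLL of the predicted tokens.
    outputs = model(window, labels=window)
    n_pred = window.size(1) - 1
    total_nll += outputs.loss.item() * n_pred
    total_tokens += n_pred
    # Then take one gradient step on the tokens just observed
    # (the dynamic-evaluation update).
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()

print(f"adapted per-token NLL: {total_nll / total_tokens:.3f}")
```

The two quantities highlighted above map directly onto this loop: sample efficiency asks how quickly the per-chunk loss drops as more test tokens are consumed, and the computational overhead is the cost of the extra backward pass and optimizer step per chunk relative to a forward-only evaluation.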