Are LLMs still trained by (variants of) stochastic GRADIENT descent? AFAIK what used to be called "backprop" is nowadays known as "automatic differentiation"; it's widely used in PyTorch, JAX, etc.
Yes. Pretraining and fine-tuning use standard Adam-style optimizers (usually AdamW, i.e. Adam with decoupled weight decay). Reinforcement learning has historically been the odd one out, but these days almost all RL algorithms used for LLMs also rely on backprop and gradient descent. And backprop hasn't really been renamed: it's the special case of reverse-mode automatic differentiation applied to neural-network training, which is exactly what PyTorch and JAX implement.
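To make that concrete, here's a minimal sketch of what a single training step looks like in PyTorch. The model, layer sizes, and hyperparameters are made up for illustration, not taken from any real training setup; the point is just where backprop (`loss.backward()`) and the Adam-scaled gradient update (`optimizer.step()`) happen.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64  # toy sizes, chosen purely for illustration

# A stand-in "LM": embedding -> linear head predicting the next token.
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)

# AdamW = Adam with decoupled weight decay, the usual choice for pretraining/fine-tuning.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

tokens = torch.randint(0, vocab_size, (8, 128))  # fake batch of token ids
logits = model(tokens[:, :-1])                   # predict each next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)

loss.backward()        # backprop, i.e. reverse-mode automatic differentiation
optimizer.step()       # stochastic gradient step, scaled by Adam's moment estimates
optimizer.zero_grad()
```

The same structure holds for RL-based fine-tuning: the loss being differentiated changes (e.g. a policy-gradient objective instead of next-token cross-entropy), but the gradients still come from autodiff and the weights are still updated by an Adam-style optimizer.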