You'd be surprised how quickly improvement in autoregressive language models levels off with epoch count (though, admittedly, one epoch is a LOT of data). Diffusion language models, on the other hand, do keep benefiting from additional epochs for much longer, fwiw.
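To pin down what I mean by "epoch count": a minimal sketch below of the multi-epoch loop, where the same fixed corpus is passed over repeatedly and held-out loss is tracked per pass. Assumptions are mine: PyTorch, a random stand-in corpus, and a bigram model standing in for a real autoregressive LM.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, seq_len = 256, 64
corpus = torch.randint(0, vocab_size, (512, seq_len))    # stand-in for a fixed training corpus
train, val = corpus[:448], corpus[448:]

# Bigram model as a minimal stand-in for an autoregressive LM:
# predict token t+1 from token t.
model = nn.Sequential(nn.Embedding(vocab_size, 128), nn.Linear(128, vocab_size))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def next_token_loss(batch):
    # shift by one position: inputs are tokens 0..T-2, targets are tokens 1..T-1
    logits = model(batch[:, :-1])
    return loss_fn(logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))

for epoch in range(8):                                   # "epoch count": repeated passes over the same data
    for i in range(0, len(train), 64):
        opt.zero_grad()
        next_token_loss(train[i:i+64]).backward()
        opt.step()
    with torch.no_grad():
        print(f"epoch {epoch + 1}: held-out loss {next_token_loss(val).item():.3f}")
```

The claim is about where the per-epoch drop in that held-out loss flattens out as you keep repeating the same data, for AR vs. diffusion objectives.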
Does this also apply to LLM training at scale? I would be a bit surprised if it does, fwiw.