Hacker News

_0ffh · today at 12:51 AM · 1 reply

You'd be surprised how quickly the improvement of autoregressive language models levels off with epoch count (though, admittedly, one epoch is a LOT of data). Diffusion language models, otoh, do keep benefiting from repeated epochs for much longer, fwiw.
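For readers unfamiliar with the term: "epoch count" here just means the number of full passes over a fixed pretraining corpus. A minimal, hypothetical PyTorch sketch of that outer loop, with a toy model and random data standing in for a real LM and corpus (nothing here is from the comment):

```python
import torch
import torch.nn as nn

# Toy stand-in for an autoregressive LM: embedding + linear next-token head.
# Real pretraining differs in every respect except the outer epoch loop.
vocab, dim, seq_len = 100, 32, 16
corpus = torch.randint(0, vocab, (512, seq_len))  # fixed, finite dataset

model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(4):  # each epoch replays the exact same data
    for batch in corpus.split(64):
        inputs, targets = batch[:, :-1], batch[:, 1:]  # next-token pairs
        logits = model(inputs)                          # (B, T-1, vocab)
        loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

The claim above is about how much each additional iteration of that outer loop still lowers held-out loss, not about anything inside the inner loop.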


Replies

zozbot234 · today at 9:33 AM

Does this also apply to LLM training at scale? I would be a bit surprised if it does, fwiw.