Hacker News

jephs today at 12:27 AM

I'm terribly sorry, but scaling curves or GTFO. Any random pile of linear algebra works fine-ish at small scales. Very few random piles of linear algebra push the Pareto envelope at large scales.


Replies

ketchup32613 today at 1:15 AM

Do you want to see scaling curves with respect to data and parameter count? I agree that 1.2B parameters and 10B tokens are not representative, but what parameter counts and dataset sizes would be convincing?
