jephs • today at 12:27 AM

I'm terribly sorry, but scaling curves or GTFO. Any random pile of linear algebra works fine-ish at small scales. Very few random piles of linear algebra push the Pareto envelope at large scales.

Replies

ketchup32613 • today at 1:15 AM

Do you want to see scaling curves wrt data and param size? I agree that 1.2B parameters and 10B tokens is not representative, but what scale of parameters and dataset sizes would be convincing?
