vessenes • yesterday at 5:00 PM • 1 reply

They will. Either new architectures will come out that give us greater efficiency, or we will hit a point where the main thing we can do is shove more training time onto these weights to get more per byte. A similar thing is already happening organically with efficient token use; see, for instance, https://github.com/qlabs-eng/slowrun.

Replies

simopa • yesterday at 5:20 PM

Thanks for the link.
