I’ve read and heard from SemiAnalysis and other best-in-class analysts that the amount of software optimization possible up and down the stack is staggering…
How do you explain, then, that with capabilities being equal, the cost per token is dropping dramatically?
Optimizations, like I said. But they'll never optimize away the massive memory requirements, or the cost of pre-training... Imagine what the memory requirements would be without the pre-training step... this is just part and parcel of the transformer architecture.
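To make the memory claim concrete, here is a rough back-of-envelope sketch of transformer inference memory: the model weights plus the KV cache, which grows with context length. All model dimensions below are hypothetical illustrative values, not taken from any specific model.

```python
def inference_memory_gb(params_b, n_layers, n_kv_heads, head_dim,
                        context_len, batch, bytes_per_val=2):
    """Approximate GPU memory (GB) for weights + KV cache at fp16/bf16."""
    # Weights: one value per parameter, at bytes_per_val bytes each.
    weight_bytes = params_b * 1e9 * bytes_per_val
    # KV cache: 2 tensors (K and V) per layer, per token, per KV head.
    kv_bytes = (2 * n_layers * n_kv_heads * head_dim
                * context_len * batch * bytes_per_val)
    return (weight_bytes + kv_bytes) / 1e9

# Illustrative example: a hypothetical 70B-parameter model, 80 layers,
# 8 KV heads of dim 128 (grouped-query attention), 32k context, batch 1.
total = inference_memory_gb(70, 80, 8, 128, 32_768, 1)
print(f"{total:.1f} GB")  # ~150 GB: weights dominate, cache adds ~11 GB
```

Under these assumed dimensions, the weights alone account for roughly 140 GB at 16-bit precision, which is why optimizations can cut cost per token (batching, quantization, caching) yet cannot remove the baseline memory footprint the architecture imposes.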