Hacker News

aurareturn · yesterday at 11:25 AM

It uses 10 chips for an 8B model; scaling linearly, it'd need about 100 chips for an 80B model.

Each chip is the size of an H100.

So about 100 H100-sized chips to run at this speed. And you can't change the model after the chips are manufactured, since it's etched into the silicon.
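
Back-of-envelope, a minimal sketch of that scaling, assuming the 10-chips-for-8B figure is accurate and chip count grows linearly with parameter count (the ~0.8B-params-per-chip density is the key assumption here):

    import math

    def chips_needed(model_params_b, params_per_chip_b=8 / 10):
        # ~0.8B parameters per chip, implied by the 10-chips-for-8B figure;
        # assumes the required chip count scales linearly with model size.
        return math.ceil(model_params_b / params_per_chip_b)

    print(chips_needed(8))   # 10  -> the claimed 8B configuration
    print(chips_needed(80))  # 100 -> an 80B model at the same density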


Replies

9cb14c1ec0 · yesterday at 12:33 PM

As many others in this thread have asked, can we get a source for the idea that the model is spread across chips? You keep making the claim, but no one else (myself included) has any idea where that information comes from or whether it's correct.

grzracz · yesterday at 11:32 AM

I'm sure there are plenty of optimization paths left for them if they're a startup. And imho smaller models will keep getting better. It's also a great business model: people have to buy your chips for each new LLM release :)

ubercore · yesterday at 11:45 AM

Do we know that it needs 10 chips to run the model? Or are the servers for the API and chatbot just specced with 10 boards to distribute user load?
