Hacker News

aurareturn · yesterday at 11:25 AM

It uses 10 chips for an 8B model; scaling linearly, it'd need about 100 chips for an 80B model.

Each chip is the size of an H100.

So about 100 H100-sized chips to run at this speed. And you can't change the model after the chips are manufactured, since it's etched into the silicon.
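
Back-of-envelope, a minimal sketch of that scaling, assuming the 10-chips-for-8B figure is accurate and chip count grows linearly with parameter count (the ~0.8B-params-per-chip density is the key assumption here):

    import math

    def chips_needed(model_params_b, params_per_chip_b=8 / 10):
        # ~0.8B parameters per chip, implied by the 10-chips-for-8B figure;
        # assumes the required chip count scales linearly with model size.
        return math.ceil(model_params_b / params_per_chip_b)

    print(chips_needed(8))   # 10  -> the claimed 8B configuration
    print(chips_needed(80))  # 100 -> an 80B model at the same density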


Replies

9cb14c1ec0 · yesterday at 12:33 PM

As many others in this thread have asked, can we get a source for the idea that the model is spread across chips? You keep making the claim, but no one else (myself included) has any idea where that information comes from or whether it's correct.

grzracz · yesterday at 11:32 AM

I'm sure there are plenty of optimization paths left for them if they're a startup. And imho smaller models will keep getting better. It's also a great business model: people have to buy your chips for each new LLM release :)

ubercore · yesterday at 11:45 AM

Do we know that it needs 10 chips to run the model? Or are the servers for the API and chatbot just specced with 10 boards to distribute user load?
