
credit_guy · last Tuesday at 12:23 AM

That's unlikely. Cerebras doesn't speed up everything. Could it speed up everything? I don't know, I'm not an insider. But does it today? Evidently not: their page [1] lists only 4 production models and 2 preview models.

[1] https://inference-docs.cerebras.ai/models/overview
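As a quick sanity check, that list is also queryable programmatically. A minimal sketch, assuming Cerebras exposes its documented OpenAI-compatible endpoint at https://api.cerebras.ai/v1 and that you have a key in CEREBRAS_API_KEY (both are my reading of their docs, not something stated in this thread):

    import os
    from openai import OpenAI

    # Point the standard OpenAI client at Cerebras's
    # OpenAI-compatible endpoint (assumed URL).
    client = OpenAI(
        base_url="https://api.cerebras.ai/v1",
        api_key=os.environ["CEREBRAS_API_KEY"],
    )

    # Print every model id the endpoint currently serves.
    for model in client.models.list().data:
        print(model.id)

If the count printed here matches the overview page, the small catalog isn't a docs lag; it's the actual deployment surface.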


Replies

agentastic · yesterday at 12:49 AM

They need to compile the model for their chips. Standard transformers are easier, so for GPT-OSS, Qwen, GLM, etc., they will deploy them if there is demand.

Nemotron, on the other hand, is a hybrid (Transformer + Mamba-2), so it will be more challenging to compile for Cerebras/Groq chips.
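To make the contrast concrete, here's a toy PyTorch sketch of the shape of the problem: the attention block is one static graph of matmuls, which dataflow compilers handle well, while the SSM stand-in threads state through a sequential scan. The recurrence below is purely illustrative, not Mamba-2's actual update rule:

    import torch
    import torch.nn as nn

    class AttentionBlock(nn.Module):
        # Standard transformer layer: a fixed dataflow graph of
        # matmuls, the easy case for specialized inference chips.
        def __init__(self, d):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
            self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            self.n1, self.n2 = nn.LayerNorm(d), nn.LayerNorm(d)

        def forward(self, x):
            h = self.n1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            return x + self.mlp(self.n2(x))

    class SSMBlock(nn.Module):
        # Stand-in for a Mamba-2 layer: a step-by-step scan with a
        # carried state, the part that maps poorly onto toolchains
        # built around pure attention. (Toy recurrence, not Mamba-2.)
        def __init__(self, d):
            super().__init__()
            self.A = nn.Parameter(torch.randn(d) * 0.01)
            self.B = nn.Linear(d, d)
            self.norm = nn.LayerNorm(d)

        def forward(self, x):
            h = self.norm(x)
            state = torch.zeros_like(h[:, 0])
            outs = []
            for t in range(h.size(1)):  # sequential scan, not one big matmul
                state = state * torch.sigmoid(self.A) + self.B(h[:, t])
                outs.append(state)
            return x + torch.stack(outs, dim=1)

    # A hybrid model interleaves the two kinds of blocks,
    # so the compiler has to handle both regimes in one graph.
    hybrid = nn.Sequential(AttentionBlock(256), SSMBlock(256), AttentionBlock(256))
    print(hybrid(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])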

(Methinks Nvidia is purposely picking an architecture + FP4 combination that is easy to ship on Nvidia chips, but harder for TPU or Cerebras/Groq to deploy.)