That's unlikely. Cerebras doesn't speed up everything. Could it? I don't know, I'm not an insider. But does it today? Evidently not: their models page [1] lists only four production models and two preview models.
[1] https://inference-docs.cerebras.ai/models/overview
They need to compile each model for their chips. Standard transformers are easier, so if there is demand for GPT-OSS, Qwen, GLM, etc., they will deploy them.
Nemotron, on the other hand, is a hybrid (Transformer + Mamba-2), so it will be more challenging to compile for Cerebras/Groq chips.
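To make that concrete, here's a toy PyTorch sketch (my own illustration, not Nemotron's actual config): the attention half is dense batched matmuls, which map cleanly onto dataflow-style compilers, while the SSM half carries a recurrent state forward across timesteps, which resists a purely feed-forward mapping.

    import torch
    import torch.nn as nn

    # Hypothetical hybrid stack; dimensions and layer mix are illustrative only.

    class AttentionBlock(nn.Module):
        def __init__(self, d):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

        def forward(self, x):
            # Pure matmuls over the whole sequence: compiler-friendly.
            out, _ = self.attn(x, x, x)
            return x + out

    class SimpleSSMBlock(nn.Module):
        """Toy diagonal state-space recurrence (a stand-in for Mamba-2)."""
        def __init__(self, d):
            super().__init__()
            self.A = nn.Parameter(torch.rand(d) * 0.9)  # per-channel decay
            self.B = nn.Parameter(torch.randn(d))
            self.C = nn.Parameter(torch.randn(d))

        def forward(self, x):
            # Sequential scan: state h at step t depends on step t-1.
            h = torch.zeros(x.shape[0], x.shape[-1], device=x.device)
            outs = []
            for t in range(x.shape[1]):
                h = self.A * h + self.B * x[:, t]
                outs.append(self.C * h)
            return x + torch.stack(outs, dim=1)

    # Hybrid = interleaved attention and SSM blocks.
    model = nn.Sequential(AttentionBlock(64), SimpleSSMBlock(64),
                          AttentionBlock(64), SimpleSSMBlock(64))
    x = torch.randn(2, 16, 64)   # (batch, seq, dim)
    print(model(x).shape)        # torch.Size([2, 16, 64])

Real Mamba-2 kernels replace that Python loop with a parallel scan, but the stateful recurrence is still a different compilation target than attention's matmuls.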
(Methinks Nvidia is purposefully picking an architecture+FP4 combination that is easy to ship on Nvidia chips, but harder for TPU or Cerebras/Groq to deploy.)