OpenAI also announced two days ago that they're starting to make Cerebras-style chips themselves [0]. It will be interesting to see how fast SotA model inference is by the end of the year.
[0]: https://openai.com/index/openai-broadcom-jalapeno-inference-...
Cerebras is different from what Jalapeno is.
Jalapeno is for mass-scale inference.
Cerebras is extremely expensive and difficult to scale, hence the limited release.
Even if their chip is a difference-maker, end of the year is way too optimistic. It’ll at minimum be a multi-year effort to bring it to production at scale.
I don't see any indications that OpenAI is doing wafer-scale work.
I tend to doubt they would. Cerebras notably doesn't have a KV cache; it has wildly high bandwidth, but only within/across the chip, and it can't dump/restore KV very well. I doubt OpenAI is going to build something that is as expensive to run. Also, wafer-scale is absurdly hard and weird to pull off, so I doubt that would be their first foray.
I don't understand how you refer to this as "Cerebras-style". Cerebras is wafer-scale and unique. Jalapeno is an inference-optimized conventional chip.