Where are those numbers from? It's not immediately clear to me that you can distribute one model across chips with this design.
> Model is etched onto the silicon chip. So can’t change anything about the model after the chip has been designed and manufactured.
Subtle detail here: the fastest turnaround that one could reasonably expect on that process is about six months. This might eventually be useful, but at the moment it seems like the model churn is huge and people insist you use this week's model for best results.
Well, they claim a two-month turnaround. Big If True. How does the six months break down in your estimation? Maybe they have found a way to reduce the turnaround time.
This depends on how much better the models will get from now on. If Claude Opus 4.6 were turned into one of these chips and ran at a hypothetical 17k tokens/second, I'm sure that would be astounding, but how long it stays useful depends on how much better Claude Opus 5 would be compared to the current generation.
100x the speed of a less capable model might be better than 1x of a better model for many, many applications.
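A rough back-of-envelope sketch of that trade-off, assuming the attempts are independent and you have a reliable way to check answers (both are assumptions, and the success rates below are made up for illustration, not measurements):

```python
def best_of_n_success(p_single: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** n

# Hypothetical per-attempt success rates, chosen only for illustration.
weak_p = 0.05    # assumed rate for the weaker, faster model
strong_p = 0.60  # assumed rate for the stronger, slower model

weak_100 = best_of_n_success(weak_p, 100)  # 100 tries in the same wall-clock time
strong_1 = best_of_n_success(strong_p, 1)  # a single try
print(f"weak x100: {weak_100:.3f}, strong x1: {strong_1:.3f}")
```

This only works in the weak model's favor when a wrong answer is cheap to detect (tests, a verifier, a compiler). Without that, the extra samples probably don't buy you much.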
This isn't ready for phones yet, but think of something like phones, where people buy a new one every 3 years. Even a mediocre on-device model at that speed would be incredible for something like Siri.