Frankly the most critical question is if they can really take shortcuts on DV etc, which are the main reasons nobody else tapes out new chips for every model. Note that their current architecture only allows some LORA-Adapter based fine-tuning, even a model with an updated cutoff date would require new masks etc. Which is kind of insane, but props to them if they can make it work.
From some announcements 2 years ago, it seems like they missed their initial schedule by a year, if that's indicative of anything.
For their hardware to make sense a couple of things would need to be true: 1. A model is good enough for a given usecase that there is no need to update/change it for 3-5 years. Note they need to redo their HW-Pipeline if even the weights change. 2. This application is also highly latency-sensitive and benefits from power efficiency. 3. That application is large enough in scale to warrant doing all this instead of running on last-gen hardware.
Maybe some edge-computing and non-civilian use-cases might fit that, but given the lifespan of models, I wonder if most companies wouldn't consider something like this too high-risk.
But maybe some non-text applications, like TTS, audio/video gen, might actually be a good fit.