I think we have barely scratched the surface of inference efficiency for post-trained generative models.
A uniquely efficient hardware stack, for either training or inference, would be a great moat in an industry that seems to offer few moats.
I keep waiting to hear of more adoption of Cerebras Systems' wafer-scale chips. They may be held back by not offering the full hardware stack, i.e. their own data centers optimized around wafer-scale compute units. (They do partner with AWS, as a third-party provider, in competition with AWS's own silicon.)
Re: Cerebras, they filed an S-1 [1] last year when attempting to go public. It showed something like a $60M+ loss for the first six months of 2024. The IPO didn’t happen because the CEO’s past included some financial missteps and the banks didn’t want to deal with that. At the time, the majority of their revenue also came from a single source in Abu Dhabi. They did end up benefiting from the slew of open-source model releases, which let them become an inference provider via APIs rather than needing to provide the full stack for training.
[1] https://www.sec.gov/Archives/edgar/data/2021728/000162828024...
Google is already there with TPUs. The reason they can add AI to every single Google search is not just that Google has near-infinite cash, but also that inference costs far less for Google than for anyone else.
> would be a great moat
I hope we never find good moats. I hope that progress in AI is never bottlenecked on technology that centralizes control over the ecosystem to one or a handful of vendors. I want to be able to run the models myself and train them myself. I don't want to be beholden to one company because they managed to hire away all the people building fancy optical chips and kept the research for themselves.