> within a few years we will be running local models as good as today’s frontier models with almost no cost burden
Based on what? The RAM requirements alone are extraordinary.
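For concreteness, here is a back-of-envelope sketch of what the RAM requirement actually looks like; the model sizes and quantization levels below are illustrative assumptions, not figures anyone in the thread gave:

```python
# Back-of-envelope RAM estimate for running an LLM locally.
# All model sizes and bit-widths below are illustrative assumptions.

def weights_gb(params_billions: float, bits_per_weight: int) -> float:
    """Memory for the weights alone: params * bits / 8, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params in (8, 70, 405):      # hypothetical model sizes, in billions
    for bits in (16, 8, 4):      # fp16, int8, 4-bit quantization
        print(f"{params:>4}B @ {bits:>2}-bit: ~{weights_gb(params, bits):,.0f} GB")

# A 70B model at 4-bit needs ~35 GB for weights alone (KV cache and
# activations come on top), so it fits in a 128 GB unified-memory
# machine; a 405B model at 4-bit (~203 GB) does not.
```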
No, running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.
You can now buy 128 GB unified-memory computers from AMD as commodity hardware.
They're still pricey, the world is still scaling up memory production, and a lot of code isn't yet built for AMD, but we went from the Wright brothers' first airplane to jet engines in 27 years.
I'm not sure it's "only a few years away," but we are sure moving there fast.
I strongly disagree. Humans are insanely well incentivized here, with trillions in market share at stake, to make local AI good enough, and "good enough" is the only benchmark they need.
> running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.
That is only true right now because these AI companies are burning hundreds of billions of dollars to try to win market share. If you paid what it actually costs, your comment would likely be very different.
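To make the utilization argument concrete, here is a toy cost model; every price, lifetime, and utilization figure in it is a hypothetical assumption, not real market data:

```python
# Toy amortized-cost comparison: shared hosted GPU vs. a local box.
# Every price and utilization figure here is a hypothetical assumption.

def cost_per_useful_hour(hardware_cost: float, lifetime_years: float,
                         utilization: float) -> float:
    """Amortized hardware cost per hour of actual use."""
    total_hours = lifetime_years * 365 * 24
    return hardware_cost / (total_hours * utilization)

# Hosted: expensive accelerator, but shared across users at near-full load.
hosted = cost_per_useful_hour(hardware_cost=30_000, lifetime_years=3,
                              utilization=0.80)
# Local: cheaper machine, but a single user rarely keeps it busy.
local = cost_per_useful_hour(hardware_cost=3_000, lifetime_years=3,
                             utilization=0.05)

print(f"hosted: ${hosted:.2f}/useful hour, local: ${local:.2f}/useful hour")
# With these made-up numbers the hosted accelerator comes out cheaper per
# useful hour; subsidies change the sticker price, not the utilization math.
```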
Local models are 6 to 18 months behind the frontier. Even if a cloud model is faster, it's clear that local is catching up.