Hacker News

hu3 · today at 10:02 AM · 2 replies

Is there even enough market for this?

These models are dumber and slower than SoTA API models, and they always will be.

My time and sanity are worth far more than insurance against the risk of sending my garbage code to companies worth hundreds of billions of dollars.

For most people, using local models is a downgrade on multiple fronts: total cost of ownership, software maintenance, electricity bills, lost performance on the machine doing the inference, more hallucinations, bugs, and lower-quality code, and slower iteration speed.


Replies

harel · today at 11:16 AM

Actually yes. For example, I run local models for ingested documents, summaries, etc. The local models are fine, and there is no need for me to pay for tokens. Performance is adequate for that purpose as well. There are many other cases where I run at scale, time is flexible so things can move slower, and I'd rather keep it all in house. I'm not even getting into areas where data cannot leave the premises for legal reasons. Right now I'm limited mostly by GPUs. But if that world of local models on Apple silicon is so "good", there is room to expand it to other fruits...
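For the document-summarization use case described above, a minimal sketch of what a call to an in-house model looks like, assuming an OpenAI-compatible local endpoint (llama.cpp and Ollama both expose one); the endpoint URL, model name, and prompt are illustrative assumptions, not the commenter's actual setup:

```python
# Hypothetical sketch: summarizing an ingested document against a local,
# OpenAI-compatible inference server. No per-token billing applies because
# the model runs on in-house hardware.
import json

LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed local server


def build_summary_request(doc_text: str, model: str = "local-model") -> bytes:
    """Build the JSON body for a one-shot summarization call."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarize the document in 3 bullet points."},
            {"role": "user", "content": doc_text},
        ],
        "temperature": 0.2,  # summaries benefit from low randomness
    }
    return json.dumps(payload).encode("utf-8")


body = build_summary_request("Quarterly report: revenue grew, costs were flat.")
```

The body would then be POSTed to `LOCAL_ENDPOINT`; because the server keeps data on-premises, this pattern also covers the "data cannot leave the premises" case.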

zozbot234 · today at 10:14 AM

> These models are dumber and slower than API SoTA models and will always be.

Sure, but you're paying per-token costs on the SoTA models that are roughly an order of magnitude higher than third-party inference on the locally available models. Once you account for per-token cost, the math skews the other way.
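The order-of-magnitude claim above can be made concrete with back-of-envelope arithmetic. All prices and volumes below are hypothetical placeholders, not real vendor quotes:

```python
# Back-of-envelope cost comparison. Prices are assumed, not actual quotes.
SOTA_PRICE_PER_MTOK = 10.00   # $/million tokens, frontier API (assumed)
LOCAL_PRICE_PER_MTOK = 1.00   # $/million tokens, open-model inference (assumed)

monthly_tokens = 50_000_000   # e.g. a batch summarization workload (assumed)

sota_cost = monthly_tokens / 1_000_000 * SOTA_PRICE_PER_MTOK
local_cost = monthly_tokens / 1_000_000 * LOCAL_PRICE_PER_MTOK

print(f"SoTA API:   ${sota_cost:.2f}/month")   # $500.00/month
print(f"Open model: ${local_cost:.2f}/month")  # $50.00/month
print(f"Ratio: {sota_cost / local_cost:.0f}x")  # 10x
```

At a 10x price gap, the quality premium of the frontier model has to justify a 10x larger bill, which is exactly the trade-off the parent comment is weighing.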