Hacker News

an0malous, yesterday at 10:41 PM, 4 replies

This is why the AI companies are rushing to IPO. By the end of next year you’ll be running most of your AI on-device. They have no moat, they’ve reached the limits of scaling, most of the magic can be distilled into smaller models, and they know it.


Replies

hadlock, yesterday at 11:15 PM

Qwen's ~30B-class models are genuinely good enough for use if you can find a machine with enough memory bandwidth to run them at 30-90 tokens/second. It's been extremely telling that Qwen stopped releasing 120B-class models. At some point in the next 10 years (maybe 3?) someone is going to release an Opus 4.5-class 256B model you can run locally. Right now our engineers use about $800/mo worth of Opus tokens; at that rate the payback period for local LLM hardware is ~10 months.
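
For anyone checking the arithmetic in the comment above, here is a rough sketch of the payback calculation. The comment gives only the monthly spend ($800) and the payback (~10 months); the roughly $8,000 hardware cost is what those two numbers imply, not a figure stated in the thread. The function names and the optional running-cost parameter (power, maintenance) are hypothetical additions for illustration.

```python
def implied_hardware_cost(monthly_cloud_spend: float, payback_months: float) -> float:
    """Hardware budget implied by a given cloud spend and payback period."""
    return monthly_cloud_spend * payback_months


def payback_months(
    hardware_cost: float,
    monthly_cloud_spend: float,
    monthly_local_running_cost: float = 0.0,  # assumption: power, upkeep, etc.
) -> float:
    """Months until local hardware pays for itself versus cloud tokens."""
    monthly_savings = monthly_cloud_spend - monthly_local_running_cost
    if monthly_savings <= 0:
        raise ValueError("local running cost meets or exceeds cloud spend")
    return hardware_cost / monthly_savings


if __name__ == "__main__":
    # Numbers from the comment: $800/mo of tokens, ~10 month payback.
    budget = implied_hardware_cost(800, 10)
    print(f"Implied hardware budget: ${budget:,.0f}")
    print(f"Payback with no running costs: {payback_months(budget, 800):.1f} months")
    # Hypothetical: $100/mo of electricity stretches the payback period.
    print(f"Payback with $100/mo running cost: {payback_months(budget, 800, 100):.1f} months")
```

Any nonzero local running cost pushes the break-even point out, so the ~10 month figure is best read as a lower bound under these assumptions.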

cat5e, yesterday at 11:15 PM

Huzzah, they’ve lost their stranglehold. Viva la revolution!

sealeck, yesterday at 11:00 PM

Have we reached the limits of scaling? Sadly, it appears that a larger model still equals a better model.

ActorNightly, yesterday at 11:37 PM

Very false.

I use small models exclusively. They aren't a replacement for large models. You need decent hardware to run those models efficiently, as smaller-parameter models plain suck and are still slow on MacBooks. And affordability of higher-end hardware is very limited.

Even at non-VC-subsidized $/token prices, it's still much cheaper to run cloud-based models.
