Hacker News

Larrikin · today at 3:11 PM

Can this be backed up with any numbers, especially in the US? Every company I've seen using an AI something has obviously been using the API of one of the bigger companies. If this is a valid approach with proof it's basically as good, it would be something I would recommend to my company


Replies

yogthos · today at 4:14 PM

Here's a recent Stanford study showing that Chinese models are basically just as good https://hai.stanford.edu/news/inside-the-ai-index-12-takeawa...

For most use cases, you don't actually need frontier performance either. Customization, cost, and data sovereignty are far bigger practical concerns. If you can run your own model on prem and tune it to do exactly what you need, then you're both saving money and getting better quality output.

It's also worth noting that tooling can go a long way toward improving the quality of output from these models, and this is very much an underexplored area right now. For example, the ATLAS agentic harness does a clever trick where it gets the model to generate multiple candidates, then uses a second lightweight model as a heuristic to score them, keeping the promising ones. And this drastically improves coding capability.

https://github.com/itigges22/ATLAS
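The generate-then-score idea is just best-of-n sampling with a cheap reranker. A minimal sketch of the pattern (the function names and the toy generator/scorer here are illustrative stand-ins, not the actual ATLAS API):

```python
# Best-of-n harness sketch: a big model proposes candidates, a lightweight
# scorer ranks them, and only the top ones are kept.
from typing import Callable, List, Tuple

def best_of_n(
    generate: Callable[[str], str],      # expensive model: prompt -> candidate
    score: Callable[[str, str], float],  # cheap heuristic: (prompt, candidate) -> score
    prompt: str,
    n: int = 4,
    keep: int = 1,
) -> List[Tuple[float, str]]:
    """Generate n candidates, score each, return the top `keep` by score."""
    candidates = [generate(prompt) for _ in range(n)]
    ranked = sorted(((score(prompt, c), c) for c in candidates), reverse=True)
    return ranked[:keep]

# Toy stand-ins for real models: the "generator" varies its output per call,
# and the "scorer" just prefers shorter candidates (a placeholder for a real
# learned heuristic or test-based check).
import itertools
_counter = itertools.count()

def fake_generate(prompt: str) -> str:
    return prompt + " v" * next(_counter)

def fake_score(prompt: str, candidate: str) -> float:
    return -len(candidate)

best = best_of_n(fake_generate, fake_score, "fix the bug", n=3)
print(best[0][1])  # the candidate the toy scorer ranked highest
```

In practice the scorer is where the leverage is: it can be a small model, a static analyzer, or just running the project's test suite against each candidate.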

There's also a paper along similar lines, describing how using a harness to enforce a project structure lets the model work on much larger projects successfully.

https://arxiv.org/abs/2509.16198

So, I don't think that raw power of the model is even the most important part at this point. We can squeeze a lot more juice out of smaller models we can run locally by using them more effectively.

We're basically in the mainframe era of this tech, but the pendulum always swings to tech getting more optimized and moving to edge devices over time. And I think we're already starting to see this happen with local models becoming good enough to do real work.
