Aurornis • today at 3:00 AM • 1 reply

The headline says one thing, then the article text says this:

> I’m hoping it’s going to be minimal.

I have multiple subscriptions and I pay per token to try out different LLM providers through OpenRouter. I also run open weight models locally.

I just can’t agree yet. The models from Anthropic and OpenAI really are that much better than anything else. The open weight models must be benchmaxxed across the board, because my real-world experience with them is very different from what the benchmarks imply. I get downvoted a lot for sharing my experience, I think because it’s not the reality people want to hear right now, but for complex work it’s true.

I do think there are a lot of easier tasks that open weight models can handle well in the hands of a skilled operator. If an entire job is simple enough that you wouldn’t hesitate to hand it off to a junior with a little supervision, then any model will do. However, for a lot of the work I do, even Opus 4.8 on Max requires a lot of attention, extra steering, and review to keep it on track. Fable did, too, though to a lesser degree. When I try to use the big open weight models (hosted, because they don’t run at reasonable speeds locally at a quantization I can tolerate), it feels like I spend more time waiting while they burn tokens on output I’ll probably have to reject anyway, at least for the bigger tasks. I wish they were there, but they’re not yet.

Replies

iot_devs • today at 4:40 AM

Do you have any examples?