Hacker News

zmmmmm today at 5:02 AM

That's a great write up.

The one thing I feel it underestimates is the likelihood of improvement. Even the authors acknowledge it's not worth comparing local models from a year ago to what we have now. In fact, people widely see Opus 4.5, released in November last year (8 months ago), as the point at which agentic coding first became broadly viable, even with frontier hosted models.

So why would we lock in hard, at this point, on any idea of what a local model is and isn't good for? Whatever it is right now, it probably won't be that in a year. It might be naive optimism to think we'll ever get to long-horizon tasks with models that run on consumer- or pro-grade hardware. But so far the naive optimists are winning.


Replies

sanderjd today at 5:14 AM

Right. Opus 4.5, 8 months ago: good enough for agentic coding. How far behind that are open-weight models? More than 8 months? But how much more? When will they reach Opus 4.5 level? A few months from now? A year from now? Never?

3abiton today at 7:57 AM

And a big thing that's missing is the harness comparison. It plays a very big role. I use forge, and I have been impressed with what it can do given all the limitations of local models.

rippeltippel today at 5:11 AM

Since the author is referring to a specific model, I think it makes sense to ignore how the model (or local models in general) may improve over time.

It's like buying a car: I drive that car and get attuned to its characteristics; I don't think about how that car (or similar cars) may improve. That's my tool and I want to make the most of it.

It is true that switching local models is technically very cheap, but there's a considerable time investment in squeezing the most out of one, and that effort may not carry over to a newer version of the model.

appplication today at 5:05 AM

Agree 100%, even on Claude 4.5 being the turning point for agentic coding. It completely turned me around on it.