These coding agent models only started getting useful in January. Before that they were difficult to control autocomplete, and not very smart.
January was an inflection point, and no open weights model has crossed over that same threshold.
This is definitely recursive self improvement territory, except that we're prohibited from participating.
It feels like the capability gap is wider than before.
It was more like November. But it wasn’t really an inflection point, harnesses got good enough that people started noticing by the holiday break. And I’m not discounting some good ol’ stealth marketing in there as well.
Deepseek feels pretty close to Opus at this point, and it’s certainly useful enough for me to spend $20 on api tokens instead of four Claude max plans….