Hacker News

renewiltord, today at 2:12 AM

Recent performance of Claude Opus 4.7 and Claude Code has been poor because of context bloat. The model no longer obeys instructions well. Codex on medium reasoning and fast mode is often better. I have a simple local manual eval through the harness and an automated eval for other programs. Opus is still the best on the latter, but it's a garbage experience on the former.

I spent last evening so frustrated that I also got a ChatGPT subscription. It makes me wonder if I should be using Gemini on pay-per-use with a custom harness.

With my own harness, performance is way better, but the cost goes up because there's no subscription.