Hacker News

Jcampuzano2, yesterday at 4:01 PM

Claude Code. They mention they are using Claude Code's CLI in the benchmark, and Claude Code changes constantly.

I wouldn't be surprised if what this is actually benchmarking is just Claude Code's constant system prompt changes.

I wouldn't really trust this to benchmark Opus itself.