IME it's Claude that pushes back, and Codex that just does the thing. It's happened...

pyridines • today at 4:50 AM • 1 reply • view on HN

IME it's Claude that pushes back, and Codex that just does the thing. It's happened once or twice where I've told Claude bluntly and directly "do this" and it responded "no, here's why that's a bad idea..." Maybe it's just my CLAUDE.md.

Not sure if there are sycophancy benchmarks for coding agents

Replies

mcintyre1994 • today at 8:00 AM

I find the same. Someone posted this benchmark here: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

It measures whether models push back on bullshit prompts or just go along with it, and Claude models are all the top performers.

alt Hacker News

Replies