Model cards are just marketing material. I wouldn’t trust them one bit.

Hendrikto • today at 2:21 PM • 4 replies

Replies

You don't need to trust anyone. GPT 5.4 xhigh is available and you can test it for $20, to verify it is actually able to find complex bugs in old codebases. Do the work instead of denying AI can do certain things. It's a matter of an afternoon. Or, trust the people that did this work. See my YouTube video where I find tons of Redis bugs with GPT 5.4.

mbesto • today at 4:45 PM

And benchmarks can easily be gamed by overfitting. Yet here we are, with the top comment on the HN Mythos thread outlining its benchmark performance gains.

I guess we'll never learn.

Yokohiii • today at 2:36 PM

The whole discussion started out as an attempt to disprove or verify Anthropic's (model card) claims.

He also transfers the logic of their claims to the real world. You can say that model cards are marketing garbage, but then you have to prove that experienced programmers are not significantly better at security.

2983592 • today at 2:24 PM

But they are treated as holy scripture ...
