Hacker News

Hendrikto today at 2:21 PM

Model cards are just marketing material. I wouldn’t trust them one bit.


Replies

antirez today at 4:51 PM

You don't need to trust anyone. GPT 5.4 xhigh is available and you can test it for $20, to verify it is actually able to find complex bugs in old codebases. Do the work instead of denying AI can do certain things. It's a matter of an afternoon. Or, trust the people that did this work. See my YouTube video where I find tons of Redis bugs with GPT 5.4.

mbesto today at 4:45 PM

And benchmarks can easily be gamed by overfitting. Yet here we are, with the top HN comment on the HN Mythos thread outlining its benchmark performance gains.

I guess we'll never learn.

Yokohiii today at 2:36 PM

The whole discussion started out as an attempt to disprove/verify Anthropic's (model card) claims.

He also transfers the logic of their claims to the actual real world. You can say that model cards are marketing garbage. You have to prove that experienced programmers are not significantly better at security.

2983592 today at 2:24 PM

But they are treated as holy scripture ...