> On the other side, there's SCONE — Anthropic's own December 2025 benchmark. Agents...

nozzlegear • yesterday at 11:19 PM • 0 replies • view on HN

> On the other side, there's SCONE — Anthropic's own December 2025 benchmark. Agents exploited 19 of 34 post-cutoff smart contracts, 55.8% success, $4.6M in simulated funds at an average API cost of $1.22 per contract. The comparable number 12 months earlier was about 2%.

Anthropic has a vested interest in making their LLMs look advanced, powerful and dangerous. This is the company that is explicitly pro-regulation, who has donated $20M to a PAC for pro-regulation candidates, and whose own competitors accuse of being pro-regulatory capture. We should take their benchmarks and their "Mythos is too dangerous for you mere mortals" statements with a big ass grain of salt, because it plays directly into that regulation angle they're playing. Anthropic wants frontier model development locked up, with only a few select stewards of humanity holding the keys.

alt Hacker News