That graph is impenetrable. What is it even trying to say?
Also, why would any of its contents be linear?
> yielding a maximum of $4.6 million in simulated stolen funds
Oh, so they are pointing their bots at already known exploited contracts. I guess that's a weaker headline.
Having watched this talk[0] about what it takes to succeed in the DARPA AIxCC competition[1] these days, this doesn't surprise me in the least.
Can someone explain smart contracts to me?
Ok, I understand that it's a description in code of "if X happens, then state becomes Y". Like a contract but in code. But, someone has to input that X has happened. So is it not trivially manipulated by that person?
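Roughly, yes. That's the oracle problem: a contract can only act on data someone feeds it, so whoever controls that input controls the outcome unless it's mitigated (multiple independent oracles, restricting the contract to on-chain data, etc.). A toy Python sketch of the issue (illustrative names only, not a real Solidity contract):

```python
# Toy model of a contract that pays out based on an external event "X".
# The contract cannot observe X itself; it trusts a designated oracle account.

class BetContract:
    def __init__(self, oracle_address, payout_to):
        self.oracle_address = oracle_address  # only this account may report X
        self.payout_to = payout_to
        self.settled = False
        self.winner = None

    def report_event(self, sender, x_happened):
        # Access control: random accounts can't flip the state...
        if sender != self.oracle_address:
            raise PermissionError("only the oracle can report")
        if self.settled:
            raise RuntimeError("already settled")
        self.settled = True
        self.winner = self.payout_to if x_happened else None

contract = BetContract(oracle_address="oracle_1", payout_to="alice")

# An outsider is rejected:
try:
    contract.report_event("mallory", True)
except PermissionError:
    pass

# ...but the oracle itself can report whatever it likes:
contract.report_event("oracle_1", False)
assert contract.winner is None  # Alice loses if the oracle lies
```

So the contract logic itself isn't "trivially manipulated", but the trusted data feed is the weak point, which is why real systems use decentralized oracle networks or stick to purely on-chain conditions.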
> Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476
I am not surprised at all. I can already see self-improving behaviour in our own work, which means the next logical step is self-improvement!
I know how this sounds, but from my vantage point things are moving toward more autonomous and more useful agents.
To be honest, I am excited that we are right in the middle of all of this!
> Important: To avoid potential real-world harm, our work only ever tested exploits in blockchain simulators. We never tested exploits on live blockchains and our work had no impact on real-world assets.
Well, that's no fun!
My favorite we're-living-in-a-cyberpunk-future story is the one where there was some bug in Ethereum or whatever, and there was a hacker going around stealing everybody's money, so then the good hackers had to go and steal everybody's money first, so they could give it back to them after the bug got fixed.
No mention of Bitcoin. Exploiting Ethereum smart contracts is nothing new or exciting.
> Important: To avoid potential real-world harm, our work only ever tested exploits in blockchain simulators. We never tested exploits on live blockchains and our work had no impact on real-world assets.
They left the booty out there. This is actually hilarious; it'll drive a massive rush toward their models.
Says more about the relatively poor infosec on Ethereum contracts than about the absolute utility of pentesting LLMs.
To me, this reads a lot like : "Company raises $45 Billion, makes $200 on an Ethereum 0-day!"
Smart contracts: the misnomer joke writes itself.
At first I read this as "fined $4.6M", and my first thought was "Finally, AI is held accountable for its wrong actions!"
My startup builds agents for penetration testing, and this is the bet we have been making for over a year, ever since models started getting good at coding. There was a huge jump in capability from Sonnet 4 to Sonnet 4.5. We are still internally testing Opus 4.5, which is the first version of Opus priced low enough to use in production. It's very clever, and we are redesigning our benchmark systems because it's saturating the test cases.