What’s the state of the art in reverse engineering source code from binaries in the age of agentic coding? Seems like something agents should be pretty good at, but I haven’t read anything about it.
I've been working on this, and the results are pretty great when using the fancier models. I have successfully had gpt5.2 complete fairly complex matching decompilation projects, as well as projects with more flexible requirements.
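For anyone unfamiliar with the term: a "matching" decompilation is usually taken to mean that recompiling the recovered source reproduces the original bytes exactly. Here is a minimal sketch of the kind of check an agent could loop on, assuming you already have the target bytes and the recompiled bytes in hand. The name `match_report` and the report fields are made up for illustration, not taken from any particular project's tooling.

```python
def match_report(target: bytes, candidate: bytes) -> dict:
    """Compare recompiled bytes against the original (hypothetical helper)."""
    # Count positions where both sequences hold the same byte.
    same = sum(a == b for a, b in zip(target, candidate))
    longest = max(len(target), len(candidate))

    # Offset of the first differing byte, if any.
    first_diff = next(
        (i for i, (a, b) in enumerate(zip(target, candidate)) if a != b),
        None,
    )
    # If one sequence is a strict prefix of the other, the first
    # difference is where the shorter one ends.
    if first_diff is None and len(target) != len(candidate):
        first_diff = min(len(target), len(candidate))

    return {
        "matching": target == candidate,
        "ratio": same / longest if longest else 1.0,
        "first_diff": first_diff,
    }
```

The ratio gives the agent a rough progress signal between attempts, while `matching` is the only result that counts as done.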
Nothing yet; agents analyze code, which is textual.
The way they analyze binaries now is through the textual interfaces of command-line tools, and the tools used are mostly the ones supported by the foundation models at training time. For the most part you can't teach a model new tools at inference; they must be supported at training. So most providers focus on the same tools and benchmark against them, and binary analysis is not in the zeitgeist right now. The focus is on production more than understanding.
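To make the "textual interface" point concrete, here is a rough sketch of how a binary tool might be exposed to an agent: run a disassembler, capture its text output, and truncate it so it fits in context. It assumes GNU `objdump` is installed and uses its `-d` (disassemble) flag. The function names, the `max_chars` limit, and the error strings are my own placeholders, not any provider's actual tool definition.

```python
import shutil
import subprocess


def truncate(text: str, max_chars: int) -> str:
    """Cut long output down and say how much was dropped."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n... [truncated {dropped} chars]"


def disassemble(path: str, max_chars: int = 4000, tool: str = "objdump") -> str:
    """Return a textual disassembly of `path`, or an error message.

    Hypothetical agent tool: everything comes back as plain text,
    including failures, so the model can read and react to it.
    """
    if shutil.which(tool) is None:
        return f"error: {tool} not found on PATH"
    result = subprocess.run(
        [tool, "-d", path],
        capture_output=True,
        text=True,
        errors="replace",  # do not crash on odd bytes in the output
        check=False,
    )
    if result.returncode != 0:
        return f"error: {result.stderr.strip()}"
    return truncate(result.stdout, max_chars)
```

Calling `disassemble("/bin/ls")` on a machine with `objdump` would hand the model a few thousand characters of assembly listing to reason over, which is all the "binary analysis" it gets to see.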
Agents are sort of irrelevant to this discussion, no?
Like, it's assuredly harder for an agent than having access to the code, if only because there's a theoretical opportunity to misunderstand the decompiled output.
Alternatively, it's assuredly easier for an agent because, as execution time approaches infinity, they can try all possible interpretations.
I think there’s a good possibility that LLMs as a technology could be usefully trained to decode binaries as a sort of squint-and-you-can-see-it translation problem, but I can’t imagine, e.g., pre-trained GPT being particularly good at it.