Hacker News

Claude Code can debug low-level cryptography

446 points by Bogdanp last Saturday at 6:41 PM | 199 comments

Comments

simonw last Saturday at 9:01 PM

Using coding agents to track down the root cause of bugs like this works really well:

> Three out of three one-shot debugging hits with no help is extremely impressive. Importantly, there is no need to trust the LLM or review its output when its job is just saving me an hour or two by telling me where the bug is, for me to reason about it and fix it.

The approach described here could also be a good way for LLM-skeptics to start exploring how these tools can help them without feeling like they're cheating, ripping off the work of everyone whose code was used to train the model, or taking away the most fun part of their job (writing code).

Have the coding agents do the work of digging around and hunting down those frustratingly difficult bugs - don't have them write code on your behalf.
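
For example, a read-only bug hunt can be a single non-interactive invocation (a sketch - the test path and prompt here are made up; -p just runs one prompt and prints the result):

    $ claude -p 'tests/decrypt_test.sh is failing. Find the root cause and report the file, line, and your reasoning - do not modify any files.'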

XenophileJKO last Saturday at 9:17 PM

Personally, my biggest piece of advice is: AI First.

If you really want to understand the limitations of the current frontier models (and really learn how to use them), ask the AI first.

By throwing things over the wall to the AI first, you learn what it can do at the same time as you learn how to structure your requests. The newer models are quite capable and, in my experience, can largely be treated like a co-worker for "most" problems. That being said, you also need to understand how they fail and build an intuition for why they fail.

Every time a new model generation comes out, I also recommend throwing away your process (outside of things like lint, etc.) and seeing how the model does without it. I work with people who have elaborate context setups crafted for less capable models; they are largely unnecessary with GPT-5-Codex and Sonnet 4.5.

pton_xd last Saturday at 10:08 PM

> Full disclosure: Anthropic gave me a few months of Claude Max for free. They reached out one day and told me they were giving it away to some open source maintainers.

Related: lately I've been getting tons of Anthropic Instagram ads; they've made up nearly a quarter of all the sponsored content I've seen for the last month or so. Various people vibe coding random apps and whatnot using different incarnations of Claude, or just direct adverts to "Install Claude Code." I really have no idea why I've been targeted so hard, on Instagram of all places. Their marketing team must be working overtime.

spacechild1 last Saturday at 11:33 PM

> Importantly, there is no need to trust the LLM or review its output when its job is just saving me an hour or two by telling me where the bug is, for me to reason about it and fix it.

Except they regularly come up with "explanations" that are completely bogus and may actually waste an hour or two. Don't get me wrong, LLMs can be incredibly helpful for identifying bugs, but you still have to keep a critical mindset.

wrs yesterday at 7:32 PM

Yesterday I tried something similar with a website (Go backend) that was doing a complex query/filter and showing the wrong results. I just described the problem to Claude Code, told it how to run psql, and gave it an authenticated cookie to curl the backend with. In about three minutes of playing around, it fixed the problem. It only needed raw psql and curl access, no specialized tooling, to write a bunch of bash commands to poke around and compare the backend results with the test database.
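
The commands involved are nothing exotic - roughly this kind of thing, just iterated quickly (the endpoint, cookie, and table names here are made up):

    $ curl -s -H 'Cookie: session=...' 'http://localhost:8080/api/items?filter=active' | jq .
    $ psql mydb -c "SELECT id, status FROM items WHERE status = 'active';"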

qsort last Saturday at 8:17 PM

This resonates with me a lot:

> As ever, I wish we had better tooling for using LLMs which didn’t look like chat or autocomplete

I think part of the reason why I was initially more skeptical than I ought to have been is because chat is such a garbage modality. LLMs started to "click" for me with Claude Code/Codex.

A "continuously running" mode that would ping me would be interesting to try.

Frannky last Saturday at 8:20 PM

CLI terminals are incredibly powerful. They are also free if you use Gemini CLI or Qwen Code. Plus, you can access any OpenAI-compatible API (2k TPS via Cerebras at $2/M, or local models). And you can use them in IDEs like Zed with ACP mode.

All the simple stuff (creating a repo, pushing, frontend edits, testing, Docker images, deployment, etc.) is automated. For the difficult parts, you can just use free Grok to one-shot small code files. It works great if you force yourself to keep the amount of code minimal and modular. They also make great UIs: you can create smart programs with just a CLI + MCP servers + MD files. Truly amazing tech.
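
For example, pointing one of these CLIs at an OpenAI-compatible endpoint is usually just environment variables (a sketch, assuming a tool that reads the standard OPENAI_* variables; the Cerebras model name may differ):

    $ export OPENAI_BASE_URL='https://api.cerebras.ai/v1'
    $ export OPENAI_API_KEY='csk-...'
    $ export OPENAI_MODEL='qwen-3-coder-480b'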

Thorrez yesterday at 1:38 PM

>so I checked out the old version of the change with the bugs (yay Jujutsu!) and kicked off a fresh Claude Code session

There's a risk there that the AI could find the solution by looking through your version-control history, instead of discovering it directly in the checked-out code. AI has done that in the past:

https://news.ycombinator.com/item?id=45214670
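
One way to rule that out is to hand the agent a bare snapshot with no history at all, e.g. (paths and the commit placeholder are illustrative):

    $ mkdir /tmp/repro
    $ git archive <buggy-commit> | tar -x -C /tmp/repro
    $ cd /tmp/repro && claude    # no .git here, so nothing to peek at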

delaminator last Saturday at 8:59 PM

> For example, how nice would it be if every time tests fail, an LLM agent was kicked off with the task of figuring out why, and only notified us if it did before we fixed it?

You can use Git hooks to do that. If you have tests and one fails, spawn an instance of claude with a prompt: -p 'tests/test4.sh failed, look in src/ and try to work out why'

    $ claude -p 'hello, just tell me a joke about databases'

    A SQL query walks into a bar, walks up to two tables and asks, "Can I JOIN you?"

    $ 
Or, if you use Gogs locally, you can add a Gogs hook to do the same on pre-push:

> An example hook script to verify what is about to be pushed. Called by "git push" after it has checked the remote status, but before anything has been pushed. If this script exits with a non-zero status nothing will be pushed.
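
A minimal sketch of such a hook (the test runner path is an assumption; save it as .git/hooks/pre-push and make it executable):

    #!/bin/sh
    # Run the test suite; on failure, ask Claude to investigate, then block the push.
    if ! ./run-tests.sh; then
        claude -p 'The test suite failed on pre-push. Look in src/ and tests/ and work out why.'
        exit 1    # non-zero exit: nothing gets pushed
    fi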

I like this idea. I think I shall get Claude to work out the mechanism itself :)

It is even a suggestion on this Claude cheat sheet:

https://www.howtouselinux.com/post/the-complete-claude-code-...

jasonjmcghee last Saturday at 11:38 PM

I found LLM debugging works better if you give the LLM access to a debugger.

You can build this pretty easily: https://github.com/jasonjmcghee/claude-debugs-for-you
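
If you use Claude Code, registering such a server looks roughly like this (the server name and command are placeholders - see the repo for the real invocation):

    $ claude mcp add debugger -- node /path/to/debug-server.js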

gdevenyi last Saturday at 8:20 PM

Coming soon, adversarial attacks on LLM training to ensure cryptographic mistakes.

phendrenad2 last Saturday at 10:19 PM

A whole class of tedious problems has been eliminated by LLMs because they are able to look at code in a "fuzzy" way. But this can be a liability, too. I have a codebase that "looks kinda" like a nodejs project, so AI agents usually assume it is one; even if I rename the package.json, they will inspect the contents and immediately clock it as "node-like".

didibus last Saturday at 10:03 PM

This is basically the ideal scenario for coding agents: a pure-logic, algorithmic problem that is easily verifiable by running tests. It's the case that has worked the best for me with LLMs.

cluckindan last Saturday at 9:49 PM

So the "fix" includes a completely new function? In a cryptography implementation?

I feel like the article is giving out very bad advice which is going to end up shooting someone in the foot.

zcw100 yesterday at 12:07 AM

I just recently found a number of bugs in both the RELIC and MCL libraries. It took a while to track them down but it was remarkable that it was able to find them at all.

lordnacho last Saturday at 9:08 PM

I'm not surprised it worked.

Before I used Claude, I would be surprised.

I think it works because Claude takes some standard coding issues and systematizes them. The list is long, but Claude doesn't run out of patience like a human being does. Or at least it has some credulity left after trying a few initial failed hypotheses. This being a cryptography problem helps a little bit, in that there are very specific keywords that might hint at a solution, but from my skim of the article, it seems like it was mostly a good old coding error, taking the high bits twice.

The standard issues are just a vague laundry list:

- Are you using the data you think you're using? (Bingo for this one)

- Could it be an overflow?

- Are the types right?

- Are you calling the function you think you're calling? Check internal, then external dependencies

- Is there some parameter you didn't consider?

And a bunch of others. When I ask Claude for a debug, it's always something that makes sense as a checklist item, but I'm often impressed by how diligently it follows the path set by the results of the investigation. It's a great donkey - it really takes the drudgery out of my work, even if it sometimes takes just as long.

nikanj yesterday at 11:00 AM

I'm surprised it didn't fix it by removing the code. In my experience, if you give Claude a failing test, it fixes it by hard-coding the function to return the value the test expects, or something similar.

Last week I asked it to look at why a certain device enumeration caused a sigsegv, and it quickly solved the issue by completely removing the enumeration. No functionality, no bugs!

deadbabe last Saturday at 11:19 PM

With AI, we will finally be able to do the impossible: roll our own crypto.

rvz last Saturday at 9:13 PM

As declared by an expert in cryptography who knows how to guide the LLM through debugging low-level cryptography - which is good.

Quite different if you are not a cryptographer or a domain expert.
