logoalt Hacker News

mdeeksyesterday at 8:22 PM7 repliesview on HN

You can get a taste of this today yourself with Codex Security. I turned it on just as an experiment and in less than a week it has now become essential to all of us. I was shocked how accurate it is, how many security issues it found in existing code, how it continually finds them as we commit, and how NO ONE is immune from making these mistakes.

I'd say it is about 90% accurate for us. Often even the "Low" findings lead us to dig and realize it is actually exploitable. Everyone makes these mistakes, from the most junior to the most senior. They are just a class of bugs after all.

I expect tools like this to be a regular part of the development lifecycle from here on. We code with AI, we review with AI, we search for vulns with AI. Even if it isn't perfect, it is easily worth the cost IMHO. Highly recommend you get something enabled for your own repos ASAP


Replies

winstonwinstonyesterday at 8:44 PM

> I expect tools like this to be a regular part of the development lifecycle from here on. We code with AI, we review with AI, we search for vulns with AI. Even if it isn't perfect, it is easily worth the cost IMHO.

So, how is that supposed to work? Claude Code generates security bugs, then Claude Security finds them, then Claude Code generate fix, spend tokens, profit?

show 9 replies
Version467yesterday at 8:35 PM

I’ve had the same experience. The ui is a little unclear about this, because it says you have 5 scans, but 1 scan is just the continuous monitoring of the default branch of a repo.

The high impact findings have almost all been bang on for me. I was especially surprised by the high-quality documentation it produces as well as how narrow the proposed fixes are.

I’m used to codex producing quite a but more code than it needs to, but the security model proposed fixes that are frequently <10 loc, targeting exactly the correct place.

It’s really quite good. I’m assuming it’ll be pretty expensive once out of beta, but as a business I’d be jumping on this.

mnahkiesyesterday at 9:25 PM

One issue I've seen with LLM's is adding superfluous code in the name of "safety" and confidently generating a bunch of stuff that was useful in years gone by, but now handled correctly by the standard lib. I'm of the opinion that less is more when it comes to code, and find the trend this is introducing quite frustrating.

How do you avoid this pitfall?

show 4 replies
0xAstroyesterday at 8:34 PM

I would recommend you to try out the setup with gpt-5.5-cyber as the orchestrator and deepseek-v4-flash or some other fast cheap model as its workers. Getting pretty good results using this setup.

lateral_cloudyesterday at 11:16 PM

Did you need to do anything special to get access to Codex Security?

show 1 reply
rmastyesterday at 8:49 PM

I help maintain a project that is used as a dependency by a lot of security tools to handle PE files.

It’s disappointing that Anthropic and OpenAI never responded to the applications to their respective programs for open source maintainers. From my perspective it seems like their offers are primarily for the shiny well-known projects, rather than ones that get only a few million monthly installs but aren’t able to get thousands of stars due to being “hidden” as a dependency of popular tool.

hollowturtleyesterday at 10:20 PM

> I was shocked how accurate it is, how many security issues it found in existing code, how it continually finds them as we commit, and how NO ONE is immune from making these mistakes.

Dude is flexing that he's pushing unsecure code every day, that's a skill!