Hacker News

All the bugs they found

73 points by ziggy42 last Tuesday at 10:33 AM | 27 comments

Comments

r0ze-at-hn today at 3:15 PM

> It is also very extensively tested against the official WASM testsuite.

I was hoping that near the end the author would have tried to contribute new tests to the official testsuite to help catch these same errors elsewhere.

For the first one, "Zero Is Not Null", maybe there is a missing test in the call_indirect test?

https://github.com/WebAssembly/testsuite/blob/main/call_indi...

NitpickLawyer today at 10:14 AM

The cool thing about LLMs is that once a capability is "good enough" you can always "chain" them together for better overall results. On the client side this means "write an API that does x y z" -> "analyse this API for security concerns" -> "PoC for each finding from this report" -> "fix this code according to these verified claims".

On the "server side" (i.e. training) you can use the current gen models to improve the training data by running many parallel environments with a similar loop as above. Then incorporate the new data and repeat. Reminiscent of the old GAN approach, where the generator and discriminator are trained together in an adversarial regime. The end result should be safer code on "vanilla" prompts. "Write an API that does x y z" should now contain the learnings from this loop, and the models should produce better code.

Works really well for every verifiable scenario. And as the models become better, they can also more reliably create environments that closely match real-world scenarios. If you also have some data from human devs (say you run a subsidised coding model for a few months), even better.

An example of turning a "normal" repo into a verifiable environment that I read recently in the Cursor blog: take a repo, ask an LLM to remove a feature, verify that the app still works w/o the feature, verify that the tests for that feature fail. Ask a generator to "add feature x". Verify with the original tests. If pass -> give carrot :)
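A rough sketch of that loop, under loud assumptions: `Repo`, `Check`, `Edit`, `BuildEnv`, `Reward` and `with` are all hypothetical names invented here, a repo is modeled as a bare set of features, and the two LLM calls (remove the feature, add it back) are plain callbacks. It is not how Cursor or anyone else implements it, only the shape of the check.

```go
package main

import "fmt"

// Repo is a hypothetical stand-in for a repository: the set of features it implements.
type Repo map[string]bool

// Check is a hypothetical test suite: true means "all tests pass".
type Check func(Repo) bool

// Edit stands in for an LLM call that rewrites the repo with respect to a feature.
type Edit func(repo Repo, feature string) Repo

// Env is one verifiable task: a repo with a feature removed, plus the feature's name.
type Env struct {
	Stripped Repo
	Feature  string
}

// BuildEnv asks remove to strip a feature and keeps the result only if the
// app still works and the feature's own tests now fail.
func BuildEnv(orig Repo, feature string, remove Edit, appWorks, featureTests Check) (Env, bool) {
	stripped := remove(orig, feature)
	if !appWorks(stripped) || featureTests(stripped) {
		return Env{}, false
	}
	return Env{Stripped: stripped, Feature: feature}, true
}

// Reward asks generate to add the feature back and scores the attempt
// against the original tests: 1 for a pass, 0 otherwise.
func Reward(env Env, generate Edit, appWorks, featureTests Check) float64 {
	candidate := generate(env.Stripped, env.Feature)
	if appWorks(candidate) && featureTests(candidate) {
		return 1
	}
	return 0
}

// with returns a copy of repo with feature switched on or off.
// It is a toy helper for the stub editors below.
func with(repo Repo, feature string, on bool) Repo {
	out := Repo{}
	for k, v := range repo {
		out[k] = v
	}
	out[feature] = on
	return out
}

func main() {
	orig := Repo{"core": true, "export": true}
	appWorks := func(r Repo) bool { return r["core"] }
	exportTests := func(r Repo) bool { return r["export"] }

	// Stubs standing in for model calls.
	remove := func(r Repo, f string) Repo { return with(r, f, false) }
	goodGen := func(r Repo, f string) Repo { return with(r, f, true) }
	lazyGen := func(r Repo, f string) Repo { return r }

	env, ok := BuildEnv(orig, "export", remove, appWorks, exportTests)
	fmt.Println(ok)                                          // true
	fmt.Println(Reward(env, goodGen, appWorks, exportTests)) // 1
	fmt.Println(Reward(env, lazyGen, appWorks, exportTests)) // 0
}
```

The two-sided check in `BuildEnv` is what makes the environment verifiable: a removal that breaks the app, or one that leaves the feature's tests passing, is discarded before any reward is handed out.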

The key is composition. Once you unlock a new capability, that gets implemented and incorporated into the next training run. Pretty neat, I would say, and the main driver for the recent increase in the breadth of capabilities for new models.

AznHisoka today at 11:44 AM

… running thru my head. All the bugs they found…

jcarrano today at 10:46 AM

I found, in my rather recent experience with Go, that using anything other than zero for invalid, default or "sentinel" values is a source of potential problems due to the lack of real constructors.
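A minimal sketch of the kind of problem meant here, with made-up names (`Handle`, `SlotA`, `SlotB`, `InvalidNonZero`, `NewSlotA` are all hypothetical). Since any struct can be created with `var` or a bare literal, every field starts at its zero value, and a non-zero sentinel only holds if callers remember to go through the constructor-like function.

```go
package main

import "fmt"

// Handle is a hypothetical index type; all names here are illustrative.
type Handle int

// InvalidNonZero is a sentinel that differs from Go's zero value.
const InvalidNonZero Handle = -1

// SlotA relies on the non-zero sentinel, so it needs a constructor-like
// function that callers can forget to use.
type SlotA struct {
	Ref Handle
}

// NewSlotA returns a SlotA explicitly marked as invalid.
func NewSlotA() SlotA { return SlotA{Ref: InvalidNonZero} }

// IsValid reports whether the slot refers to something.
func (s SlotA) IsValid() bool { return s.Ref != InvalidNonZero }

// SlotB treats the zero value as "invalid"; real handles would start at 1.
type SlotB struct {
	Ref Handle
}

// IsValid reports whether the slot refers to something.
func (s SlotB) IsValid() bool { return s.Ref != 0 }

func main() {
	var a SlotA // NewSlotA was bypassed: Ref is 0, which looks like a valid handle
	var b SlotB // the zero value is already the "invalid" state

	fmt.Println(a.IsValid()) // true: silently wrong
	fmt.Println(b.IsValid()) // false: safe default
}
```

The trade-off is that `SlotB` gives up index 0 as a usable handle, which is the price of making the default state the safe one.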

andai today at 2:26 PM

I thought this was going to be a list of all the zero days that popped up on HN in the last ten days.

xeyownt today at 10:13 AM

Nice writeup. A practical example of a project, what was found, how it was found, the quality of the findings, reproducible.

rashar today at 11:04 AM

He describes himself as "Software engineer. Writing code prompts at Google".

So throwing his own, apparently poorly written, creation under the bus will get him applause and promotions by the AI lunatics.

It is a currently popular strategy among AI boosters.

vachanmn123 today at 8:48 AM

> Trying to work around Anthropic blocking security-related prompts does get pretty tiring though.

Didn't know this was a thing... Interesting for a company that's marketing their Mythos so hard to not allow security prompts.

I am also curious how the cheaper Chinese models do. I have an Opencode Go plan, so I'll let 'em rip over the weekend. Hopefully I get to see a few bugs!

jedisct1 today at 2:48 PM

Have you tried Swival /audit? https://swival.dev/pages/audit.html

shandilyaharsh today at 9:14 AM

Sometimes I feel Mythos is just that: a myth.

Makiaveli today at 10:34 AM

[dead]

MarStudio today at 9:33 AM

[dead]

grey-area today at 7:55 AM

[flagged]

keybored today at 10:37 AM

I don’t really care about posting in bold 20 bugs when it comes to a hobby project. (In before “Linux was just a hobby project”) No need to LLM post over what this tells us about the trajectory of society, oh my.

We can save that dialogue for finding bugs in widely used projects.

Edit: Something I tried to reply to a now-dead top level comment here: Whoever claims that new accounts alone is a signal for submission-boosting comments etc. needs to update their heuristics.