Hacker News

Grimblewald — today at 10:12 AM

"No bugs" means nothing if bugs get hidden, and LLMs are great at hiding bugs and will absolutely fail to find some fairly critical ones. Your own repo, which is slop at best, fails to meet its core premise.

> Another AI agent. This one is awesome, though, and very secure.

It isn't secure. It took me less than three minutes to find a vulnerability. Start engaging with your own code; it isn't as good as you think it is.

Edit: I had Kimi "red team" it out of curiosity. It found the main critical vulnerability I did, and several others:

Severity - Count - Categories
Critical - 2 - SQL Injection, Path Traversal
High - 4 - SSRF, Auth Bypass, Privilege Escalation, Secret Exposure
Medium - 3 - DoS, Information Disclosure, Injection
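For readers unfamiliar with the top category in that list, here is a minimal, purely illustrative sketch of what the SQL-injection class looks like and how a parameterized query closes it. This is generic example code, not code from the project under discussion; the table and function names are invented for the demo.

```python
import sqlite3

def lookup_user_unsafe(conn, username):
    # Vulnerable: attacker-controlled input is spliced into the SQL text,
    # so a crafted username can rewrite the query's logic.
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{username}'"
    ).fetchall()

def lookup_user_safe(conn, username):
    # Safe: a parameterized query keeps the input out of the SQL grammar.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

payload = "' OR '1'='1"
print(len(lookup_user_unsafe(conn, payload)))  # 2 — the payload matches every row
print(len(lookup_user_safe(conn, payload)))    # 0 — treated as a literal name
```

The point of findings like these is that the vulnerable and safe versions look almost identical, which is exactly the kind of thing a code generator can get wrong without the output looking obviously broken.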

You need to sit down and really think about what people who do know what they're doing are saying. You're going to get yourself into deep trouble with this. I'm not a security specialist, I take a recreational interest in security, and LLMs are by no means expert. A human with skill and intent would, I would gamble, be able to fuck your shit up in a major way.


Replies

reedf1 — today at 11:17 AM

Build a redteam into your feedback mechanism. Seriously. You've identified the problem and even solved it. Now automate it.

stavros — today at 12:26 PM

Right yeah, thanks for the constructive comment. Mind filing those vulnerabilities, or are you just making a point?

How do you know these are actual vulnerabilities? You just ran an LLM, it told you something, and you came back to dunk on me with zero context on the project.

Maybe you need to sit down and really think about the fact that you have no idea who you're talking to or what the project does. Next time you make an "omg this code is so shit" comment, include something more than "well my LLM says your LLM is bad" so we can have a discussion with facts rather than LLM-aided trashtalk.

EDIT: Out of curiosity, I ran Kimi K2.5 on the codebase, and all the things it found are invalid or explicit design decisions. So, next time you decide to tell someone their project "is slop" by running an LLM and relaying its verdict, consider a) the irony of what you're doing, and b) that the other person might know more than you about their own project that you spent "three minutes" running an LLM on.