Launch HN: Jazzberry (YC X25) – AI agent for finding bugs

46 points • by MarcoDewey • 05/14/2025 • 34 comments

Hey HN! We are building Jazzberry (https://jazzberry.ai), an AI bug finder that automatically tests your code when a pull request is opened, so that real bugs are found and flagged before they are merged.

Here’s a demo video: https://www.youtube.com/watch?v=L6ZTu86qK8U#t=7

We are building Jazzberry to help you find bugs in your code base. Here’s how it works:

When a PR is made, Jazzberry clones the repo into a secure sandbox. The diff from the PR is provided to the AI agent in its context window. In order to interact with the rest of the code base, the AI agent has the ability to execute bash commands within the sandbox. The output from those commands is fed back into the agent. This means that the agent can do things like read/write files, search, install packages, run interpreters, execute code, and so on. It observes the outcomes and iteratively tests to pinpoint bugs, which are then reported back in the PR as a markdown table.
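The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Jazzberry's implementation: `Policy`, `run_in_sandbox`, `agent_loop`, `Finding`, and `render_table` are hypothetical names, the "policy" is a plain callable standing in for the LLM, and commands run through a local `bash` rather than an isolated VM.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# One transcript entry: (command that was run, output that came back).
Step = Tuple[str, str]

# Hypothetical stand-in for the LLM: given the transcript so far, return the
# next bash command to run, or None when the investigation is finished.
Policy = Callable[[List[Step]], Optional[str]]


@dataclass
class Finding:
    title: str
    severity: str
    evidence: str


def run_in_sandbox(command: str, cwd: str, timeout: int = 30) -> str:
    """Run one bash command in `cwd`; return stdout, stderr, and exit code."""
    proc = subprocess.run(
        ["bash", "-c", command],
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return f"{proc.stdout}{proc.stderr}[exit {proc.returncode}]"


def agent_loop(policy: Policy, cwd: str, max_steps: int = 10) -> List[Step]:
    """Ask the policy for a command, run it, feed the output back, repeat."""
    transcript: List[Step] = []
    for _ in range(max_steps):
        command = policy(transcript)
        if command is None:
            break
        transcript.append((command, run_in_sandbox(command, cwd)))
    return transcript


def render_table(findings: List[Finding]) -> str:
    """Format confirmed findings as a markdown table for the PR comment."""
    rows = ["| Bug | Severity | Evidence |", "| --- | --- | --- |"]
    for f in findings:
        rows.append(f"| {f.title} | {f.severity} | {f.evidence} |")
    return "\n".join(rows)
```

In a real system the policy would be a model call that also sees the PR diff, and the commands would run inside an isolated sandbox; this sketch executes them directly on the host, so it should only be tried against code you trust.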

Jazzberry is focused on dynamically testing your code in a sandbox to confirm the presence of real bugs. We are not a general code review tool; our only aim is to provide concrete evidence of what's broken and how.

Here are some real examples of bugs that we have found so far.

“Authentication Bypass (Critical)” - When `AUTH_ENABLED` is `False`, the `get_user` dependency in `home/api/deps.py` always returns the first superuser, bypassing authentication and potentially leading to unauthorized access. Additionally, it defaults to superuser when the authenticated auth0 user is not present in the database.

“Insecure Header Handling (High)” - The server doesn't validate header names/values, allowing injection of malicious headers, potentially leading to security issues.

“API Key Leakage (High)” - Different error messages in browser console logs revealed whether API keys were valid, allowing attackers to brute force valid credentials by distinguishing between format errors and authorization errors.
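To make the first example above more concrete, here is a hedged reconstruction of the pattern the authentication bypass report describes. It is not the affected project's code: the `User` class, the in-memory `USERS` table, and the function names `get_user_buggy` and `get_user_strict` are all hypothetical, and the stricter variant is just one possible way to close the hole.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    email: str
    is_superuser: bool = False


# Hypothetical in-memory stand-in for the user table.
USERS: Dict[str, User] = {
    "root@example.com": User("root@example.com", is_superuser=True),
    "dev@example.com": User("dev@example.com"),
}


class AuthError(Exception):
    """Raised when a request cannot be tied to a known user."""


def first_superuser() -> User:
    return next(u for u in USERS.values() if u.is_superuser)


def get_user_buggy(token_email: Optional[str], auth_enabled: bool) -> User:
    """Pattern from the report: both fallback paths hand out a superuser."""
    if not auth_enabled:
        # Auth disabled: every caller becomes the first superuser.
        return first_superuser()
    # Auth enabled, but an unknown identity still falls back to superuser.
    return USERS.get(token_email) or first_superuser()


def get_user_strict(token_email: Optional[str], auth_enabled: bool) -> User:
    """One possible hardening: fail closed instead of falling back."""
    if not auth_enabled:
        raise AuthError("authentication is disabled; refusing to pick a user")
    user = USERS.get(token_email)
    if user is None:
        raise AuthError("authenticated identity is not in the database")
    return user
```

A dynamic test confirms the bug by calling the dependency with an identity that should not exist and checking what comes back: `get_user_buggy("stranger@example.com", auth_enabled=True)` returns a superuser, while `get_user_strict` raises `AuthError` for the same input.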

Working on this, we've realized just how much the rise of LLM-generated code is amplifying the need for better automated testing solutions. Traditional code coverage metrics and manual code review are already becoming less effective when dealing with thousands of lines of LLM-generated code. We think this trend will only intensify over time: the complexity of AI-authored systems will ultimately require even more sophisticated AI tooling for effective validation.

Our backgrounds: Mateo has a PhD in reinforcement learning and formal methods with over 20 publications and 350 citations. Marco holds an MSc in software testing, specializing in LLMs for automated test generation.

We are actively building and would love your honest feedback!

Comments

jdefr89 • 05/14/2025

Ton of work already being done on this. I am a Vulnerability Researcher @ MIT and I know of a few efforts being worked on at my lab alone. So far nearly everything I have seen seems to do nothing but report false positives. They are missing bugs a fuzzer could have found in minutes. I will be impressed when it finds high severity/exploitable bugs. I think we are still a bit too far from that, if it's achievable at all. On the flip side, LLMs have been very useful for reverse engineering binaries. Binary Ninja w/ Sidekick (their LLM plugin) can recover and name data structures quite well. It saves a ton of time. Also does a decent job providing high level overviews of code...

sublinear • 05/14/2025

I'm kinda curious how this compares to GitLab's similar offering: https://docs.gitlab.com/user/project/merge_requests/duo_in_m...

lacker • 05/16/2025

I tried it out but I don't have any pending pull requests on my personal repositories, and I don't want to give a new tool write access to a professional repository where other people are working before trying it out a bit. It would be great if it would scan a repository and tell me if it found any bugs, so that I could see if it worked before messing with real pull requests.

ArnavAgrawal03 • 05/14/2025

I've used your product and particularly like that you show bugs in a table instead of littering my entire PR.

Does Jazzberry run on the entire codebase, or does it look at the specific PR? Would also like some more details about the tool - it seems much faster than others I've tried - are you using some kind of smaller fine-tuned model?

bigyabai • 05/14/2025

> Jazzberry is focused on dynamically testing your code in a sandbox to confirm the presence of real bugs.

That seems like a waste of resources to perform a job that a static linter could do in nanoseconds. Paying to spin up a new VM for every test is going to incur a cost penalty that other competitors can skip entirely.

rylanu • 05/15/2025

Having an agent explore a sandbox environment, install dependencies, execute tests, etc. sounds slow and resource intensive. Does Jazzberry scale for large teams with monorepos and dozens of PRs daily?

rylanu • 05/19/2025

Microsoft is having so many issues it would be really helpful for tools like this to be used now.

AIorNot • 05/15/2025

Very cool - what’s the scope of its abilities?

Can this be used for UX bugs, e.g. nav bugs, or for finding runtime errors in a React site? I.e., will it check the frontend?

bluelightning2k • 05/15/2025

Interesting choice to have your only demo video be testing a CLI. Unless that's literally the use-case it's for?

decodingchris • 05/14/2025

Cool demo! You mentioned using a microVM, which I think is Firecracker? And if it is, any issues with it?

RainyDayTmrw • 05/15/2025

I wish I could click into the "real bugs" and see full example output.

sorokod • 05/15/2025

What is your experience with running your product on its own code?

bananapub • 05/14/2025

how did and do you validate that this is of any value at all?

how many test cases do you have? how do you score the responses? how do you ensure random changes by the people who did almost all of the work (training models) don't wreck your product?
