Hacker News

tinfoilhatter, last Tuesday at 9:14 PM

What does "vibe" testing code entail exactly? Apparently you don't look at code when you're "vibe" testing it based on this statement:

> When to look at the code or when to just "vibe" test it and move on.

I'm really curious how you're ensuring that the code output by whatever LLM you're using is actually doing what you think it's doing.


Replies

NitpickLawyer, last Tuesday at 9:44 PM

I stick by the og definition, in that when vibe coding I don't look at the code. I don't care about the code. When I said "vibe test it" I meant test the result of the vibe coding session.

Here's a recent example where I used this pattern: I was working on a (micro) service that implements a chat-based assistant. I designed it a bit differently than the traditional "chat bot" that's prevalent right now. I used a "chat room" approach, where everyone (user, search, LLM, etc) writes to a queue, and different processes trigger on different message types. Once I finished, I tested it with both unit tests and scripted integration tests, covering some "happy path" scenarios.
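For readers who want a picture of the "chat room" idea, here is a minimal sketch of one way it could look, not the actual service. Everything in it is an assumption made for illustration: the `ChatRoom` class, the message kinds (`user_message`, `search_result`, `assistant_reply`), and the two handlers are hypothetical names, and a single in-process queue stands in for whatever the real processes share.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    sender: str  # hypothetical participants: "user", "search", "llm"
    kind: str    # the message type that handlers trigger on
    text: str


class ChatRoom:
    """Everyone posts to one queue; handlers are registered per message type."""

    def __init__(self):
        self.queue = deque()
        self.handlers = defaultdict(list)
        self.log = []  # every message, in the order it was processed

    def on(self, kind, handler):
        self.handlers[kind].append(handler)

    def post(self, message):
        self.queue.append(message)

    def run(self):
        # Drain the queue; handlers may post follow-up messages while we loop.
        while self.queue:
            message = self.queue.popleft()
            self.log.append(message)
            for handler in self.handlers[message.kind]:
                handler(self, message)


def search_handler(room, message):
    # Stand-in for a search process reacting to user messages.
    room.post(Message("search", "search_result", f"results for: {message.text}"))


def llm_handler(room, message):
    # Stand-in for the LLM process reacting to search results.
    room.post(Message("llm", "assistant_reply", f"answer based on: {message.text}"))


room = ChatRoom()
room.on("user_message", search_handler)
room.on("search_result", llm_handler)
room.post(Message("user", "user_message", "what is vibe coding?"))
room.run()

kinds = [m.kind for m in room.log]
print(kinds)
```

The point of the shape is that no participant calls another directly: each one only reads and writes messages, so adding a new participant means registering one more handler.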

But I also wanted to see it work "live" in a browser. So, instead of waiting for the frontend team to implement it, I started a new session, and used a prompt along the lines of "Based on this repo, create a one page frontend that uses all the relevant endpoints and interfaces". The "agent" read through all the relevant files, and produced (0 shot) an interface where everything was wired correctly, and I could test it, and watch the logs in real-time on my machine. I never looked at the code, because the artifact was not important to me; the important thing was that I had it 5 minutes later.

Fun fact, it did allow me to find a timing bug. I had implemented message merging, so the LLM gets several messages at once when a user types\n like\n this\n and basically adds new messages while the others are processing. But I had a weird timing bug where a message would be marked as "processing", a user would type a new message, and the compacting algo would run, all "at the same time", and some messages would be "lost" (unprocessed by the correct entity). I didn't catch that with the integration tests; sometimes just playing around with it is what reveals such weird interactions. For me, being able to play around with the service in ~5 minutes was worth it, and I couldn't care less about the frontend artifact. A dedicated team will handle that, eventually.
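To make the merging race concrete, here is a small sketch of one common way to avoid this class of bug. It is not the fix used in the service, which the comment does not describe. The `Inbox` class and its method names are hypothetical, and the sketch assumes a single consumer. The idea is that "mark as processing" and "merge" happen as one step under a lock, so a message typed mid-batch can only land in the next batch.

```python
import threading


class Inbox:
    """Hypothetical per-user inbox that merges pending messages into batches."""

    def __init__(self):
        self._lock = threading.Lock()
        self.pending = []     # typed by the user, not yet claimed
        self.processing = []  # claimed by the current batch

    def add(self, text):
        # Called whenever the user sends a message, even mid-batch.
        with self._lock:
            self.pending.append(text)

    def claim_batch(self):
        # Marking messages as "processing" and merging them is one atomic
        # step, so a message added concurrently is either fully inside this
        # batch or fully left for the next one. Assumes the previous batch
        # has already been finished (single consumer).
        with self._lock:
            self.processing, self.pending = self.pending, []
            return "\n".join(self.processing)

    def finish_batch(self):
        with self._lock:
            self.processing = []


inbox = Inbox()
inbox.add("like")
inbox.add("this")
first = inbox.claim_batch()   # the LLM would receive both messages at once
inbox.add("one more")         # arrives while the first batch is processing
inbox.finish_batch()
second = inbox.claim_batch()  # the late message is picked up, not lost
print(repr(first), repr(second))
```

If the claim and the merge were separate unlocked steps, a message arriving between them could be flagged as handled without ever being included in a batch, which matches the "lost" messages described above.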
