dpark • yesterday at 9:43 PM • 2 replies

It’s like anything else, you’ve got to check the results and potentially push it to fix stuff.

I recently had AI code up a feature that was essentially text manipulation. There were existing tests to show it how to write effective tests, and it did a great job of covering the new functionality. My feedback to the AI was mostly about some inaccurate comments it made in the code, but the coverage was solid. It would actually have been faster for me to fix those myself, but I’m experimenting with how much I can make the AI do.

On the other hand I had AI code up another feature in a different code base and it produced a bunch of tests with little actual validation. It basically invoked the new functionality with a good spectrum of arguments but then just validated that the code didn’t throw. And in one case it tested something that diverged slightly from how the code would actually be invoked. In that case I told it how to validate what the functionality was actually doing and how to make the one test more representative. In the end it was good coverage with a small amount of work.
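To make the difference concrete, here is a minimal sketch of a test that only checks "didn't throw" next to one that validates behavior. The `slugify` helper, the sample inputs, and the deliberately wrong `broken_slugify` are all hypothetical stand-ins, not the code from either code base described above.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical text-manipulation helper: lowercase, hyphen-separated slug."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def broken_slugify(title: str) -> str:
    """Deliberately wrong implementation: returns the input unchanged."""
    return title


# A reasonable spectrum of arguments (hypothetical).
SAMPLE_INPUTS = ["Hello, World!", "  spaces   everywhere ", "", "a--b"]


def smoke_test(fn) -> bool:
    # Weak: calls fn with many inputs but passes as long as nothing raises.
    for text in SAMPLE_INPUTS:
        fn(text)
    return True


def behavior_test(fn) -> bool:
    # Stronger: pins down what the function is supposed to return.
    return (
        fn("Hello, World!") == "hello-world"
        and fn("  spaces   everywhere ") == "spaces-everywhere"
        and fn("") == ""
    )
```

Under these assumptions, the smoke test passes for both implementations, while the behavior test passes only for the correct one. That gap is what telling the AI how to validate the functionality closes.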

For people who don’t usually test or care a bunch about testing, yeah, they probably let the AI create garbage tests.

Replies

ubercow13 • yesterday at 10:58 PM

>feature that was essentially text manipulation

That seems like the kind of feature where the LLM would already have the domain knowledge needed to write reasonable tests, though. Similar to how it can vibe code a surprisingly complicated website or video game without much help, but probably not create a single component of a complex distributed system that will fit into an existing architecture, with exactly the correct behaviour based on some obscure domain knowledge that pretty much exists only in your company.


fzeroracer • yesterday at 11:39 PM

I don't see anything here that corroborates your claim that it outputs more consistent test code than most engineers. In fact your second case would indicate otherwise.

And this also goes back to my first point about writing tests that matter. Coverage can matter, but coverage is not the same as codifying business logic in your test suite. I've seen many engineers focus solely on coverage, only for their code to blow up in production because they didn't bother to test the actual real-world scenarios it would be used in, which requires deep understanding of the full system.
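A small sketch of that gap between coverage and business logic, assuming a made-up rule that orders of 50 or more ship free. The function, threshold, and fee are hypothetical and only illustrate the point.

```python
FREE_SHIPPING_THRESHOLD = 50  # hypothetical rule: orders of 50 or more ship free


def shipping_fee(order_total: float) -> float:
    # Bug: the rule says "50 or more", but this uses a strict comparison.
    if order_total > FREE_SHIPPING_THRESHOLD:
        return 0.0
    return 5.0


def coverage_style_checks() -> bool:
    # Executes both branches, so line and branch coverage are complete.
    return shipping_fee(100) == 0.0 and shipping_fee(10) == 5.0


def business_rule_check() -> bool:
    # Encodes the actual rule at its boundary, which exposes the bug.
    return shipping_fee(50) == 0.0
```

Here the coverage-style checks pass with every line and branch executed, yet the boundary case that the business rule cares about still fails. Knowing to test exactly 50 requires knowing the rule, not just the code.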
