Hacker News

artrockalter, yesterday at 7:19 PM

It's so easy to ship completely broken AI features because you can't really unit test them, and unit tests have long been the main standard for whether code works.

The most successful AI companies (OpenAI, Anthropic, Cursor) are all dogfooding their products as far as I can tell, and I don't really see any other reliable way to make sure the AI feature you ship actually works.


Replies

sk7, yesterday at 7:40 PM

Tests are called "evals" (evaluations) in the AI product development world. Basically, you either have humans review the LLM output or feed it to another LLM with instructions on how to evaluate it.

https://www.lennysnewsletter.com/p/beyond-vibe-checks-a-pms-...
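
To make that concrete, here is a rough sketch of what an eval loop can look like. Everything in it is hypothetical and only for illustration: `EvalCase`, `run_evals`, `fake_model`, and `keyword_judge` are made-up names, the model is a stub, and the judge is a plain keyword check standing in for a human reviewer or a second LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    prompt: str  # input sent to the model under test
    rubric: str  # what the judge looks for in the output (hypothetical format)


def run_evals(
    model: Callable[[str], str],
    judge: Callable[[str, str, str], bool],
    cases: List[EvalCase],
) -> Tuple[float, List[EvalCase]]:
    """Run every case through the model, ask the judge for a verdict,
    and return the pass rate plus the cases that failed."""
    failures = []
    for case in cases:
        output = model(case.prompt)
        if not judge(case.prompt, output, case.rubric):
            failures.append(case)
    pass_rate = (len(cases) - len(failures)) / len(cases) if cases else 0.0
    return pass_rate, failures


# Stub model: an assumption standing in for a real LLM call.
def fake_model(prompt: str) -> str:
    return "Paris" if "France" in prompt else "I don't know"


# Stub judge: an assumption standing in for a human or LLM grader.
# It only checks that the rubric keyword appears in the output.
def keyword_judge(prompt: str, output: str, rubric: str) -> bool:
    return rubric.lower() in output.lower()


cases = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is the capital of Peru?", "Lima"),
]
rate, failed = run_evals(fake_model, keyword_judge, cases)
print(f"pass rate: {rate:.0%}, failed prompts: {[c.prompt for c in failed]}")
```

Unlike a unit test, the useful signal here is usually the pass rate over many cases rather than a single pass/fail, so you would typically track it across model or prompt changes.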

bn-l, yesterday at 7:23 PM

Microsoft: What? You want us to eat this slop? Are you crazy?!
