It's so easy to ship completely broken AI features because you can't really unit test them...

artrockalter • yesterday at 7:19 PM • 2 replies

It's so easy to ship completely broken AI features because you can't really unit test them, and unit tests have been the main standard for whether code is working for a long time now.

The most successful AI companies (OpenAI, Anthropic, Cursor) are all dogfooding their products as far as I can tell, and I don't really see any other reliable way to make sure the AI feature you ship actually works.

Replies

sk7 • yesterday at 7:40 PM

Tests are called "evals" (evaluations) in the AI product development world. Basically, you either have humans review the LLM output or feed it to another LLM with instructions on how to evaluate it.

https://www.lennysnewsletter.com/p/beyond-vibe-checks-a-pms-...
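
To make the "feed it to another LLM" idea concrete, here is a rough sketch of what an eval loop might look like. Everything in it is an assumption for illustration: `EvalCase`, `run_evals`, `fake_feature`, and `fake_judge` are made-up names, and the judge is a plain substring check standing in for a real LLM call or human review so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str   # input sent to the AI feature under test
    rubric: str   # instructions handed to the judge


# Hypothetical signatures: in a real setup both would wrap model API calls.
Generate = Callable[[str], str]
Judge = Callable[[str, str, str], bool]  # (prompt, output, rubric) -> pass/fail


def run_evals(cases: List[EvalCase], generate: Generate, judge: Judge) -> float:
    """Return the fraction of cases whose output the judge accepts."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = 0
    for case in cases:
        output = generate(case.prompt)
        if judge(case.prompt, output, case.rubric):
            passed += 1
    return passed / len(cases)


def fake_feature(prompt: str) -> str:
    # Stand-in for the AI feature; returns canned answers.
    canned = {
        "Summarize: the cat sat on the mat.": "A cat sat on a mat.",
        "Translate to French: hello": "hello",  # deliberately wrong
    }
    return canned.get(prompt, "")


def fake_judge(prompt: str, output: str, rubric: str) -> bool:
    # Stand-in for a judge LLM or human reviewer. Here the "rubric" is
    # simply a substring the output must contain (case-insensitive).
    return rubric.lower() in output.lower()


CASES = [
    EvalCase("Summarize: the cat sat on the mat.", "cat"),
    EvalCase("Translate to French: hello", "bonjour"),
]

pass_rate = run_evals(CASES, fake_feature, fake_judge)
print(f"pass rate: {pass_rate:.0%}")
```

Unlike a unit test, the result is a pass rate rather than a strict pass/fail, so a team would typically pick a threshold and track it over time instead of asserting on exact output.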

bn-l • yesterday at 7:23 PM

Microsoft: What? You want us to eat this slop? Are you crazy?!
