Hacker News

h1fra yesterday at 10:52 AM (2 replies)

Evals are glorified integration tests. Would you invest in an integration test startup? Absolutely not. I don't get why we are making all this fuss about evals.


Replies

hilariously yesterday at 11:11 AM

Because what people actually want is a simple harness to test their use cases against all the frontier models and see which is the cheapest/best for the job.

It's simple to say but hard to do well, and the important thing is that no matter what tool you have, the evals don't write themselves.
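
To make the harness idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: `Model`, `Case`, `pass_rate`, and `cheapest_passing` are hypothetical names, the flat `cost_per_call` is a stand-in for real pricing, and `generate` is a plain callable where a real model client would go. The hand-written `check` functions are the part no tool writes for you.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Model:
    name: str
    cost_per_call: float            # hypothetical flat cost per call, in dollars
    generate: Callable[[str], str]  # stand-in for a real model API client


@dataclass
class Case:
    prompt: str
    check: Callable[[str], bool]    # hand-written pass/fail check on the output


def pass_rate(model: Model, cases: list[Case]) -> float:
    """Fraction of cases whose check accepts the model's output."""
    passed = sum(1 for c in cases if c.check(model.generate(c.prompt)))
    return passed / len(cases)


def cheapest_passing(
    models: list[Model], cases: list[Case], threshold: float = 0.9
) -> Optional[Model]:
    """Cheapest model whose pass rate meets the threshold, or None."""
    good = [m for m in models if pass_rate(m, cases) >= threshold]
    return min(good, key=lambda m: m.cost_per_call) if good else None


# Stubbed models and cases, purely for demonstration.
cases = [
    Case("2+2", lambda out: "4" in out),
    Case("capital of France", lambda out: "paris" in out.lower()),
]
models = [
    Model("small", 0.001, lambda p: "4" if p == "2+2" else "I don't know"),
    Model("large", 0.01, lambda p: "4" if p == "2+2" else "Paris"),
]

best = cheapest_passing(models, cases)
print(best.name if best else "no model met the threshold")
```

With these stubs the cheaper model only passes half the cases, so the more expensive one gets picked. Lowering `threshold` trades quality for cost, which is the decision the harness is supposed to inform.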

pydry yesterday at 12:30 PM

There are a number of integration test startups. None of them do a great job, but they do exist.