Hacker News

redwood | yesterday at 2:32 PM | 1 reply

I found this pretty hard to read because the author has a very specific understanding of what an "eval startup" is, but it is only implied rather than explicitly described. I would have thought they were referring to companies that provide a technology platform for running evals in an AI application context, for example Comet/Opik and Braintrust.

But it sounds like the author does not mean those companies at all, since those are actually important in enabling the very Venn diagram he/she describes.

From what I can tell, the author is referring to something more like a public benchmark report provider. If so, then yes, that is a relatively small total addressable market no matter how you look at it.


Replies

intended | yesterday at 4:07 PM

Funnily enough, this made immediate sense to me, and I think that comes from having been in situations where you need high reliability from a process, e.g. I need a bot with a 99.99% guarantee that it will not go out of bounds or say something incorrect.