Mistletoe, yesterday at 6:43 PM

How do you measure whether it works better day to day without benchmarks?


Replies

bulbar, yesterday at 6:48 PM

Manually labeling answers, maybe? There is a lot of infrastructure built around it, since it has been heavily used for two decades, and it's relatively cheap.

That's still benchmarking, of course, just not with any of the well-known public benchmarks.

verdverm, yesterday at 6:51 PM

Internal evals. Big AI certainly has good proprietary training and eval data; it's one reason why their models are better.

standardUser, yesterday at 6:46 PM

Subscriptions.
