deepsquirrelnet • yesterday at 9:18 PM • 2 replies

This is just evaluation, not “benchmarking”. If you haven’t set up evaluation on something you’re putting into production, then what are you even doing?

Stop prompt engineering, put down the crayons. Statistical model outputs need to be evaluated.

Replies

andy99 • yesterday at 9:24 PM

What does that look like, in your opinion? What do you use?

lorey • yesterday at 9:34 PM

This went straight to prod, even earlier than I'd opted for. What do you mean?
