Hacker News

raw_anon_1111 yesterday at 5:26 PM

For the most part, I don’t do chatbots, apart from a couple of RAG-based ones. It’s mostly behind-the-scenes work like image understanding, categorization, nuanced sentiment analysis, semantic alignment, etc.

I’ve created a framework that lets me test quality automatically across prompt changes and models, so I can compare cost, speed, and quality.
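A harness like that could look roughly like the following minimal sketch. It assumes each prompt/model combination can be wrapped in a function that returns a predicted label and a token count, and that there is a labeled test set to score against. The `Variant`, `evaluate`, and `compare` names, the stub "models," and the per-token prices are all hypothetical illustrations, not the commenter's actual framework; in practice the stubs would be replaced by real model calls.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Variant:
    """One prompt/model combination under test (hypothetical structure)."""
    name: str
    call: Callable[[str], tuple[str, int]]  # returns (predicted label, tokens used)
    price_per_1k_tokens: float  # assumed flat price in dollars


@dataclass
class Result:
    name: str
    accuracy: float
    avg_latency_s: float
    total_cost: float


def evaluate(variant: Variant, cases: list[tuple[str, str]]) -> Result:
    """Run one variant over a labeled test set and record quality, speed, and cost."""
    correct = 0
    tokens = 0
    elapsed = 0.0
    for text, expected in cases:
        start = time.perf_counter()
        predicted, used = variant.call(text)
        elapsed += time.perf_counter() - start
        tokens += used
        # Exact-match scoring: only suitable when each case has one correct label
        if predicted.strip().lower() == expected.lower():
            correct += 1
    n = len(cases)
    return Result(
        name=variant.name,
        accuracy=correct / n,
        avg_latency_s=elapsed / n,
        total_cost=tokens / 1000 * variant.price_per_1k_tokens,
    )


def compare(variants: list[Variant], cases: list[tuple[str, str]]) -> list[Result]:
    """Evaluate every variant; sort by accuracy, breaking ties by lower cost."""
    results = [evaluate(v, cases) for v in variants]
    return sorted(results, key=lambda r: (-r.accuracy, r.total_cost))


# --- Usage with stub "models" standing in for real API calls (hypothetical) ---
CASES = [
    ("I love this product", "positive"),
    ("Terrible support, never again", "negative"),
    ("It arrived on Tuesday", "neutral"),
]


def keyword_stub(text: str) -> tuple[str, int]:
    lowered = text.lower()
    if "love" in lowered:
        label = "positive"
    elif "terrible" in lowered:
        label = "negative"
    else:
        label = "neutral"
    return label, len(text.split()) + 5  # pretend the prompt adds 5 tokens


def always_positive_stub(text: str) -> tuple[str, int]:
    return "positive", len(text.split())


VARIANTS = [
    Variant("always-positive", always_positive_stub, price_per_1k_tokens=0.001),
    Variant("keyword-v1", keyword_stub, price_per_1k_tokens=0.002),
]

if __name__ == "__main__":
    for r in compare(VARIANTS, CASES):
        print(f"{r.name}: accuracy={r.accuracy:.2f} "
              f"latency={r.avg_latency_s:.6f}s cost=${r.total_cost:.6f}")
```

Ranking by accuracy and then cost is just one way to pick a "winner"; weighting quality, speed, and cost differently could change the order. Exact-match scoring also only works for tasks like categorization or sentiment labels; free-form RAG answers don't reduce to a single correct label this way.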

Out of all of those, the only results that require humans to judge the quality are the RAG results.


Replies

biophysboy yesterday at 5:29 PM

So who is the winner using the framework you created?
