
jeffreyip · last Thursday at 7:03 PM

I see. Although most users come to us to evaluate LLM applications, you're right that DeepEval also offers academic benchmarking of foundational models, which I'm assuming is what you're referring to.

We actually designed it to work easily with any API: you just create a wrapper around your API and you're good to go. We take care of the async/concurrent handling of the benchmarking, so evaluation speed is really only limited by the rate limit of your LLM API.

This link shows what a wrapper looks like: https://docs.confident-ai.com/guides/guides-using-custom-llm...
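Roughly, the wrapper boils down to subclassing our base model class and filling in a few methods. Below is a simplified sketch rather than a copy of the docs; the class and method names reflect the current API, while MyAPIClient and its complete/acomplete calls are placeholders for whatever client your own API uses:

    # Simplified sketch of a custom model wrapper (see the linked docs for the
    # full version). MyAPIClient stands in for your own API client.
    from deepeval.models import DeepEvalBaseLLM

    class MyCustomLLM(DeepEvalBaseLLM):
        def __init__(self, client):
            self.client = client  # your own API client (placeholder)

        def load_model(self):
            return self.client

        def generate(self, prompt: str) -> str:
            # Synchronous call to your API, returning the completion text
            return self.client.complete(prompt)

        async def a_generate(self, prompt: str) -> str:
            # Async version -- this is what lets us run benchmarks concurrently
            return await self.client.acomplete(prompt)

        def get_model_name(self) -> str:
            return "my-custom-llm"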

And once you have your model wrapper set up, you can use any benchmark we provide.
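For example, running MMLU against the wrapped model looks roughly like this (again a sketch; my_api_client is a placeholder for your own client instance):

    # Run an academic benchmark against the wrapped model
    from deepeval.benchmarks import MMLU

    model = MyCustomLLM(client=my_api_client)  # my_api_client: your own client (placeholder)
    benchmark = MMLU()
    benchmark.evaluate(model=model)
    print(benchmark.overall_score)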