We need more rigorous benchmarks for SRE tasks, which is much easier said than done. The only othe...

smithclay • yesterday at 5:10 PM • 1 reply

We need more rigorous benchmarks for SRE tasks, which is much easier said than done.

The only other benchmark I've come across is https://sreben.ch/ ... certainly there must be others by now?

nyellin • yesterday at 8:43 PM

We publish the benchmarks for HolmesGPT (CNCF sandbox project) at https://holmesgpt.dev/development/evaluations/
