Hacker News

pawanjswal, last Wednesday at 5:19 AM

I have seen many LLM devs encounter this at some point. Good to see that you are not only pointing out the inconsistency but also actively advocating for a common benchmark.