Nothing signals an invalid benchmark like a zero false positive rate. Seemingly it is pre-2020 text vs. a few models' reworkings of texts. I can see this model falling apart in many real-world scenarios. Yes, LLMs use strange language if left to their own devices, and that can surely be detected. But a 0% false positive rate under all circumstances? Implausible.
> Nothing points out that the benchmark is invalid like a zero false positive rate
You’re punishing them for claiming to do a good job. If they truly are doing a bad job, surely there is a better criticism you could provide.
Our benchmarks on public datasets put our FPR at roughly 1 in 10,000. https://www.pangram.com/blog/all-about-false-positives-in-ai...
Find me a clean public dataset with no AI involvement and I will be happy to report Pangram's false positive rate on it.
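One way to make the statistical point in this thread concrete: observing zero false positives on a finite benchmark never establishes a true 0% rate; it only puts an upper bound on it. A minimal sketch of that bound, using the exact binomial (Clopper-Pearson) calculation for the zero-error case, which the classic "rule of three" (~3/n) approximates. The dataset size of 10,000 here is illustrative, chosen to match the 1-in-10,000 figure above:

```python
def fpr_upper_bound(n_clean: int, confidence: float = 0.95) -> float:
    """Upper bound on the true false positive rate when ZERO false
    positives are observed on n_clean known-human documents.

    Exact form for the zero-error case: 1 - (1 - confidence)**(1/n),
    which the 'rule of three' approximates as ~3/n for 95% confidence.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_clean)

# Zero observed false positives on 10,000 clean texts still leaves a
# 95% upper bound of about 3 in 10,000 on the true FPR.
print(fpr_upper_bound(10_000))  # ~0.0003
```

So even a perfect score on a 10,000-document benchmark is compatible with a true FPR of a few in 10,000, which is why a reported 0% invites scrutiny of the benchmark rather than settling the question.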