Hacker News

nialsey · yesterday at 6:10 PM

Nothing points out that a benchmark is invalid like a zero false positive rate. Seemingly it is pre-2020 text versus a few models' reworkings of texts. I can see this model falling apart in many real-world scenarios. Yes, LLMs use strange language if left to their own devices, and this can surely be detected. But a 0% false positive rate under all circumstances? Implausible.
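(The statistical intuition behind this objection can be made concrete. The "rule of three" — a standard result not mentioned in the thread — says that observing zero false positives on n clean samples only bounds the true FPR at roughly 3/n with 95% confidence, so no finite benchmark can ever support a literal 0% claim. A minimal sketch, where the sample sizes are illustrative:)

```python
# Rule of three: if a classifier shows zero false positives on n human-written
# samples, the 95% upper confidence bound on its true FPR is approximately 3/n.
# Derivation: solve (1 - p)^n = 0.05 for p; for large n, p ≈ 3/n.

def fpr_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact upper bound on FPR given zero observed false positives in n trials."""
    # (1 - p)^n = 1 - confidence  =>  p = 1 - (1 - confidence)^(1/n)
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

# Even a benchmark of 100,000 clean samples with zero observed false positives
# only supports an FPR claim down to about 0.003% — never a true 0% rate.
for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7}: 95% upper bound on FPR ~ {fpr_upper_bound(n):.5%}")
```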


Replies

maxspero · yesterday at 6:21 PM

Our benchmarks on public datasets put our FPR at roughly 1 in 10,000. https://www.pangram.com/blog/all-about-false-positives-in-ai...

Find me a clean public dataset with no AI involvement and I will be happy to report Pangram's false positive rate on it.

pinkmuffinere · yesterday at 7:26 PM

> Nothing points out that the benchmark is invalid like a zero false positive rate

You’re punishing them for claiming to do a good job. If they truly are doing a bad job, surely there is a better criticism you could provide.
