
cobertos | last Monday at 1:54 AM

I still don't quite get this reasoning. A statistical model for detecting a category (is this hiring material LLM-generated or not, is this email spam or not, etc.) is best characterized by its false positive and false negative rates. But it doesn't sound like anyone measures those; the model just gets applied after a couple of "huh, that worked" moments and we move on. There's a big difference between a model that performs correctly 70% of the time and one that performs correctly 99% of the time, and I'm not sure we can say which this is.
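
To make this concrete, measuring it would look something like the sketch below, assuming a hand-labeled evaluation set and a detector under test (is_llm_generated and labeled_samples are made-up names for illustration):

    # Sketch: estimate a detector's error rates from hand-labeled data.
    # `is_llm_generated` and `labeled_samples` are hypothetical stand-ins.
    def is_llm_generated(text: str) -> bool:
        # Placeholder heuristic; swap in the real detector under test.
        return "dynamic" in text.lower()

    labeled_samples = [
        ("We seek a dynamic self-starter...", True),        # known LLM-generated
        ("Hiring a backend dev. Rust. Remote OK.", False),  # known human-written
        # ...many more hand-labeled examples
    ]

    tp = tn = fp = fn = 0
    for text, truly_llm in labeled_samples:
        flagged = is_llm_generated(text)
        if flagged and truly_llm:
            tp += 1
        elif flagged and not truly_llm:
            fp += 1
        elif truly_llm:
            fn += 1
        else:
            tn += 1

    # False positive rate: human-written text wrongly flagged as LLM.
    # False negative rate: LLM-generated text that slips through.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    print(f"FPR={fpr:.1%}  FNR={fnr:.1%}")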

Maybe if LLMs were aligned for this specific task it'd make more sense? But they're not. Their alignment tunes them to provide broadly helpful responses across a wide variety of tasks. They prefer positive responses to negative ones and are not tuned as a detection tool for arbitrary categories. And maybe they do work well, but maybe only a specific version of a specific model works against hiring material produced by other specific models? There are too many confounders here to skip studying this rigorously, so the conclusion felt... not carefully considered.
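
For what it's worth, a minimal version of that rigorous study would at least stratify the error rates by which model produced the text, something like this (detector and sample set hypothetical, as above):

    # Sketch: break down detection rates per generator model, to check
    # whether "it works" generalizes beyond one model's output.
    from collections import defaultdict

    def evaluate_by_generator(samples, detector):
        # samples: (text, generator) pairs; generator is None for human-written text
        counts = defaultdict(lambda: [0, 0])  # label -> [flagged, total]
        for text, generator in samples:
            label = generator or "human"
            counts[label][1] += 1
            if detector(text):
                counts[label][0] += 1
        for label, (flagged, total) in sorted(counts.items()):
            # For "human", this rate is the false positive rate;
            # for each model, 1 - rate is its false negative rate.
            print(f"{label}: flagged {flagged}/{total} ({flagged / total:.1%})")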

Maybe you have considered this more than I know. It sounds like you work a lot with this data. But the off-handedness set off my skepticism.