yeah i really don't like the corpus of statements and it makes me doubt lenz. consider
> “Artificial intelligence will cause widespread job loss among software engineers.”
https://lenz.io/c/ai-software-engineers-job-loss-impact-05e4...
this is a statement about the future. who knows? dataset also includes
> Robots will not replace human teachers in schools in the near future.
or
> Papua New Guinea has very few female members of parliament.
what counts as very few?
> “Taurine supplementation supports mood and emotional health in humans.”
why is this labeled as misleading? i'm not even sure when I'm supposed to use the misleading label
> Anaximander was the first scientist in recorded history.
this is a judgement call, as the term "scientist" didn't exist back then.
the claims that feel actually solidly answerable seem to have much better LLM performance
Agree that some of the claims are forward-looking. That's the messiness of real-world, real-user fact checks. No ground-truth verdicts are provided or used in the study, though. It only measures the level of agreement between the selected models, not which one is right on which claim. I.e., none of the claims is actually labelled.
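To make "agreement without ground truth" concrete, here's a rough sketch. The model names and verdicts are made up for illustration, and plain pairwise agreement is just the simplest metric I could pick, not necessarily what the study computes.

```python
from itertools import combinations

# Hypothetical verdicts: model name -> one label per claim (made-up data).
# Note there is no "correct" column anywhere: no claim has a ground-truth label.
verdicts = {
    "model_a": ["true", "misleading", "false", "true"],
    "model_b": ["true", "false", "false", "true"],
    "model_c": ["misleading", "misleading", "false", "true"],
}


def pairwise_agreement(a, b):
    """Fraction of claims on which two models gave the same label."""
    if len(a) != len(b):
        raise ValueError("both models must have a verdict for every claim")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Agreement rate for every unordered pair of models.
agreement = {
    (m1, m2): pairwise_agreement(verdicts[m1], verdicts[m2])
    for m1, m2 in combinations(sorted(verdicts), 2)
}

for pair, rate in agreement.items():
    print(pair, rate)
```

A high number here only says two models tend to give the same label. They could both be wrong on the same claim, which is why vague or forward-looking claims lower agreement without telling you who is right.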