
ACCount37, yesterday at 5:07 PM

Yeah, that's my point. Humans are not reliable LLM evaluators. "Secret model nerfs" happen in "vibes" far more often than they do in any reality.