zozbot234 • yesterday at 7:45 PM • 1 reply

So they're paying expensive input tokens to extract at best a tiny amount of information ("judgment") per request? That's even less like "distillation" than the other claim of them trying to figure out reasoning by asking the model to think step by step.

Replies

red2awn • yesterday at 10:40 PM

LLM-as-a-judge is quite an effective method to RL a model, similar to RLHF but more objective and scalable. But yes, Anthropic is making it out to be more serious than it is. Plus, DeepSeek only did it for 125k requests, significantly fewer than the other labs, but Anthropic still listed them first to create FUD.
