Some distributional collapse is good in terms of making these things reliable tools. The creativity and divergent thinking does take a hit, but humans are better at this anyhow so I view it as a net W.
This. A default LLM is "do whatever seems to fit the circumstances". An LLM that was RLVR'd heavily? "Do whatever seems to work in those circumstances".
Very much a must for many long term tasks and complex tasks.
This. A default LLM is "do whatever seems to fit the circumstances". An LLM that was RLVR'd heavily? "Do whatever seems to work in those circumstances".
Very much a must for many long term tasks and complex tasks.