> And LLMs slurped some of those together with the output of thousands of people who’d do the task worse, and you have no way of forcing it to be the good one every time.
That's solvable though, whether by curating the training data toward the good examples or by using RL to reward the better outputs.