> There are certain tasks, like improving a given program for speed, for instance, where in theory the model can continue to make progress with a very clear reward signal for a very long time.
This makes me think: I wonder if Goodhart's law[1] may apply here. I wonder if, for instance, optimizing for speed may produce code that is faster but harder to understand and extend. Should we care, or would it be OK for the AI to produce code that passes all tests and is faster? Would the AI become good at creating explanations for humans as a side effect?
And if Goodhart's law doesn't apply, why is that? Is it because we're only doing RLVR fine-tuning on the last layers of the network, so the generality of the pre-training isn't lost? And if that's the case, could it be a limitation that keeps the model from being creative enough to come up with move 37?
> I wonder if, for instance, optimizing for speed may produce code that is faster but harder to understand and extend.
Superoptimizers have been around since 1987: https://en.wikipedia.org/wiki/Superoptimization
They generate fast code that is not meant to be understood or extended.
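For a flavor of what that looks like, here's a made-up illustration in C (my own sketch, not actual superoptimizer output): a branchless minimum in the bit-twiddling style such tools tend to produce, which is correct and branch-free but much harder to read than the obvious ternary.

    /* Illustrative only: branchless min in the style a superoptimizer
     * might emit, versus the readable (a < b ? a : b). */
    #include <stdio.h>

    static int branchless_min(int a, int b) {
        /* (a < b) is 0 or 1; negating it gives an all-zeros or all-ones mask */
        return b ^ ((a ^ b) & -(a < b));
    }

    int main(void) {
        printf("%d\n", branchless_min(3, 7));   /* prints 3 */
        printf("%d\n", branchless_min(-4, 2));  /* prints -4 */
        return 0;
    }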
Ehh, I think if it ends up being a half-decent architecture, you wind up with a difficult-to-understand kernel that never needs touching.
> I wonder if, for instance, optimizing for speed may produce code that is faster but harder to understand and extend.
This is generally true for code optimised by humans, at least for the sort of mechanical low-level optimisations that LLMs are likely to be good at, as opposed to more conceptual optimisations like using a better algorithm. So I suspect the same will be true for LLM-optimised code too.
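To make that distinction concrete, a hypothetical sketch in C (my example, not from the thread): the mechanical optimisation keeps the same algorithm but makes the code noisier, while the conceptual one changes the algorithm and can stay readable.

    /* Three ways to sum 1..n, illustrating mechanical vs conceptual optimisation. */
    #include <stdint.h>
    #include <stdio.h>

    /* Readable baseline: simple loop. */
    static uint64_t sum_naive(uint64_t n) {
        uint64_t s = 0;
        for (uint64_t i = 1; i <= n; i++) s += i;
        return s;
    }

    /* Mechanical: manual 4-way unrolling -- same algorithm, more opaque. */
    static uint64_t sum_unrolled(uint64_t n) {
        uint64_t s = 0, i = 1;
        for (; i + 3 <= n; i += 4) s += i + (i + 1) + (i + 2) + (i + 3);
        for (; i <= n; i++) s += i;
        return s;
    }

    /* Conceptual: closed-form formula -- faster and still readable. */
    static uint64_t sum_formula(uint64_t n) {
        return n * (n + 1) / 2;
    }

    int main(void) {
        uint64_t n = 1000;
        printf("%llu %llu %llu\n",
               (unsigned long long)sum_naive(n),
               (unsigned long long)sum_unrolled(n),
               (unsigned long long)sum_formula(n));
        return 0;
    }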