1. if it won't compile you'll give up on the tool in minutes or an hour.
2. if it won't run you'll give up in a few hours or a day.
3. if it sneaks in something you don't find until you're almost - or already - in production it's too late.
charitable: the model was trained on a lot of weak/lazy code product.less-charitable: there's a vested interest in the approach you saw.
Yeah it’s trained to do that somewhere though it’s not necessary malicious. For RLHF (the model fine tuning) the HF stands for human feedback but is really another trained model that’s trained to score replies the way a human would. And so if that model likes code that passes tests more than code that’s stuck in a debugging loop, that’s what the model becomes optimized for.
In a complex model like Claude there is no doubt much more at work, but some version of optimizing for the wrong thing is what’s ultimately at play.