>"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape.
What do you mean by this? Especially for tasks like coding, where there is a deterministic correct-or-incorrect signal, it should be possible to train reasoning on that signal.