Have you tried it with something like OpenSpec? Strangely, taking the time to lay out the steps in a large task helps immensely. It's the difference between the behavior you describe and just letting it run productively for segments of ten or fifteen minutes.
> Have you tried it with something like OpenSpec?
No. The parent comment said I needed a new model, which I've tried. Being told "just try something else as well" kind of proves the point.