I don't know about the GP, but my workflow is similar to theirs, but I aim to ship low thousands of lines per week. The fewer the better. I even tell the agent to only write high SNR tests, otherwise it just adds useless "make sure this function returns this thing we hardcoded".
I usually succeed, BTW. I spend a lot of time planning, but usually each PR is a few hundred lines, and fairly easily reviewable.
I mostly work with Python backends, though these days it might be any language (Ruby, Go, TS).