The problem is that the model doesn't know what you mean by "bad code" a priori. If you list the specific issues you care about (e.g. separation of concerns, don't repeat yourself, single responsibility, prefer pure functions), it's pretty good at picking them out. Humans have this problem as well; we're just more opinionated.
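To make that concrete, here's a rough sketch of what "listing specific issues" could look like as a review prompt, assuming the OpenAI Python client; the rubric contents, model name, and review() helper are just placeholder assumptions, not a recommendation:

    # Hypothetical sketch: ask the model to review a diff against an explicit
    # rubric instead of a vague "find the bad code".
    from openai import OpenAI

    RUBRIC = [
        "separation of concerns",
        "don't repeat yourself",
        "single responsibility",
        "prefer pure functions",
    ]

    def review(diff: str) -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        prompt = (
            "Review the following diff. Flag only violations of these principles:\n"
            + "\n".join(f"- {item}" for item in RUBRIC)
            + "\n\nDiff:\n" + diff
        )
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content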
Yes, that's exactly what I mentioned earlier: if you describe the implementation, you can get something you can work with long-term. But if you just describe an idea and let the LLM do both the design of the implementation and the implementation itself, it eventually seems to fall over itself, and changes take longer and longer.