I find this also depends heavily on which LLM you're using. I've found ChatGPT is completely awful at taking corrections; it'll double down until the cows come home. Meanwhile, Claude will generally adjust its behavior without too much nagging.