Why I came up with TSCE (Two-Step Contextual Enrichment).
A +30 percentage point uplift with GPT-3.5-turbo across a mix of 300 tasks.
Free, open framework; check the repo and try it yourself:
https://github.com/AutomationOptimization/tsce_demo
I tested this another 300 times with gpt-4.1 to remove those obtrusive "em-dashes" everyone hates. I compared a single-pass baseline against TSCE, with the exact same instructions and prompt: "Remove the em-dashes from my LinkedIn post...".
Out of the 300 tests, the baseline failed to remove the em-dashes 149/300 times; TSCE failed only 18/300 times.
It works, and all the data plus the full testing script are in the repo.
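For context, the core mechanism is two chained model calls: a first pass that distills the task into an enriched context, and a second pass that answers with that context in play. Here's a minimal sketch of that pattern; the prompt wording and helper name are my own placeholders, not the repo's actual tsce_chat.py API:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def tsce_sketch(task: str, model: str = "gpt-4.1") -> str:
    # Step 1: build an enriched "anchor" context from the raw task.
    anchor = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Analyze this task and draft a dense working context for it:\n\n{task}",
        }],
    ).choices[0].message.content

    # Step 2: answer the original task with the anchor as system context.
    final = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": anchor},
            {"role": "user", "content": task},
        ],
    )
    return final.choices[0].message.content
```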
I slightly tweaked your baseline em-dash example and got a 100% success rate with GPT-4.1, with no additional calls, token spend, or technobabble:
System prompt: "Remove every em-dash (—) from the following text while leaving other characters unchanged.\n\nReturn only the cleaned text."
User prompt: <prompt from tsce_chat.py filled with em-dashes>
Temperature: 0.0
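In the OpenAI Python client, that single call looks roughly like this; text_with_em_dashes is a stand-in for the em-dash-filled prompt from tsce_chat.py:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = ("Remove every em-dash (—) from the following text while leaving "
          "other characters unchanged.\n\nReturn only the cleaned text.")

text_with_em_dashes = "..."  # placeholder: the test prompt from tsce_chat.py

resp = client.chat.completions.create(
    model="gpt-4.1",
    temperature=0.0,
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": text_with_em_dashes},
    ],
)
cleaned = resp.choices[0].message.content
```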
That's a lot of kilowatt-hours wasted on a find-and-replace operation.
Have you heard of text.replace("—", "-")?
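(And if you want the em-dashes gone rather than swapped for hyphens, per the system prompt's wording, it's the same zero-token one-liner; text here is whatever string holds the post:)

```python
cleaned = text.replace("—", "")   # drop every em-dash outright
dashed = text.replace("—", "-")   # or swap each one for a plain hyphen
```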