I slightly tweaked your baseline em dash example and got 100% success rate with GPT-4.1 without any ...

thegeomaster • 05/15/2025 • 2 replies • view on HN

I slightly tweaked your baseline em dash example and got 100% success rate with GPT-4.1 without any additional calls, token spend, or technobabble.

System prompt: "Remove every em-dash (—) from the following text while leaving other characters unchanged.\n\nReturn only the cleaned text."

User prompt: <prompt from tsce_chat.py filled with em dashes>

Temperature: 0.0

Replies

airylizard • 05/15/2025

Hey, thanks for kicking the tires! The run you’re describing was done in mid-April, right after GPT-4.1 went live. Since then OpenAI has refreshed the weights behind the “gpt-4.1” alias a couple of times, and one of those updates fixed the em-dash miss.

If you reran today you’d see the same improved pass rate I’m getting now. That’s the downside of benchmarking against latest model names; behaviour changes quietly unless you pin to a dated snapshot.

For bigger, noisier prompts (or on GPT-3.5-turbo, which hasn’t changed) TSCE still gives a solid uplift, so the framework’s value stands. Appreciate you checking it out!

➕ show 1 reply

alt Hacker News

Replies