There was also that one paper that had very noticeable benchmark improvements in non-thinking models by just writing the prompt twice. The same paper remarked how thinking models often repeat the relevant parts of the prompt, achieving the same effect.
Claude is already pretty light on flourishes in its answers, at least compared to most other SotA models. And for everything else it's not at all obvious to me which parts are useless. And benchmarking it is hard (as evidenced by this thread). I'd rather spend my time on something else