Unfortunately, that "unwanted noise" is a space for the models to compute; trying to eliminate it gives suboptimal responses. What you can do instead is try to corral it - let the model "think" like it wants, but guide it to add markers wrapping the thinking and/or result, then filter out the thinking in UI (for interactive applications) or as an intermediate/post-processing step (for hidden "building blocks").
If you're using Anthropic models, you may actually get improvements from prompting the model to maintain a tagging discipline; see https://docs.anthropic.com/en/docs/build-with-claude/prompt-....
It may be the self-aware human bias tainting this reasoning, but it seems convergent with our own psyche/brain processes, and/or inherent to the way we commonly express conscious thoughts.
Percolating tokens that allow a more "accurate" latent space appear to be more accurate, but are nearly actually useless noise. Almost a virtual shower thought.
Because people only put the answer at the end of a grammatically correct statement, with the more "reasoned" statements being more articulately percolated/logically sound, and that is expressed grammatically. These statements are inferred to be associated with intellectual boiler-plate. They may be correlated and not actually causative, but that would require a multiple component architecture with embeddings being used as a proto-"qualia" and that is getting hairy.
Facts should "only" have to be read once, and should be explicitly defined with a more secure of a confidence completely. Implicit inferences from those explicit facts should be emitted from a different, less confident module; with the chat boilerplate being tacitly composed finally when presenting the output to the user.
Of course separating the baby from the bathwater is the hard (not impossible) part.
It seems to me that it would make sense to just include more <BOS>-like meta tokens at the beginning in such cases, and have them as a prefixed scratch space that can be suppressed by treating them as non-output tokens.
it should be possible to ask model to think aloud (or step-by-step) and then give summary. in one or two prompts. give only summary back to user.
interesting
As other people pointed out here you can also add "verbosity sinks" as text fields in structured output, recently I've also been experimenting with tool calls to support guided self-talk in a way that doesn't necessarily all accumulate in the context (e.g. if not all the tool parameters get echoed back).