How does MC warm-up fit with LLMs? With LLMs you start with a prompt, so I don't see how "warm up" applies.
You're not just sampling from them the way you would in a typical MC setting.
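For context on what I mean by warm-up in the classic MC case, here's a minimal sketch (my own illustration, not from the parent comment) of a random-walk Metropolis sampler where the first draws are discarded before the chain reaches the typical set; the `metropolis` function, its parameters, and the starting point `x0 = 50` are all made up for illustration:

```python
import math
import random

def metropolis(log_prob, x0, n_samples, warmup):
    """Random-walk Metropolis; the first `warmup` draws are discarded."""
    x, samples = x0, []
    for i in range(n_samples + warmup):
        proposal = x + random.gauss(0, 1.0)
        # Accept with probability min(1, p(proposal) / p(x))
        if math.log(random.random()) < log_prob(proposal) - log_prob(x):
            x = proposal
        if i >= warmup:  # keep only post-warm-up samples
            samples.append(x)
    return samples

# Target: standard normal; deliberately start far from the typical set.
draws = metropolis(lambda x: -0.5 * x * x, x0=50.0, n_samples=5000, warmup=1000)
```

Without the warm-up discard, the early samples near `x0 = 50` would bias any estimate. What I don't see is what the analogous "discard the early part" step would be for an LLM, since the prompt fixes your starting point.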
> If you let the model run a bit longer it enters a region close to the typical set and when it's ready to answer you have a high probability of getting a good answer.
What does "let the model run a bit longer" even mean in this context?