> Even if you instruct the model "don't do X" or "do X this way"—you cannot rely on the model following that instruction.
Why not? I can definitively fire of two prompts to the same model and harness, and one include "don't do X" and the other doesn't, and I get what I expect, one didn't try to avoid doing X, and the other did. Is that not your experience using LLMs?
It depends on the instruction, and how many other instructions there are. Models converge on doing things the way that emerged from their training, and with every turn the model cares less and less about your instructions. In practice, this means that after you had the model plan and execute the plan, you almost always end up having to iterate on the output because during the process of outputting the output the model began to derail and ignore instructions. You get things like "In a real app, we would do X, for now, just return null" or various subtle bugs.
It makes sense if you remember that it just predicts, what should probably be the next piece of text?