How do LLMs do on problems that commonly confuse people? Do they specifically have to be trained against them? I'm imagining a Monty Hall variant that isn't in the training set tripping them up the same way a wine glass filled to the brim trips up image generators.