Too many things are left unsaid => too many assumptions. As usual, even with human beings, specifications are key, and context (what each party knows about the other or about the situation) is an implicit part of them.
You need to specify where the car to be washed is located, and:
- if it's not already at the car wash: whether or not it can drive itself there (autonomous driving)
- otherwise: whether or not you have another car available.
Some LLMs may assume that you should first make sure the washing service is available, or pay for it in advance, and that it may be more economical/planet-friendly/healthy/... to walk there, check/pay, and then, if everything is OK, go back and drive the car over.
Nothing that deep is needed to understand what is going on here; it's a paid vs. free issue. Free versions are less competent, while the paid versions of the reasoning/thinking models get it right. Some providers may hobble their free versions less, so those also get it right.
The guardrails you have outlined will help squeeze more performance out of smaller/less capable models, but as a general user you shouldn't have to jump through these hoops when clearly better models exist.