The test:
> The car wash is 100m away from my house. Should I walk or drive?
I couldn't find a way to share the conversation, so here is the full response on the `medium` setting:
---
Walking 100 m is a very short distance—roughly a minute or two at a normal pace. In most situations, walking will be quicker overall because you avoid the time needed to get into the car, start it, and find parking. It also saves fuel and is better for the environment.
*Consider these factors before deciding:*
| Factor | Walking | Driving |
|--------|---------|---------|
| *Time* | ~1–2 min (plus a few seconds to step out) | ~1 min driving, plus ~1–2 min to get in/out and park |
| *Energy/Fuel* | None | Uses a small amount of fuel |
| *Weather* | Comfortable if dry & mild | Better if it’s raining, very hot, or icy |
| *Physical ability* | Easy for most people | Not needed for such a short trip |
| *Car availability* | Not required | Must have a car nearby and be willing to use it |
| *Environmental impact* | Zero emissions | Small emissions |
*Recommendation:* If the weather is pleasant and you don’t need the car for anything else right away, walking is the simplest, fastest, and most eco‑friendly choice. Drive only if you’re dealing with inclement weather, have heavy items to carry, or need the car immediately after the wash.
Do you have any specific constraints (e.g., rain, heavy bags, time pressure) that might affect the decision?
If a stranger asks me, "Should I walk or drive to this car wash?" then I assume they're asking in good faith and both options are reasonable for their situation. So it's a safe assumption that they're not going there to get their car washed. Maybe they're starting work there tomorrow, for example, and don't know how pedestrian-friendly the route is.
Is the goal behind evaluating models this way to incentivize training them to assume we're bad-faith tricksters even when asking benign questions like how best to traverse a particular 100m? I can't imagine why it would be desirable to optimize for that outcome.
(I'm not saying that's your goal personally - I mean the goal behind the test itself, which I'd heard of before this thread. Seems like a bad test.)