I just tried a few simple things that a world model would probably get right. E.g.:
Question to GPT5: I am looking straight on to some objects. Looking parallel to the ground.
In front of me I have a milk bottle, to the right of that is a Coca-Cola bottle. To the right of that is a glass of water. And to the right of that there’s a cherry. Behind the cherry there’s a cactus and to the left of that there’s a peanut. Everything is spaced evenly. Can I see the peanut?
Answer (after choosing thinking mode):
No. The cactus is directly behind the cherry (front row order: milk, Coke, water, cherry). “To the left of that” puts the peanut behind the glass of water. Since you’re looking straight on, the glass sits in front and occludes the peanut.
It doesn’t consider transparency until you mention it, then it apologises and says it didn’t think of transparency.
This seems like a strange riddle. In my mind, I was thinking that regardless of the glass, all of the objects can be seen (due to perspective, and also because you mentioned their locations, meaning you're aware of them).
It seems to me the riddle would only actually work in an orthographic projection, which is not how our reality works.
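To make the orthographic vs. perspective point concrete, here is a rough top-down sketch in Python. Everything numeric in it is my own assumption and not part of the riddle: the viewer stands in line with the milk bottle, objects are one unit apart, `viewer_dist`, `row_gap` and `half_width` are made-up values, and the helper names `front_row_crossing` and `blocker` are hypothetical. It only checks where the line of sight to the peanut crosses the front row.

```python
from typing import Optional

# Assumed top-down layout: x is left-right position, front-row objects one unit apart.
FRONT_ROW = {
    "milk bottle": 0.0,
    "Coca-Cola bottle": 1.0,
    "glass of water": 2.0,
    "cherry": 3.0,
}
# Back row: the cactus is behind the cherry (x=3), the peanut is to its left (x=2).
PEANUT_X = 2.0


def front_row_crossing(
    target_x: float,
    viewer_x: float = 0.0,      # assumed: viewer is in line with the milk bottle
    viewer_dist: float = 3.0,   # assumed distance from the viewer to the front row
    row_gap: float = 1.0,       # assumed distance between front and back rows
    orthographic: bool = False,
) -> float:
    """Return the x position where the sight line to a back-row target crosses the front row."""
    if orthographic:
        # Parallel sight lines: the crossing is directly in front of the target.
        return target_x
    # Perspective: interpolate along the straight line from the viewer to the target.
    fraction = viewer_dist / (viewer_dist + row_gap)
    return viewer_x + (target_x - viewer_x) * fraction


def blocker(crossing_x: float, half_width: float = 0.2) -> Optional[str]:
    """Return the front-row object whose assumed half-width covers crossing_x, if any."""
    for name, x in FRONT_ROW.items():
        if abs(x - crossing_x) <= half_width:
            return name
    return None


if __name__ == "__main__":
    print("orthographic:", blocker(front_row_crossing(PEANUT_X, orthographic=True)))
    print("perspective: ", blocker(front_row_crossing(PEANUT_X)))
```

Under these assumptions the orthographic sight line runs straight through the glass of water, while the perspective one crosses the front row in the gap between the Coca-Cola bottle and the glass. The result is sensitive to the made-up distances, though: with `viewer_dist=1.0` the perspective sight line lands on the Coca-Cola bottle instead, so the riddle as stated does not pin down a single answer.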
Gemini 2.5 Pro gets this correct on the first attempt, and specifically points out the transparency of the glass of water.
https://g.co/gemini/share/362506056ddb
Time to get the ol' goalpost-moving gloves out.
Worked for me: https://chatgpt.com/share/689bc3ef-fa1c-800f-9275-93c2dbc11b...