It's not about seeing. It's about identifying the legs of the Pelican and then transferring the concept and mechanics of riding a bicycle + geometry of a body and a bicycle. The entire task has also nothing to do with vision tokens.