They didn't need to come about at the same time. Photosensitive proteins (such as opsins) and cellular motility both predate multicellular life entirely. Even single-celled Euglena detect light and swim toward it with no nervous system at all. In early multicellular animals, cells were already chemically signaling their neighbors. A photosensitive cell releasing a signaling molecule near a contractile cell isn't a coordinated miracle. It is just two pre-existing cell types sitting next to each other in tissue, and adjacent cells are what bodies are made of. Natural selection then refines that crude coupling, because even a tiny, noisy light response is better than none.
Each piece (light-sensitive proteins, cell-to-cell signaling, contractile cells) evolved independently and for other reasons long before being co-opted into anything resembling vision. The question "how could A and B arise simultaneously?" dissolves once you see that neither A nor B was new.