The trick might be to have a multimodal A.I. describe what it sees in an image, and then employ another LLM to turn that textual description into code. Multimodal A.I.s are good at describing images.
Even a handwritten sketch could be a very good starting point for image recognition by an A.I.
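To make the two-step idea concrete, here is a minimal sketch in Python. It is not tied to any real model or vendor API: `describe` and `write_code` are hypothetical callables standing in for the multimodal model and the text-only LLM, and the prompt wording is only an assumption about what might work.

```python
from typing import Callable

# Hypothetical stage signatures; neither is a real library API.
DescribeImage = Callable[[bytes], str]  # multimodal model: image bytes -> description
WriteCode = Callable[[str], str]        # text-only LLM: prompt -> source code


def image_to_code(image: bytes, describe: DescribeImage, write_code: WriteCode) -> str:
    """Stage 1 describes the image; stage 2 turns that description into code."""
    description = describe(image)
    # Assumed prompt wording; adjust it to whatever the second model responds to best.
    prompt = (
        "Write code that implements the following sketch.\n\n"
        f"Description:\n{description}"
    )
    return write_code(prompt)


# Stand-in stages so the sketch runs without any model; swap in real calls here.
def fake_describe(image: bytes) -> str:
    return f"A sketch ({len(image)} bytes) showing a button labelled OK."


def fake_write_code(prompt: str) -> str:
    return "<button>OK</button>" if "button labelled OK" in prompt else ""


if __name__ == "__main__":
    print(image_to_code(b"\x89PNG...", fake_describe, fake_write_code))
```

Keeping the two stages as separate functions means either model can be swapped out on its own, and the intermediate description can be read and corrected by hand before any code gets generated.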