
emporas • yesterday at 11:43 PM

The trick might be to have a multimodal A.I. describe what it sees in an image, and then employ another LLM to turn that textual description into code. Multimodal A.I.s are good at describing images.

Even a handwritten sketch could be a very good starting point for image recognition by an A.I.
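
To make the two-step idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: `describe` and `write_code` are hypothetical callables standing in for whatever multimodal model and text-only LLM you would actually use, and the stub functions at the bottom only exist so the example runs without any external service.

```python
from typing import Callable

# Hypothetical interfaces: neither corresponds to a real library API.
DescribeImage = Callable[[bytes], str]  # multimodal model: image bytes -> text description
WriteCode = Callable[[str], str]        # text-only LLM: prompt -> source code


def image_to_code(image: bytes, describe: DescribeImage, write_code: WriteCode) -> str:
    """Two-stage pipeline: describe the image, then turn the description into code."""
    description = describe(image)
    prompt = (
        "Write code that reproduces the following layout.\n\n"
        f"Description:\n{description}"
    )
    return write_code(prompt)


# Stub models (assumptions) so the sketch runs offline.
def fake_describe(image: bytes) -> str:
    return f"A hand-drawn sketch ({len(image)} bytes) of a button labeled OK."


def fake_write_code(prompt: str) -> str:
    return "<button>OK</button>" if "button labeled OK" in prompt else ""


result = image_to_code(b"\x89PNG...", fake_describe, fake_write_code)
print(result)
```

Keeping the two stages behind separate callables means the describing model and the coding model can be swapped independently, and the intermediate text description can be inspected when the generated code looks wrong.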
