WarmWash • today at 5:10 PM

Even multimodal models are still really bad at vision. Their strength is still definitely language.
