WarmWash, today at 5:10 PM

Even multimodal models are still really bad at vision. Their strength is still definitely language.