Hacker News

ceroxylon today at 4:39 PM

Even with search grounding, it scored 2.5/5 on a basic botanical benchmark. The average human would take much longer to produce a similar write-up, but with access to a search engine they would likely hallucinate less than 50% of the time.


Replies

WarmWash today at 5:10 PM

Even multimodal models are still really bad at vision. Their strength is definitely still language.