ceroxylon • today at 4:39 PM

Even with search grounding, it scored 2.5/5 on a basic botanical benchmark. It would take the average human much longer to do a similar write-up, but with access to a search engine they would likely do better than a 50% hallucination rate.

Replies

WarmWash • today at 5:10 PM

Even multimodal models are still really bad when it comes to vision. Their strength is still definitely language.
