Hacker News

oshrimpton, yesterday at 8:55 AM

Yeah, the benchmark for sure isn't perfect, and without super rigid prompting it's far too easy for it to get off course. A 28% hallucination rate isn't nothing either.