Hacker News

iLoveOncall today at 9:03 AM

> current models have almost 100% success rate on tasks taking humans less than 4 minutes

Anyone can easily verify the contrary for themselves. It's nowhere near 100%, or even 50%, for few-minute tasks in real-world situations, even with the best models.


Replies

ben_w today at 11:32 AM

I've only noticed that combination (failure of short everyday tasks from SOTA models) on image comprehension, not text.

So some model will misclassify my American black nightshade* weeds as a tomato, but I get consistently OK results for text output from good models unless it's a trick question.

* I reckon, at least; it looked like this to me: https://en.wikipedia.org/wiki/Solanum_americanum#/media/File...