Hacker News

ben_w today at 11:32 AM

I've only noticed that combination (failure of short everyday tasks from SOTA models) on image comprehension, not text.

So some models will misclassify my American black nightshade* weeds as a tomato, but I get consistently OK results for text output from good models unless it's a trick question.

* I reckon, at least; it looked like this to me: https://en.wikipedia.org/wiki/Solanum_americanum#/media/File...


Replies

iLoveOncall today at 4:02 PM

The research from METR, and my comment, relate exclusively to software development tasks.
