The research from METR, and my comment, are exclusively about software development tasks.
Re-reading my comment, I realise I left out the most important part: the question.
What examples can you give of "real world situations" where they fail?
Obviously I don't want to use them for whatever that is.