Me: Can you access my inbox and Teams messages?
Copilot: Yep!
Me: Please find any items in my inbox or sent items indicating (a) that I have agreed to take on a task or (b) identifying me as the person responsible for a task, removing duplicates and any items that I have unambiguously replied to via email or Teams. Time window is preceding 7 days.
Copilot: [Prints a list that is, at best, 5% accurate.]
I know some folks have the peculiar idea that AI will replace search, but if AI can't accurately find information, it is useless. As near as I can tell, Copilot finds 3-4 items (but rarely the SAME 3-4 items across runs) and calls it a day. It seems like nobody is actually testing any of this stuff. Microsoft is actively destroying its credibility by offering a tool that performs a party trick but is utterly unreliable. I will, therefore, not rely on it.
It's a generalization problem. We can train LLMs that 'know' a lot of stuff in the global sense, but the tasks that interest people require the LLM to know a lot about you and your world in a very specific sense. The technical problem is that it's all corner cases, and that's impossible to scale right now. No context window, however large, is going to get you there either.