> or do something entirely unrelated to my request (eg responding to "hey siri, how much is fourteen kilograms in pounds" by playing a song from my music library)
My personal favourite is Siri responding to a request to open the garage door, a request it had successfully fielded hundreds of times before, by placing a call to the Tanzanian embassy. (I've never been to Tanzania. If I have a connection to it, it's unknown to me. The best I can come up with is Zanzibar sort of sounds like garage door.)
I'm amazed more AI tools don't have reality checks as part of the command flow. If you take a UX-first perspective on AI - which Apple very much should - some percentage of commands will be misinterpreted, triggering unintended and undesirable actions. A reasonable way to handle these failure cases is a post-interpretation reality check.
The check could be personalized ('does this user do this kind of thing?'), scanning the user's action history for anything similar. Or it could be generic ('is this the type of thing a typical user does?').
In either case, if the interpreted command is unfamiliar you have a few options: try to interpret it again (maybe with a better model), prompt the user ('do you want to do x?'), or, if it's highly unfamiliar, auto-cancel the command and apologize.
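A minimal sketch of that flow, purely illustrative - the action names, familiarity scores, and thresholds are all hypothetical, and a real assistant would mine actual user history and aggregate usage data:

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    action: str        # what the interpreter thinks the user asked for
    confidence: float  # interpreter's own confidence, 0..1

# Hypothetical familiarity data. Personalized: how often *this* user has
# done the action. Generic: how common the action is among typical users.
USER_HISTORY_COUNT = {"open_garage_door": 412, "play_music": 96}
TYPICAL_USER_FREQ = {"open_garage_door": 0.8, "play_music": 0.9,
                     "convert_units": 0.3, "call_embassy": 0.001}

def reality_check(interp: Interpretation) -> str:
    """Post-interpretation reality check: execute, reinterpret,
    confirm with the user, or cancel outright."""
    # Low-confidence parse: try again (perhaps with a better model).
    if interp.confidence < 0.5:
        return "reinterpret"
    personal = USER_HISTORY_COUNT.get(interp.action, 0)
    generic = TYPICAL_USER_FREQ.get(interp.action, 0.0)
    if personal > 0 or generic > 0.5:
        return "execute"    # familiar to this user, or very common
    if generic > 0.01:
        return "confirm"    # plausible but unfamiliar: 'do you want to do x?'
    return "cancel"         # highly unfamiliar: auto-cancel and apologize
```

With these toy numbers, opening the garage door sails through, a unit conversion gets a confirmation prompt, and a surprise embassy call is cancelled rather than placed.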