I agree this would be a good use of an LLM (assuming it was running locally). I wouldn't put one in charge of deleting my messages, but I could see one assigning a score to each message and, based on that score, moving it out of my inbox into one of several folders for review.
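To make the score-then-route idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the folder names, the thresholds, and especially `score_message`, which is just a keyword-counting stand-in for wherever a local LLM call would go. The point is only that the model produces a number and plain code decides where the message lands, with no delete path at all.

```python
from dataclasses import dataclass

# Hypothetical thresholds and folder names, chosen only for illustration.
# Checked from highest threshold to lowest.
ROUTES = [
    (0.9, "Review/Likely-Spam"),
    (0.5, "Review/Uncertain"),
]
INBOX = "Inbox"


@dataclass
class Message:
    subject: str
    body: str


def score_message(msg: Message) -> float:
    """Stand-in for a local LLM call; returns a spam score in [0, 1].

    A crude keyword count is used here so the sketch runs on its own.
    """
    suspicious = ("free money", "act now", "winner", "click here")
    text = f"{msg.subject} {msg.body}".lower()
    hits = sum(phrase in text for phrase in suspicious)
    return min(1.0, hits / 2)


def route(score: float) -> str:
    """Pick a destination folder for a score; nothing is ever deleted."""
    for threshold, folder in ROUTES:
        if score >= threshold:
            return folder
    return INBOX


def triage(messages: list[Message]) -> dict[str, list[Message]]:
    """Group messages by the folder their score routes them to."""
    folders: dict[str, list[Message]] = {}
    for msg in messages:
        folders.setdefault(route(score_message(msg)), []).append(msg)
    return folders
```

Swapping in a real model would only mean replacing the body of `score_message`; the routing stays dumb and auditable, which is the part I'd want to keep out of the LLM's hands.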
I'd be really interested to see a comparison between LLM spam scoring and a traditional spam scoring algorithm, because an LLM is essentially a spam generator. Can that generative ability be used to make a better spam detector?