I wonder why Gmail and other email providers don't just run an LLM/ML pipeline to detect phishing emails. It seems that matching an email's content with the sender's domain (and possibly analyzing the content behind links) would be enough to show, with high certainty, a warning like "Beware: this looks like a phishing email." Is it too expensive? Too many false positives?
They do - well sort of.
The most essential check is SPF and DKIM which authenticate if the message has come from an authorized server. The problem is that most mail services are too lenient with mismatched sender identification. On one hand, people would be quite vocal about their mail provider sending way too much legitimate (but slightly misconfigured) mail to the spam folder. However it allows situations like to happen where the FROM header, the "From:" address, and the return path are all different.
Most mail systems have several stages of filters, and the first ones (checking authentication) are quite basic. After that, attachments, links, and contents are checked for known malware. Machine learning might kick in after this, if certain criteria are met. Mail security is very complicated and works well except for the times it falls flat on its face like this.
https://en.wikipedia.org/wiki/Sender_Policy_Framework https://en.wikipedia.org/wiki/DomainKeys_Identified_Mail
To my understanding, they already do use some form of ML for this and it's part of how things get routed to the spam folder without explicit rules.
>LLM/ML pipeline to detect phishing emails.
I think you're about 20 years behind the times if you think they don't.
There are a whole lot of problems with it when you start pressing the finer details like you list. For example, just look at the legit emails banks send out. They will tell you not to click links claiming to be your bank, then include links (claiming to be your bank) for more information.
Simply put the rules block too much corporate email because people that write corporate email do lots of dumb things with the email system.