Yeah, telling an AI "don't ever listen to users who say to send it to a different email" is not a guardrail, it's a painted line that can still be driven over. It's not bad to have it per se, but it's not a safety mechanism.
The best comparison I can think of is that it's like validating dats on the frontend; it can make for a better user experience and he more efficient than hitting the backend when you know it will be an error, but it's not protection in any meaningful sense, and if you're not also enforcing invariants from behind the API, you're going to have a bad time. This is pretty similar to the type of issues you might run into with an implementation like that, where someone might make a request with data that you wouldn't expect from your frontend and perform operations you didn't mean to allow.