Kind of interesting that LLMs are basically being sold as having “human-like” reasoning capabilities, but in this case, when “obamawhitehouse” asked to have its password reset sent to [email protected], the LLM didn’t question it and just triggered the process, which happened to have a bug.
Human support agents certainly fall prey to social engineering all the time, but I can’t think of a case where it was done this easily at this scale.