For those interested, Wired ran a backstory about the Attention is All You Need paper 2 years ago: https://www.wired.com/story/eight-google-employees-invented-...
It gives some context on the contributions of each of the authors. About Shazeer, from the article:
Shazeer’s joining the group was critical. “These theoretical or intuitive mechanisms, like self-attention, always require very careful implementation, often by a small number of experienced ‘magicians,’ to even show any signs of life,” says Uszkoreit. Shazeer began to work his sorcery right away. He decided to write his own version of the transformer team’s code. “I took the basic idea and made the thing up myself,” he says. Occasionally he asked Kaiser questions, but mostly, he says, he “just acted on it for a while and came back and said, ‘Look, it works.’” Using what team members would later describe with words like “magic” and “alchemy” and “bells and whistles,” he had taken the system to a new level.
> Using what team members would later describe with words like “magic” and “alchemy” and “bells and whistles,”
Ok, these peopl have all gotten extensive training on how to hype for the non-technical crowd without saying anything of substance.