Hacker News

daemonologist yesterday at 9:12 PM

At some point in late 2017 the paper was updated with this additional detail:

    Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in nearly every detail. Niki designed, implemented, tuned and evaluated countless model variants in our original codebase and tensor2tensor. Llion also experimented with novel model variants, was responsible for our initial codebase, and efficient inference and visualizations. Lukasz and Aidan spent countless long days designing various parts of and implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating our research.
In any case, if the authors considered their contributions equal, that's good enough for me.

Replies

tmule yesterday at 10:31 PM

Thanks, I wanted to point to this, and indeed I should have worded my claim more precisely. And yes, I am aware of prior work on attention. (I need to look it up, but I recall Noam saying publicly that he wouldn’t have agreed to random ordering of contributions if he had known this was going to be this big.)