It's worth noting that many of those projects were the famous versions of papers: the ones that succeeded through scale rather than through innovation. Here's a good example: essentially the same architecture as the "16x16 words" ViT paper, but a year earlier[0]. And it's not even the first; its authors mention two other works that had already used transformers on images. I'm all for scaling, and it is important, but there are tons of papers like this that get greatly overshadowed because someone else just scaled up and got a better benchmark. It's been getting out of hand...
Fuck... I'm starting to sound like Schmidhuber