It's only logical that this happens. Just because we can nowadays throw a massive amount of compute at a problem doesn't mean our models are good.
Why are people using transformers? Do they have any intuition that they could solve the challenge, let alone efficiently?
There's a tendency to treat transformers as a magic wand.