This is just an example of "lying with statistics". Measured by compute efficiency, the gap has already closed (coincidentally, in both training and inference).