I was basing this more on the fact that you don't have to look at C code to understand that non-cached transformer inference is going to be super slow: without a KV cache, every new token recomputes the keys and values for the entire prefix.
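
To put a rough number on it, here's a toy count rather than anything taken from a real implementation. `kv_projections` is a hypothetical helper I made up for illustration. It only counts per-layer key/value projections, assumes generation starts from an empty prompt, and ignores the attention-score cost itself.

```python
def kv_projections(num_tokens: int, use_cache: bool) -> int:
    """Count per-layer key/value projections needed to generate num_tokens tokens.

    Toy model (assumption): one projection per position per decoding step.
    """
    total = 0
    for step in range(1, num_tokens + 1):
        # With a cache, only the newest position is projected.
        # Without one, all `step` positions in the prefix are projected again.
        total += 1 if use_cache else step
    return total


uncached = kv_projections(1000, use_cache=False)
cached = kv_projections(1000, use_cache=True)
print(uncached, cached, uncached / cached)
```

So the uncached work grows quadratically with the number of generated tokens while the cached work grows linearly, and the gap widens the longer you generate.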