Maybe the article originally featured a 1000-line C implementation.
I was basing this more on the fact that you don't have to look at C code to understand that non-cached transformer inference is going to be super slow.
I don't see how that would be possible given the contents of the article.
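For concreteness, here's a rough back-of-envelope sketch (mine, not from the article) of why skipping the KV cache hurts. It counts query-key dot products for one attention head: without a cache, step t re-runs attention over the whole prefix (t² dots); with a cache, the new token's query only hits the t cached keys (t dots).

```python
def attention_dots_no_cache(num_tokens: int) -> int:
    # Without a cache, generating token t re-runs attention over the whole
    # prefix: all t positions attend to all t positions -> t*t dot products.
    return sum(t * t for t in range(1, num_tokens + 1))

def attention_dots_with_cache(num_tokens: int) -> int:
    # With a KV cache, step t computes only the new query against the
    # t cached keys -> t dot products.
    return sum(t for t in range(1, num_tokens + 1))

if __name__ == "__main__":
    n = 1024
    ratio = attention_dots_no_cache(n) // attention_dots_with_cache(n)
    print(ratio)  # -> 683, i.e. ~(2n+1)/3 times more work without the cache
```

So the uncached version does O(n³) total attention work versus O(n²) with the cache, and the gap grows linearly with sequence length, which is why you can predict the slowdown without ever reading the C.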