As a hacker, I kinda like naom's code. I was had to implement a TC MoE kernel, and stumbled upon his code from [tensor2tensor](https://github.com/tensorflow/tensor2tensor/blob/master/tens...) and i think "alchemy" is justified. Dude writes some beautiful kernels.
He also saw LLM would replace search before anyone else, and that is something to look at the Lamda or GPT-1's output and think: yeah this will answer all of our questions one day.
Also, evaluating complicated functions with numerical stability and automatic differentiation is hard.
There's no doubt about Noam's abilities. But I read through that code, and struggle to see its 'magic' or 'alchemy'. Can you elaborate what you find especially good about that code? (You may assume GPU kernel programming knowledge on my end.)