So how are we doing with whole program optimization on the compiler level? Feels kind of backwards that people are optimizing these LLM architectures, one at a time.