This is just a bit exciting, although I wonder how its performance will stack up against what we already do with, e.g., a Metal-optimised model loaded into llama.cpp or whatever (Unsloth is a good example of doing this for you, "batteries included").