Lovely visualization. I like the very concrete depiction of middle layers "recognizing features", that make the whole machine feel more plausible. I'm also a fan of visualizing things, but I think its important to appreciate that some things (like 10,000 dimension vector as the input, or even a 100 dimension vector as an output) can't be concretely visualized, and you have to develop intuitions in more roundabout ways.
I hope make more of these, I'd love to see a transformer presented more clearly.