Sixel came earlier, and already fulfilled the basic requirement of "put pixels on screen in a single well-defined format" (something not even iTerm2's protocol does.)
Kitty is a lot more complex: it accepts five different encodings, has three different ways to load the data, supports animations, etc. So it's no wonder only a few terminal developers had time to implement it.
See also: https://github.com/veltza/st-sx/issues/1#issuecomment-190272... 5000 lines (Kitty) vs 1000 lines (Sixel) even though the Kitty patch is just a "subset".