I've had this idea of building a codec that would similarly overfit to specific images. But the model itself would not be a fixed-size transformer... instead you could vary the sizing (layer count, width) per image to trade quality against file size.
So the encoded file would be something like: <header describing image size + transformer layer shape> <transformer data itself>
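A rough sketch of what that layout might look like in Python. Everything here is made up for illustration: the `OFIT` magic bytes, the field widths, and the choice of `n_layers` / `d_model` / `n_heads` as the "layer shape" are all assumptions, and the weights are treated as an opaque byte string.

```python
import struct

# Hypothetical header layout (little-endian, no padding):
# magic (4 bytes), width, height (u16 each), n_layers (u8), d_model, n_heads (u16 each)
MAGIC = b"OFIT"
HEADER = struct.Struct("<4sHHBHH")


def pack_codec(width, height, n_layers, d_model, n_heads, weights):
    """Prepend the shape header to the serialized transformer weights."""
    header = HEADER.pack(MAGIC, width, height, n_layers, d_model, n_heads)
    return header + weights


def unpack_codec(blob):
    """Split a blob back into (shape dict, raw weight bytes)."""
    magic, width, height, n_layers, d_model, n_heads = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a recognized codec blob")
    shape = {
        "width": width,
        "height": height,
        "n_layers": n_layers,
        "d_model": d_model,
        "n_heads": n_heads,
    }
    return shape, blob[HEADER.size:]
```

The decoder would read the header first, build a transformer of that shape, then load the remaining bytes as its weights. The header is only 13 bytes here, so nearly all of the file size comes from the weights.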
I've seen experiments where people have a "fixed" pipeline but I think having something more dynamic would work quite well.
Likely doable with hyperparameter tuning (I used to work on a team with data scientists who routinely did this in various situations). Seems like a cool idea.
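The simplest version of that tuning would be a plain grid search: try a handful of sizes and keep the smallest one that hits a quality target. A minimal sketch, assuming a caller-supplied `evaluate(n_layers, d_model)` that overfits a model of that shape to the image and returns `(size_bytes, psnr_db)`. That function, the PSNR metric, and the parameter names are all hypothetical stand-ins, not something from a real library.

```python
from itertools import product


def pick_smallest_config(evaluate, layer_options, width_options, min_psnr):
    """Grid-search model shapes; return the smallest one meeting the quality bar.

    evaluate(n_layers, d_model) -> (size_bytes, psnr_db) is supplied by the
    caller (hypothetical: it would train on the image and measure the result).
    Returns (size_bytes, n_layers, d_model, psnr_db), or None if nothing qualifies.
    """
    best = None
    for n_layers, d_model in product(layer_options, width_options):
        size_bytes, psnr_db = evaluate(n_layers, d_model)
        if psnr_db < min_psnr:
            continue  # not good enough, skip regardless of size
        if best is None or size_bytes < best[0]:
            best = (size_bytes, n_layers, d_model, psnr_db)
    return best
```

In practice each `evaluate` call is a full training run, so something smarter than an exhaustive grid (random search, Bayesian optimization, early stopping) would probably be needed to keep encode times reasonable.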