If they wanted to, couldn't they do something like RLHF? Instead of humans picking the better of two text outputs, they'd pick the better of two rendered designs.
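Roughly, the first stage of that would be fitting a reward model to the pairwise picks. Below is a minimal sketch of just that step, not the RL fine-tuning that follows it. It assumes each rendered design has already been boiled down to a small feature vector (hypothetical; a real system would learn these from the image). It also assumes a linear reward and the Bradley-Terry loss `-log sigmoid(r(preferred) - r(rejected))`, which is a common choice for RLHF reward models. `reward`, `train_reward_model` and the toy data are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical: each rendered design summarized as a feature vector.
Features = Sequence[float]


def reward(w: Sequence[float], x: Features) -> float:
    """Linear reward model r(x) = w . x (an assumption for this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(
    pairs: List[Tuple[Features, Features]],  # (preferred, rejected) design pairs
    dim: int,
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Fit w by gradient descent on the mean Bradley-Terry loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for preferred, rejected in pairs:
            margin = reward(w, preferred) - reward(w, rejected)
            # d/dmargin of -log sigmoid(margin) is -(1 - sigmoid(margin))
            coeff = -(1.0 - 1.0 / (1.0 + math.exp(-margin)))
            for i in range(dim):
                grad[i] += coeff * (preferred[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Toy "human picks": the preferred design always has the higher feature 0.
pairs = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.8, 0.5], [0.2, 0.5]),
    ([0.7, 0.1], [0.3, 0.9]),
]
w = train_reward_model(pairs, dim=2)
print(reward(w, [0.9, 0.4]) > reward(w, [0.1, 0.4]))
```

The learned reward would then be used to score new generated designs during fine-tuning, so humans only have to label a fraction of them.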
I'd be very surprised if they're not already doing this.