And a (the?) solution is using an algorithm like k-means clustering to find the tileset of size k that can represent a given image the most faithfully. Of course that’s only for a single frame at a time.