I don't disagree given your "most" qualifier, but there's a case where every level of hardware would benefit: compression of textures generated at runtime, either via procgen or for e.g. environment maps.
This is in a frustrating state at the moment. CPU compression is way too slow. Some people have demoed on-the-fly GPU compression using a compute shader, but annoyingly there is (or at least was at the time) no way in the GPU APIs to `reinterpret_cast` the compute output as a compressed texture input. Meaning the whole thing had to be dragged down to CPU memory and uploaded again.
Agreed.
we hit some wired case on Adreno 530, ran into bizarre GPU instruction set issues with the compute shader compressor, that only manifested on Adreno 53x. Ended up having to add a device detection path, and fall back to CPU compression. which defeated much of the point.