logoalt Hacker News

magicalhippotoday at 1:59 AM0 repliesview on HN

JPEG and friends transforms the image data into the frequency domain. Regular old JPEG uses the discrete cosine transformation[1] for this on 8x8 blocks of pixels. This is why with heavily compressed JPEG images you can see blocky artifacts[2]. JPEG XL uses variable block size DCT.

Lets stick to old JPEG as it's easier to explain. The DCT takes the 8x8 pixels of a block and transforms it to 8x8 magnitudes of different frequency components. In one corner you have the DC component, ie zero frequency, which represents the average of all 8x8 pixels. Around it you have the lowest non-zero frequency components. You have three of those, one which has a non-zero x frequency, one with a non-zero y frequency, and one where both x and y are non-zero. The elements next to those are the next-higher frequency components.

To reconstruct the 8x8 pixels, you run the inverse discrete cosine transformation, which is lossless (to within rounding errors).

However, due to Nyquist[3], you don't need those higher-frequency components if you want a lower-resolution image. So if you instead strip away the highest-frequency components so you're left with a 7x7 block, you can run the inverse transform on that to get a 7x7 block of pixels which perfectly represents a 7/8 = 87.5% sized version of the original 8x8 block. And you can do this for each block in the image to get a 87.5% sized image.

Now, the pyramidal scheme takes advantage of this by rearranging how the elements in each transformed block is stored. First it stores the DC components of all the blocks the image. If you just used those, you'd get an image which perfectly represents a 1/8th-sized image.

Next it stores all the lowest-frequency components for all the blocks. Using the DC and those you have effectively 2x2 blocks, and can perfectly reconstruct a quarter-sized image.

Now, if the decoder knows the target size the image will be displayed at, it can then just stop reading when it has sufficiently large blocks to reconstruct the image near the target size.

Note that most good old JPEG decoders supports this already, however since the blocks are stored one after another it still requires reading the entire file from disk. If you have a fast disk and not too large images it can often be a win regardless. But if you have huge images which are often not used in their full resolution, then the pyramidal scheme is better.

[1]: https://en.wikipedia.org/wiki/Discrete_cosine_transform

[2]: https://eyy.co/tools/artifact-generator/ (artifact intensity 80 or above)

[3]: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampli...