> In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.
Sure, if you expect to decompress files with high compression ratios, then you'll want to adjust your knobs accordingly.
> On the other hand, zip bomb described in this blog post relies on decompressing the same data multiple times - so it wouldn't trigger your A/B heuristics necessarily.
If you decompress the same data multiple times, then you increment A multiple times. The accounting still works regardless of whether the data is the same or different. Perhaps a better description of A and B in my post would be {number of decompressed bytes written} and {number of compressed bytes read}, respectively.
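To make the bookkeeping concrete, here's a rough Python sketch of the threshold check (the limit values are placeholders, not numbers from my post):

```python
MAX_OUTPUT_BYTES = 64 * 1024 * 1024  # hypothetical cap on A (decompressed bytes written)
MAX_RATIO = 1000                     # hypothetical cap on A/B (expansion ratio)

def check_limits(a: int, b: int) -> None:
    """Abort decompression if A or A/B crosses a threshold.

    A = total decompressed bytes written so far
    B = total compressed bytes read so far
    """
    if a > MAX_OUTPUT_BYTES:
        raise ValueError("output cap exceeded; aborting")
    if b > 0 and a / b > MAX_RATIO:
        raise ValueError("expansion ratio exceeded; aborting")
```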
> Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If deflate authors had this idea when they designed the algorithm, I bet files larger than "unreasonable" 16MB would be forbidden.
The limitation is imposed by the application, not by the codec itself. The application doing the decompression is supposed to process the input incrementally (in the case of DEFLATE, reading one block at a time and inflating it), updating A and B on each iteration, and aborting if a threshold is violated.
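For example, with Python's zlib a bounded incremental inflate could look roughly like this. It's a sketch, not production code: the chunk size and limits are arbitrary, and it assumes a raw DEFLATE stream (no zlib/gzip framing).

```python
import zlib

CHUNK = 64 * 1024  # arbitrary read/emit granularity

def inflate_bounded(stream, max_output=64 * 1024 * 1024, max_ratio=1000):
    """Incrementally inflate a raw DEFLATE stream, updating B (compressed
    bytes read) and A (decompressed bytes written) on each iteration and
    aborting as soon as a threshold is violated."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)  # raw DEFLATE, no header
    a = b = 0
    out = []
    while not d.eof:
        chunk = stream.read(CHUNK)
        if not chunk:
            break  # truncated input; real code would treat this as an error
        b += len(chunk)  # B: compressed bytes read
        # Cap each decompress() call so a tiny input can't blow up memory in
        # a single step; unprocessed input lands in d.unconsumed_tail.
        piece = d.decompress(chunk, CHUNK)
        while piece:
            a += len(piece)  # A: decompressed bytes written
            if a > max_output or a / b > max_ratio:
                raise ValueError("decompression limits exceeded (possible zip bomb)")
            out.append(piece)
            piece = d.decompress(d.unconsumed_tail, CHUNK) if d.unconsumed_tail else b""
    tail = d.flush()  # any remaining buffered output
    a += len(tail)
    out.append(tail)
    return b"".join(out)
```

The key point is that the limits are enforced inside the loop, after every write, so a bomb is caught after producing at most one chunk past the threshold rather than after the whole file has been inflated.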