Hacker News

10000truths last Friday at 11:38 PM

For any compression algorithm, you keep track of A = {uncompressed bytes processed} and B = {compressed bytes processed} while decompressing, and bail out when either of the following occurs:

1. A exceeds some unreasonable threshold

2. A/B exceeds some unreasonable threshold
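
A minimal sketch of the two checks above, using Python's `zlib` streaming decompressor. The names `guarded_decompress` and `SuspiciousStream`, and the default limits (64 MiB of output, a 200:1 ratio, 16 KiB chunks), are assumptions chosen for illustration and do not come from the comment. B is counted as the input the decompressor has actually consumed so far.

```python
import zlib


class SuspiciousStream(Exception):
    """Raised when decompression crosses one of the two limits."""


def guarded_decompress(data, max_output=64 * 1024 * 1024,
                       max_ratio=200.0, chunk_size=16 * 1024):
    """Decompress a zlib stream, bailing out on the A and A/B checks.

    The default limits are illustrative assumptions, not recommendations.
    """
    d = zlib.decompressobj()
    out = bytearray()   # len(out) is A: uncompressed bytes produced
    consumed = 0        # B: compressed bytes consumed so far

    def check():
        a = len(out)
        if a > max_output:
            raise SuspiciousStream(f"output {a} exceeds {max_output} bytes")
        if consumed and a / consumed > max_ratio:
            raise SuspiciousStream(f"ratio {a / consumed:.0f}:1 exceeds limit")

    pos = 0
    while pos < len(data) and not d.eof:
        buf = data[pos:pos + chunk_size]
        pos += len(buf)
        while buf and not d.eof:
            # max_length caps the output of one call, so a bomb cannot
            # balloon in memory before the checks run.
            out += d.decompress(buf, chunk_size)
            consumed += len(buf) - len(d.unconsumed_tail)
            buf = d.unconsumed_tail
            check()

    out += d.flush()    # collect any output still pending in the decompressor
    check()
    return bytes(out)
```

Calling `guarded_decompress(zlib.compress(bytes(10_000_000)))` raises `SuspiciousStream` on the ratio check after the first chunk of output, while ordinary data passes through unchanged. Because the ratio is checked as the stream is read, a legitimate file that merely opens with a long highly compressible run can also be flagged.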


Replies

integralid yesterday at 9:33 AM

In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.
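
To put a rough number on that, here is a small sketch that measures how well a zero-filled buffer compresses with `zlib` (deflate). The helper name `zero_fill_ratio` is made up for this example, and the exact figure depends on the buffer size and compression level.

```python
import zlib


def zero_fill_ratio(size, level=9):
    """Return uncompressed/compressed size for `size` null bytes."""
    zeros = bytes(size)                  # a buffer of `size` null bytes
    packed = zlib.compress(zeros, level)
    assert zlib.decompress(packed) == zeros  # sanity check: round trip
    return len(zeros) / len(packed)


if __name__ == "__main__":
    print(f"{zero_fill_ratio(1_000_000):.0f}:1")
```

For a megabyte of zeros the ratio comes out on the order of a thousand to one, close to the best deflate can do, so any A/B limit set below that would flag this perfectly benign input.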

On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times, so it wouldn't necessarily trigger your A/B heuristic.

Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If the deflate authors had had this idea when they designed the algorithm, I bet files larger than an "unreasonable" 16 MB would be forbidden.

nrhrjrjrjtntbt yesterday at 12:43 AM

Embarrassingly simple for a scanner too, as you just mark the file as suspicious when this happens. You can be wrong sometimes, and that is expected.