Hacker News

xorvoid today at 12:44 PM

Thank you Michael Rabin for your excellent work. Rest in Peace.

Rabin fingerprinting is one of my favorites of his contributions. It's a "rolling hash": the hash of each window is updated in constant time from the previous one, which lets you quickly compute a 32-bit (or larger) hash at *every* byte offset of a file. It is used most notably for file block matching and deduplication when the matching blocks can sit at any offset. It's tragically underappreciated.
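To make the "hash at every byte offset" idea concrete, here is a minimal sketch of a rolling hash. It is a simplification: it uses integer arithmetic modulo a prime (Rabin-Karp style) rather than true Rabin fingerprinting, which works with polynomials over GF(2) reduced by an irreducible polynomial. The names `rolling_hashes` and `direct_hash` and the constants `MOD` and `BASE` are illustrative assumptions, not taken from any library.

```python
from typing import Iterator, Tuple

# Illustrative constants (assumptions, not from any particular implementation).
MOD = (1 << 61) - 1   # a Mersenne prime, used as the modulus
BASE = 257            # multiplier, chosen larger than any byte value


def direct_hash(chunk: bytes) -> int:
    """Hash a chunk from scratch: O(len(chunk)) work."""
    h = 0
    for b in chunk:
        h = (h * BASE + b) % MOD
    return h


def rolling_hashes(data: bytes, window: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, hash) for every `window`-byte slice of `data`.

    After the first window, each hash costs O(1): remove the contribution
    of the byte leaving the window, then shift in the byte entering it.
    """
    if window <= 0 or len(data) < window:
        return
    # Weight carried by the oldest byte in the window.
    top = pow(BASE, window - 1, MOD)
    h = direct_hash(data[:window])
    yield 0, h
    for i in range(window, len(data)):
        h = (h - data[i - window] * top) % MOD   # drop the oldest byte
        h = (h * BASE + data[i]) % MOD           # shift in the new byte
        yield i - window + 1, h
```

Each rolled value equals what `direct_hash` would return for the same slice, so the only difference is the cost per offset.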

I've been meaning to write up a tutorial as part of my Galois Field series. Someday...

Thank you again!


Replies

jonhohle today at 1:28 PM

I recently found his fingerprint algorithm and wrote a utility that uses it to find duplicate MIPS code for decompilation[0] and build unique identifiers that can be used to find duplicates without sharing any potentially copyrighted data[1].

This replaced some O(n²) searches through ASCII text, reducing search time from dozens of seconds to fractions of a second.
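The general shape of that kind of speedup can be sketched as follows. This is not mipsmatch's actual implementation, only an assumed illustration: instead of comparing every window against every other window, hash each window once with a rolling hash, bucket the offsets by hash value, and compare bytes only within a bucket. The function name `find_repeated_windows` and the constants are hypothetical, and the hash is the simplified modulo-a-prime variant rather than a GF(2) Rabin fingerprint.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative constants (assumptions for this sketch).
MOD = (1 << 61) - 1
BASE = 257


def find_repeated_windows(data: bytes, window: int) -> Dict[bytes, List[int]]:
    """Map every `window`-byte string that occurs more than once to its offsets."""
    if window <= 0 or len(data) < window:
        return {}

    # Hash the first window from scratch, then roll across the rest.
    top = pow(BASE, window - 1, MOD)
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD

    buckets: Dict[int, List[int]] = defaultdict(list)
    buckets[h].append(0)
    for i in range(window, len(data)):
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        buckets[h].append(i - window + 1)

    # Equal hashes only suggest equal content, so confirm with the real bytes.
    repeats: Dict[bytes, List[int]] = defaultdict(list)
    for offsets in buckets.values():
        if len(offsets) < 2:
            continue
        for off in offsets:
            repeats[data[off:off + window]].append(off)
    return {chunk: offs for chunk, offs in repeats.items() if len(offs) > 1}
```

For example, `find_repeated_windows(b"abcdXYZabcd--abcd", 4)` reports that `b"abcd"` appears at offsets 0, 7, and 13. The byte comparison inside each bucket keeps hash collisions from producing false matches.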

[0] https://github.com/ttkb-oss/mipsmatch
[1] https://github.com/ttkb-oss/mipsmatch/wiki/Identifiers