logoalt Hacker News

AlotOfReadingyesterday at 1:41 AM0 repliesview on HN

This easier to solve if you use a permutation instead of hash functions.

Let h0 be the larger table, and h1 the smaller. N = len(h0), M = len(h1). Pretend the elements of the tables are sequentially indexed. Element[0] is h0[0], Element[N] is h1[0], etc.

     h0' = Resize(h0, (N+M)*capacity_factor)

    for x in 0...(range):
      y = permute(x, 0, (N+M)*capacity_factor)
      if(y >= N) move_to(h0'[y], element[x])
One allocation and you move the minimum number of elements needed to eliminate primary clustering. Elements in h0 that aren't moved would presumably remain correctly indexed. You have to move the remaining elements of h1 as well, but that cluttered things.

Any randomish permutation works, from basic ones up to cryptographic. If your permutation only works on certain powers of two, iterate it until the result is in range.