logoalt Hacker News

Aardwolfyesterday at 4:44 PM3 repliesview on HN

> The table occupies at most 32 GiB of memory.

This constraint allows making a linear array of all the 4 billion values, with the key as array index, which fits in 16 GiB. Another 500 MiB is enough to have a bit indicating present or not for each.

Perhaps text strings as keys and values would give a more interesting example...


Replies

reyesterday at 10:02 PM

> a linear array of all the 4 billion values, with the key as array index, which fits in 16 GiB

The hash table has the significant advantage of having a much smaller minimum size.

> Perhaps text strings as keys and values would give a more interesting example

Keep reading to "If keys and values are larger than 32 bits"

kazinatortoday at 12:59 AM

That's not a constraint as much as the worst case size.

If you actually only have a handful of entries in the table, it is measurable in bytes.

A linear array of all 4 billion possible values will occupy 16 GiB (of virtual memory) upfront. We have then essentially replaced the hash table with a radix tree --- that made up of the page directories and tables of the VM system. If only the highest and lowest value are present in the table, then we only need the highest and lowest page (4 kB or whatever) to be mapped. It's not very compact for small sets; storing N numbers in random locations could require as many as N virtual memory pages to be committed.

dragontameryesterday at 8:50 PM

This hashtable implements a multiset. Not (merely) a simple set.