logoalt Hacker News

monkeyelitelast Wednesday at 2:19 PM1 replyview on HN

You didn’t answer my question.

> I'd be much more worried about a home grown

This is rhetorical framing.

As a knuth reader you should know then when you look inside the box you find some other idiot’s home grown thing who never read the research.


Replies

cb321last Wednesday at 8:06 PM

> look at the major standard libraries for hash tables

I don't know what would count as "major", but at least to me "major" does not imply "good". As I mentioned, this idea is ancient (the point of citing Knuth from over half a century ago - that was me, not @dwattttt) but also weakly disseminated. In said weakness, it may well be uncommon in "major" impls, but that merely recapitulates my point. Honestly, that and the resizing thing was such weird argumentation that I didn't take the time to reply.

One does usually resize on insert to keep things sparse as @dwattttt mentioned. With deletions, might want to resize after delete to keep things dense which could help if you iterate over the table much after deleting a lot of it. That is not a cost you would ever pay if you don't delete. So, ability to delete is still something insert-only workloads don't pay anything for.

Moving past the off-point qualification of "major"-ness & weird resize points, if you are actually curious about more details how this can work and want a concrete example, you could look at the Nim stdlib Table implementation: https://github.com/nim-lang/Nim/blob/fbdc9a4c19aafc25937aaa5... or you could read the Knuth subsection that I referenced, but Knuth uses a rather archaic linear probing to smaller indices which probably seems unnatural to modern readers. There is nothing special added to the insert-only data/pathways to support deletes either in that Nim code or the Knuth.

Deletes done this way can benefit from saving integer hash codes (and that happens to be done in the Nim example), but so do all finds via prefix compare & resizes by avoiding hash(). So, in the space-time trade-off analysis of "is saving hash codes worth it?", delete finds & backshifts are just one more part of some workload. This is "deletes in a workload impact space-time trade-offs", though, not "ability to delete costs insert-only workloads" which seems to be the contentious point.