> First of all, as khuey pointed out, the current implementation accumulates values. extend() replaces values instead. It wouldn't achieve the same functionality.
Ah! Yes, I apologise. I missed the + in += and I'm not used to a hash table which default-initializes unseen entries (as the C++ hash tables all tend to, because C++'s native container types behave that way), so I wasn't looking for it.
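To make the distinction concrete, here is a minimal sketch assuming Rust's std HashMap. The String keys, u64 counts and both function names are hypothetical, not taken from the code under discussion.

```rust
use std::collections::HashMap;

/// Hypothetical helper: merge by accumulation.
/// Counts for keys present in both maps are summed.
fn merge_accumulate(dst: &mut HashMap<String, u64>, src: HashMap<String, u64>) {
    for (key, count) in src {
        // entry().or_insert(0) spells out the "default for unseen keys"
        // that += relies on implicitly in some other hash table APIs.
        *dst.entry(key).or_insert(0) += count;
    }
}

/// Hypothetical helper: merge via Extend.
/// Values for keys present in both maps are replaced by the ones from src.
fn merge_extend(dst: &mut HashMap<String, u64>, src: HashMap<String, u64>) {
    dst.extend(src);
}
```

With {x: 1} merged with {x: 10}, the first function leaves x at 11 and the second leaves it at 10, which is why extend() isn't a drop-in replacement.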
SipHash will be noticeably slower, no question about it. So if you need to, and you know what you're paying (you give up SipHash's resistance to hash-flooding attacks), you can replace the hasher, including with integer_hasher, which gives you what you'd likely know from many C++ stdlib implementations: an identity function presented as a hash.
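For anyone who hasn't swapped the hasher before, this is roughly what it looks like with only std. IdentityHasher, IdentityMap and build_squares are hypothetical names for this sketch; a crate would give you a more complete version that covers the other integer widths.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical identity "hash": the u64 key is used as its own hash value.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    // Arbitrary byte input is deliberately unsupported in this sketch.
    fn write(&mut self, _bytes: &[u8]) {
        panic!("IdentityHasher only supports u64 keys");
    }

    // u64's Hash impl calls write_u64, so this is the only path we need.
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
}

/// A std HashMap keyed by u64 that uses the identity hasher instead of SipHash.
type IdentityMap<V> = HashMap<u64, V, BuildHasherDefault<IdentityHasher>>;

/// Hypothetical usage: map each i in 0..n to its square.
fn build_squares(n: u64) -> IdentityMap<u64> {
    let mut map = IdentityMap::default();
    for i in 0..n {
        map.insert(i, i * i);
    }
    map
}
```

The map API is unchanged; only the third type parameter differs from a default HashMap.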
> This is not necessary. The rust libraries are a port of Abseil, a C++ library.
More specifically HashBrown is a port [edited: actually a re-implementation I think, design->Rust not C++->Rust] of Abseil's Swiss Tables, and these days Rust's HashMap (and HashSet of course) use HashBrown, but that's not what I was getting at here.
I was thinking about analogues of Extend (because, as I wrote above, I didn't notice that you're accumulating, not overwriting). Modern C++ has this kind of feature in std::ranges::to, but it doesn't quite have Extend, and as I said QoI is poor: there are often trivial optimisations that Rust does, where the C++ means the same thing but isn't optimised.
I am interested in a quite different benchmark for hash tables: rather than merging, I'm interested in very small hash tables. Clearly for two items it will be faster to try them both, and clearly for a million items trying them all is awful, so I measure a VecMap type (same API as a hash table but actually just a growable array of unordered key->value pairs, searched linearly) against HashMap and other implementations of this API.
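A minimal sketch of the VecMap idea, for illustration only. This is not the actual type from my benchmark, and the method set is a hypothetical subset of the HashMap API.

```rust
/// Hypothetical VecMap: map-like API over an unordered Vec of pairs.
/// Every lookup is a linear scan, so keys only need PartialEq, not Hash.
struct VecMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> VecMap<K, V> {
    fn new() -> Self {
        VecMap { entries: Vec::new() }
    }

    /// Insert or overwrite; returns the previous value if the key existed.
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in self.entries.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    /// Linear search for the key.
    fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }

    /// Order doesn't matter, so swap_remove avoids shifting the tail.
    fn remove(&mut self, key: &K) -> Option<V> {
        let idx = self.entries.iter().position(|(k, _)| k == key)?;
        Some(self.entries.swap_remove(idx).1)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Each operation costs up to N key comparisons, which is the cost being weighed against hashing once and probing.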
For N=25 VecMap is still competitive, but even at N=5, if we use a very fast hash (such as that identity function) instead of SipHash, we can beat VecMap for most operations. I suspect this sort of benchmark would fare very differently on older hardware (faster memory relative to ALU operations), and the direction of travel is likely to stay the same for the foreseeable future. In 1975, if you had six key->value pairs you didn't want a hash table because it was too slow, but in 2025 you probably do.