Looks really interesting. A couple of questions: Can you explain how helix handles writes? What are you using for keys? UUIDs? I'm curious if you've done, or are thinking about, any optimizations here.
Feel free to point me to docs / code if these are lazy questions :)
We utilize some of LMDB's optimizations such as the APPEND put flags. We also make use of LMDB handling duplicates as a one-to-many key instead of duplicating keys. This means we can get all values for one key in one call rather than a call for each duplicate.
For keys we are using UUIDs, but using the v6 timestamped uuids so that they are easily lexicographically ordered at creation time. This means keys inserted into LMDB are inserted using the APPEND flag, meaning LMDB shortcuts to the rightmost leaf in its B-Tree (rather than starting at the root) and appends the new record. It can do this because the records are ordered by creation time meaning each new record is guaranteed to be larger (in terms of big-endian byte order) than the previous record.
We also store the UUIDs as u128 values for two reasons. The first is that a u128 takes up 16 bytes where as a string UUID takes up 36 bytes. This means we store 56% less data and LMDB has to decode 56% less bytes when doing code accesses.
For the outgoing/incoming edges for nodes, we store them as fixed sizes which means LMDB packs them in, removing the 8 byte header per Key-Value pair.
In the future, we are also going to separate the properties from the stored value as empty property objects still take up 8 bytes of space. We will also make it so nothing is inserted if the properties are empty.
You can see most of this in action in the storage core file: https://github.com/HelixDB/helix-db/blob/main/helixdb/src/he...