How do you make blob file writes fast?
I built lix [0], which stores ASTs instead of blobs.
Direct AST writing works for apps that are "AST-aware", and I can confirm it works great.
But almost all software just writes bytes atm.
The binary -> parse -> diff pipeline is too slow.
The parse and diff steps need to get out of the hot path. That semi-defeats the idea of a VCS that stores ASTs, though.
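To make "out of the hot path" concrete, here is a rough sketch of one way to do it: the write only hashes and stores the bytes, and parsing happens later (background worker, commit time, or first AST read). This is not how lix works; `DeferredParseStore` and the `parse` callback are made-up names for illustration.

```python
import hashlib
from collections import deque


class DeferredParseStore:
    """Hypothetical store: writes keep raw bytes only, parsing is deferred."""

    def __init__(self, parse):
        self.parse = parse      # assumed parser callback: bytes -> AST object
        self.blobs = {}         # content hash -> raw bytes
        self.heads = {}         # path -> content hash of the latest write
        self.asts = {}          # content hash -> parsed AST, filled lazily
        self.pending = deque()  # hashes still waiting for a parse

    def write(self, path, data):
        """Hot path: hash and store the bytes, never parse."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data
            self.pending.append(key)
        self.heads[path] = key
        return key

    def drain(self):
        """Cold path: call from a worker or at commit time. Returns parse count."""
        done = 0
        while self.pending:
            key = self.pending.popleft()
            if key in self.asts:
                continue  # already parsed on demand by ast()
            self.asts[key] = self.parse(self.blobs[key])
            done += 1
        return done

    def ast(self, path):
        """Parse on demand if the deferred pass has not reached this blob yet."""
        key = self.heads[path]
        if key not in self.asts:
            self.asts[key] = self.parse(self.blobs[key])
        return self.asts[key]
```

The trade-off is the one mentioned above: until `drain()` runs, the store is effectively a blob store, so the AST is a derived index rather than the source of truth.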
This is exactly why weave stays on top of git instead of replacing storage. Parsing three file versions at merge time is fine (it took about 5-67ms). Parsing on every read/write would be a different story. I know about Lix, but I will check it out again.
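For anyone wondering what "parse three versions at merge time" buys you, here is a toy three-way merge over already-parsed entities. It is not weave's actual code: it assumes a parser has turned base, ours, and theirs into dicts of entity name -> source text, and `three_way` is a made-up helper.

```python
def three_way(base, ours, theirs):
    """Merge dicts of entity name -> source text.

    Returns (merged, conflicts). A missing key means the entity does not
    exist in that version, so deletions are handled like any other change.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            pick = o              # both sides agree (or both deleted it)
        elif o == b:
            pick = t              # only theirs changed this entity
        elif t == b:
            pick = o              # only ours changed this entity
        else:
            conflicts.append(name)
            pick = o              # placeholder; the caller must resolve it
        if pick is not None:
            merged[name] = pick
    return merged, conflicts
```

Edits to different functions in the same file merge cleanly here even when a line-based merge might flag them, and the parse cost is paid only for the three versions involved in the merge.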
I only diff the changed files. Producing a blob out of the BASON AST is trivial (one scan). Things may get slow for larger files, e.g. the tree-sitter C++ parser is a 25MB C file, 750 KLoC. It takes a couple of seconds to import. But it never changes, so no biggie.
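Roughly, the two ideas look like this. It is a simplified sketch, not the real code: the tree is a toy lossless tree (nested lists with string leaves that include whitespace), not the actual BASON format, and `changed_paths` / `to_blob` are illustrative names.

```python
import hashlib


def changed_paths(index, files):
    """index: path -> stored sha256 hex digest; files: path -> current bytes.

    Returns the sorted paths that are new or whose content hash differs,
    so only those need to be parsed and diffed.
    """
    return sorted(
        path
        for path, data in files.items()
        if index.get(path) != hashlib.sha256(data).hexdigest()
    )


def to_blob(tree):
    """Rebuild the file bytes with a single depth-first scan of the tree."""
    out = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            out.append(node)              # leaf token, emitted in source order
        else:
            stack.extend(reversed(node))  # visit children left to right
    return "".join(out).encode()
```

Each node is visited exactly once, so the export cost is linear in the size of the tree; the expensive direction is the import (parse), which the hash check skips for untouched files.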
There is room for improvement, but that is not a show-stopper so far. I plan to round-trip the Linux kernel with full history, which should expose all the bottlenecks.
P.S. I checked lix. It uses a SQL database. That solves some things, but it also creates an impedance mismatch. That must be at least a 10x slowdown. I use a key-value store and a custom binary format, so it works nicely. I can still go one level deeper and use a custom storage engine, which would be even faster. Git is all custom.