Hacker News

samuelstros · today at 7:03 AM · 2 replies

How do you make blob file writes fast?

I built lix [0], which stores ASTs instead of blobs.

Direct AST writing works for apps that are "AST-aware", and I can confirm it works great.

But almost all software just writes bytes at the moment.

The bytes -> parse -> diff pipeline is too slow.

The parse and diff steps need to get out of the hot path. That semi-defeats the idea of a VCS that stores ASTs, though.
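One way to get parse and diff off the hot path (a minimal sketch, not how lix actually works): accept raw byte writes immediately and push the parse/diff work onto a background thread. The `parse` and `diff` callables here are hypothetical placeholders for whatever parser and tree-diff the VCS uses.

```python
import queue
import threading

class AstIndexer:
    """Accepts raw byte writes immediately; parses and diffs in a
    background thread so the write path stays fast."""

    def __init__(self, parse, diff):
        self.parse = parse      # bytes -> AST (placeholder)
        self.diff = diff        # (old AST, new AST) -> changeset (placeholder)
        self.asts = {}          # path -> last parsed AST
        self.changes = []       # recorded (path, changeset) pairs
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write(self, path, data):
        # Hot path: just enqueue the bytes, no parsing here.
        self._q.put((path, data))

    def _run(self):
        while True:
            item = self._q.get()
            if item is None:
                break
            path, data = item
            new_ast = self.parse(data)
            old_ast = self.asts.get(path)
            if old_ast is not None:
                self.changes.append((path, self.diff(old_ast, new_ast)))
            self.asts[path] = new_ast

    def close(self):
        # Drain the queue, then stop the worker.
        self._q.put(None)
        self._worker.join()
```

The trade-off is exactly the one described above: the stored AST lags the bytes on disk until the worker catches up, so reads that need the AST may see a stale tree.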

[0] https://github.com/opral/lix


Replies

gritzko · today at 7:42 AM

I only diff the changed files. Producing a blob out of a BASON AST is trivial (one scan). Things may get slow for larger files: e.g. the tree-sitter C++ parser is a 25 MB C file, 750 KLoC. It takes a couple of seconds to import. But it never changes, so no biggie.

There is room for improvement, but that is not a show-stopper so far. I plan to round-trip the Linux kernel with full history, which should surface all the bottlenecks.

P.S. I checked lix. It uses a SQL database. That solves some things, but also creates an impedance mismatch. It must be a 10x slowdown at least. I use key-value and a custom binary format, so it works nicely. You can go one level deeper still and use a custom storage engine; it would be even faster. Git is all custom.
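The key-value approach can be sketched roughly like this: AST nodes serialized to compact binary records, keyed by content hash, in the spirit of git's object model. This is a hypothetical illustration (the record layout and `NodeStore` name are mine, not BASON's actual format).

```python
import hashlib
import struct

class NodeStore:
    """Content-addressed key-value store for AST nodes.
    Hypothetical sketch, not BASON's real encoding."""

    def __init__(self):
        self.kv = {}  # key bytes -> record bytes

    def put(self, kind, payload, children=()):
        # Compact binary record: 1-byte node kind, 4-byte payload
        # length, payload, then fixed-size hashes of child nodes.
        record = struct.pack("<BI", kind, len(payload)) + payload
        for child_key in children:
            record += child_key
        key = hashlib.sha1(record).digest()
        self.kv[key] = record  # identical subtrees dedupe for free
        return key

    def get(self, key):
        record = self.kv[key]
        kind, n = struct.unpack_from("<BI", record)
        payload = record[5:5 + n]
        children = [record[i:i + 20] for i in range(5 + n, len(record), 20)]
        return kind, payload, children
```

No SQL layer means no impedance mismatch: a node round-trips in one `pack`/`unpack`, and because keys are content hashes, unchanged subtrees across versions share storage automatically.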

rs545837 · today at 7:11 AM

This is exactly why weave stays on top of git instead of replacing its storage. Parsing three file versions at merge time is fine (it was about 5-67ms). Parsing on every read/write would be a different story. I know about Lix, but will check it out again.