It will be interesting to compare this to https://news.ycombinator.com/item?id=47476422 and https://news.ycombinator.com/item?id=47490070 . Very similar design, except that this one is apparently using mmap, which, according to the earlier experiment, incurs significant overhead.
Except this isn't using heavily quantised versions of the model, which reduce quality.
It was written by an LLM, so... yeah.