Hacker News

derefr, last Saturday at 7:22 PM

Question: would it be possible to invert the problem? I.e., rather than decreasing the size of the RAG index, use the RAG index to compress everything other than the index itself.

E.g., design a filesystem so that the RAG index is part of / managed internally within the metadata of the filesystem itself; and then, for each FS inode data-extent, give it two polymorphic on-disk representations:

1. extents hold raw data; RAG vectors are derived from it and updated after the extent is updated (as today)

2. RAG vectors are canonical; extents hold residuals from a predictive-coding model that took the RAG vectors as input and tried to regenerate the raw data of the extent. When an extent is read [or partially overwritten], use the predictive-coding model to generate data from the vectors and then repair it with the residuals (as in P-frame decoding in modern video codecs.)
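
To make representation 2 concrete, here's a toy sketch of the predict-then-repair round trip. Everything in it is an assumption for illustration: `predict` is a hypothetical stand-in for the predictive-coding model (it just tiles the "vector" bytes, where a real model would decode an embedding), the residual is a byte-wise XOR rather than whatever a real codec would use, and zlib stands in for the entropy coder.

```python
import zlib


def predict(vector: bytes, length: int) -> bytes:
    """Hypothetical stand-in for the predictive-coding model.

    Deterministically expands the stored vector into a guess at the
    extent's bytes. Here it simply tiles the vector to the right length.
    """
    if not vector:
        return bytes(length)  # no information: guess all zeros
    reps = length // len(vector) + 1
    return (vector * reps)[:length]


def encode_extent(raw: bytes, vector: bytes) -> bytes:
    """Store only the (compressed) residual between raw data and the guess."""
    guess = predict(vector, len(raw))
    residual = bytes(a ^ b for a, b in zip(raw, guess))
    # A good guess leaves a residual that is mostly zero bytes,
    # which a general-purpose compressor shrinks well.
    return zlib.compress(residual)


def decode_extent(stored: bytes, vector: bytes) -> bytes:
    """Regenerate the guess from the vector, then repair it with the residual."""
    residual = zlib.decompress(stored)
    guess = predict(vector, len(residual))
    return bytes(a ^ b for a, b in zip(residual, guess))


# Usage: an extent the "model" predicts almost perfectly, with one edit.
vector = b"hello world "
raw = bytearray(vector * 50)
raw[123] = ord("X")
raw = bytes(raw)

stored = encode_extent(raw, vector)
restored = decode_extent(stored, vector)
```

The round trip is lossless no matter how bad the prediction is; prediction quality only affects how small `stored` gets. Whether real RAG vectors carry enough information for the residuals to end up smaller than the raw data is exactly the open question.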

———

Of course, even if this did work (in the sense of providing a meaningful decrease in storage use), this storage model would only really be practical for document files that are read entirely on open and atomically overwritten/updated (think Word and Excel docs, PDFs, PSDs, etc), not for files meant to be streamed.

But, luckily, the types of files that are amenable to this technique are exactly the same types of files that a “user’s documents” RAG would have any hope of indexing in the first place!