I have always loved SQLite.
I have also heard that some firms ban its use.
Why?
Because it makes it SO easy to set up a database for your app that you end up with a super critical component of your application that looks exactly like a file. A file that can have any extension. And that file can be copied around to other servers. Even if there is PII in that file. Multiply this by the number of applications in your firm and you can see how this could get a little nuts.
DevOps and DBA teams would prefer that the database be a big, heavy iron thing that is very obviously a database server. And when you connect to it, that's also very obvious, and so on.
I still love SQLite though.
I went from thinking "SQLite is a toy product, not reliable for real data" to "let's use SQLite for almost everything".
SQLite is very good if you can fit into the single-writer, multiple-readers pattern; you'll never lose data if you use the correct settings (WAL mode plus a sensible synchronous level), which takes a minute of Googling to figure out.
Today, most of my apps are simply go binary + SQLite + systemd service file.
I've yet to lose data. Performance is great and plenty for most apps.
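For anyone wondering, the "correct settings" mostly boil down to a couple of PRAGMAs. A minimal sketch in Go, assuming the mattn/go-sqlite3 driver (the `_journal_mode`-style DSN parameters are that driver's convention; any SQLite driver exposes the same PRAGMAs one way or another):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // registers the "sqlite3" driver
)

func main() {
	// _journal_mode=WAL: readers keep reading while the single writer
	// writes, and the database recovers cleanly after a power loss.
	// _synchronous=NORMAL: durable enough under WAL for most apps;
	// use FULL if losing the very last transaction is unacceptable.
	// _busy_timeout=5000: writers wait up to 5s for the lock instead
	// of failing immediately with SQLITE_BUSY.
	dsn := "file:app.db?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=5000"

	db, err := sql.Open("sqlite3", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil { // force a real connection to open
		log.Fatal(err)
	}
	log.Println("database ready")
}
```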
> As of this writing (2018-05-29) ...
So this news is nearly <del>six</del> EIGHT years old. But I didn't happen to know about it until now, so that's not a complaint at all; rather, this is a thank-you for posting it.
(Thanks for the correction. Brief brain malfunction in the math department there).
2026 recommended storage formats: https://www.loc.gov/preservation/resources/rfs/data.html
For public-sector data preservation, it may be one of the best options.
- The specification is publicly available
- It is widely adopted
- It is likely to remain readable in the future
- It has little dependency on specific operating systems or services
- It carries low patent risk
From the perspective of long-term continuity, avoiding dependence on any particular company or service is extremely important.
I love SQLite, and thanks for sharing it, but there should be a "(2018)" at the end of the title:
> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.
On a recent project I needed to use exFAT. exFAT is terrible for a number of reasons, but in my case the thing I had to deal with was its lack of journaling, which meant files could be corrupted by a power interruption or the like.
I initially was writing a series of files, doing quasi-append-only things with new files and compacting the old one, to sort of reinvent journaling. What I did more or less worked, but it was very ad hoc and bad, and was probably hiding a lot of bugs I would eventually have had to fix.
And then I remembered SQLite. I realized its ACID guarantees were probably safe enough for my needs, and that all the hard parts I was reinventing would be faster and less likely to break if I used something thoroughly audited and tested, so I reworked everything to use SQLite and it worked fine.
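To give a flavor, here's roughly the shape of what replaced the hand-rolled journaling, as a sketch (the blobs table and saveBlob helper are made up for illustration, and the mattn/go-sqlite3 driver is just one choice):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // registers the "sqlite3" driver
)

// saveBlob does atomically what my "write new file, then compact the
// old one" dance tried to do by hand: after Commit the whole record is
// on disk, and if power dies mid-write, SQLite's journal rolls the
// half-finished transaction back on the next open.
func saveBlob(db *sql.DB, name string, data []byte) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // harmless no-op once Commit has succeeded

	_, err = tx.Exec(
		`INSERT INTO blobs(name, data) VALUES(?, ?)
		 ON CONFLICT(name) DO UPDATE SET data = excluded.data`,
		name, data)
	if err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("sqlite3", "store.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(
		`CREATE TABLE IF NOT EXISTS blobs(name TEXT PRIMARY KEY, data BLOB)`,
	); err != nil {
		log.Fatal(err)
	}
	if err := saveBlob(db, "page-001", []byte("example payload")); err != nil {
		log.Fatal(err)
	}
}
```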
I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere", but until it does I'm grateful SQLite exists.
I'm surprised they included proprietary formats that are de facto standards in a profession or supported by multiple tools (.xls, .xlsx) in the preferred section [1]. I wonder if "well-known enough" is as good as "open" from a preservation standpoint.
[1] https://www.loc.gov/preservation/resources/rfs/data.html
It's so funny, because I was JUST telling a colleague of mine - another librarian - this exact fact about sqlite!
I get annoyed at all the other DBs that require their own heavy-duty server process when, for 90% of my projects, there is only one client: my app server. Is there a DB that combines SQLite's embedded simplicity with higher concurrent write throughput?
I'm always inspired by SQLite. Overall I like it, but if you're not doing writes, it's really overkill.
So I made a format that will never surpass SQLite, except that it's far lighter and faster and works on zstd-compressed files. It has really small indexes and can contain binaries or text just like SQLite.
The wasm part that decompresses, reads, and searches the databases is only 38 kB uncompressed (maybe 16 kB gzipped). Compared to SQLite's 1.2 MB of wasm and glue code, that's 3% of the size, yet searching and loading are much faster. My program isn't really column-based and isn't suitable for managing spreadsheets, but it's great for dictionaries and file archives of images and audio.
I ported the jbig2 decoder as a 17 kB wasm module, so I can load monochrome scans that are 8 kB per page and still legible.
https://github.com/tnelsond/peakslab
SQLite is very well engineered; PeakSlab is very simple.