logoalt Hacker News

SQLite is all you need for durable workflows

90 pointsby tomasoltoday at 5:54 PM44 commentsview on HN

Comments

orftoday at 6:04 PM

> The caveat is that Litestream replication is asynchronous. A restore can miss the newest local writes if the SQLite volume disappears before they are copied. That is fine for many AI and experimentation workflows

In short: SQLite is not all you need, unless you’re just experimenting don’t actually care about durability, in which case you also need litestream + object storage.

Right.

show 5 replies
levkktoday at 6:54 PM

I don't understand this obsession with SQLite for real, production apps. SQLite is an embedded database, completely unsuitable for managing concurrency. This is what database _servers_ are for, e.g., Postgres, MySQL, etc. Their entire job is to allow you to modify data from multiple processes, on different machines, at the same time.

This is a foundational principle of computer science. It seems to me that the "SQLite for everything" crowd is a little bit inexperienced.

show 12 replies
bitexplodertoday at 6:17 PM

I started setting up my workflows using Temporal. It deploys as relatively light weight local app. For an isolated local installation it uses SQLite. It makes the process of dealing with API retries and organizing workflows and tasks really simple. I recommend giving it a try. It is, philosophically, exactly what this article is suggesting, but it adds an incredibly rich and flexible interface for agents to work with. Additionally, the web UI makes it very easy to inspect workflows, review agent execution, etc. Temporal also encodes much higher reliability into your system, almost for free. Distributed and reliable systems are hard, don't reinvent the wheel IMO.

If you find yourself wanting things like an easy way to then introspect your SQLite database, figure out what is happening in the workflow, compose individual tasks, make workflows trivially callable, etc, give Temporal a look.

Alongside this, I have mostly moved away from files for agents. Markdown and JSON are great, but also feel like traps when building out smaller local apps. LLMs are great at SQLite and you can render anything you want out of it (Markdown, JSON, etc). It saves a lot of tokens when an agent can just query a specific row instead of having to fire up jq or grep through markdown. You get a nice portable self contained data management system that encourages agents to be more disciplined about how they structure their data than a bunch of files. It also continues to scale into MySQL/Postgres if your little local projects start to outgrow or become more formal, you already have schema and discipline around data.

show 2 replies
shukantpaltoday at 6:47 PM

SQLite is surprisingly performant for single node applications even when comparing to Postgres. Postgres consumes a lot more memory and requires IO to hop through IPC whereas you can keep everything in process in SQLite with a shared connection pool.

I've been testing different storage engines for my agent harness and I can get up to 7.5k concurrent sessions on a single vCPU with SQLite whereas Postgres crashes or runs out connections.

[0] https://github.com/impalasys/talon/pull/23#issuecomment-4577...

show 2 replies
yokoprimetoday at 7:08 PM

If you're just doing workflows from a single node, i guess it can be ok as long as theres a single writer. But scaling across multiple servers it clearly is not all you need.

golem14today at 7:09 PM

Litestream releases 5.9 and newer have a bug that causes instances to sync an insane amount of data. a DB with <10K of data in it and practically no writes/reads causes something like 10GB of daily replication traffic. For my toy project that got needlessly expensive.

localhostertoday at 7:01 PM

Idk if this article was vibe written or the author just "got adjusted" but it's clearly is, and it's unreadable. Man this becomes anmoying

0x59today at 7:07 PM

Big complex data model with ambiguous query patterns? Postgres

Small, well defined, data model with known query patterns? Bespoke model

There probably is a place for sqlite and my project space so far hasn't yet well-aligned with it.

show 1 reply
Xceleratetoday at 6:05 PM

Haha, I just started doing this on my own. Found it helps the agents preserve state better. I typically ask them to design a DAG first based on a set of specifications and then execute it (each step stores something in a SQLite DB). Iteration is pretty simple then because I just ask for a tweak to one or two steps of the DAG, and then to re-run.

Funny how people are independently converging on similar patterns of "what works" here. Still feels like we're in the wild west with all these ad-hoc patterns of agent orchestration that people are coming up with.

kubik369today at 6:25 PM

Meta comment: This is a domain under my countries TLD (Slovakia) and it is one of the handful of words that are a word with the TLD in my language (and coincidentally) also in English. Every now and then, I will check on the domains with a retrograde dictionary for domains that have this property and root of this particular domain had a roundcube email server on it (can be checked on archive.org). After further checking, the local company actually named themselves Obeli s.r.o. (s.r.o. is Ltd), presumably so that they could use a domain that is a real word when said together with the TLD. (EDIT:) Forgot to write the thing I wanted to mention in the first place: it appears the domain must have lapsed and/or the author bought it from the company that was using it.

Another fascinating fact: our countries TLD has been stolen Ocean's 11 style (I am not kidding). After Czechoslovakia split into Czech Republic and Slovak Republic, the newly created Slovak .sk TLD has been under the care of people from the local university. The university also had some offices that they were leasing out. Someone had leased this office space (EDIT: this is important as this means they had the same physical address), created a company that had the same name as the NGO that was taking care of the domain, so e.g. the NGO was named "My Company o.z." and the perpetrator created a "My Company s.r.o." (our countries version of the american Ltd). This person then wrote to ICANN to change the address to the "My Company s.r.o." presumably under the pretense that this was just an administrative error and from this point, they have functionally taken custody of the TLD. I was not able to find how they did it technically, but I presume they persuaded ICANN to then point to their servers instead of the real ones. After this happened, it seems that no one noticed for some time. When they noticed, they tried taking it back, but they weren't able to. For some inexplicable reason, the government during that time (Šuster era, early 2000s) gave the new company a contract that was functionally uncancellable from the government side. Later governments made this even more uncancellable and in 2017, then Minister of IT (and as of this day president!) Pellegrini made the contract literally uncancellable. As a result of this, we have one of the most expensive domains around (18e/year, rising each year for no good reason). (EDIT:) The company running our countries TLD is now a foreign entity that the whole thing has been sold to (multiple owners over time) and we as a country have no control over if I understand it correctly.

I might have gotten some details wrong as I am writing this from my memory of researching it a couple of years back, but you get the idea, crazy stuff. Here is an article in Czech [0] that tells the story a bit better, but you have to translate it.

[0] https://www.root.cz/clanky/pribeh-domeny-sk-aneb-kradez-za-b...

// EDIT: I have found that the article actually links the movement to return the TLD back [1]. It also has a story tab [2], so they have something much more precise than the paraphrasing I wrote.

[1] https://www.nasadomena.sk/

[2] https://www.nasadomena.sk/historia/

netiktoday at 7:19 PM

Until you scale past one machine…

ChrisArchitecttoday at 7:20 PM

Related:

Building durable workflows on Postgres

https://news.ycombinator.com/item?id=48313530

sgloutnikovtoday at 6:14 PM

It's close enough that DBOS does support SQLite. [0] The default for prototyping is SQLite, but sure you can run it in production if you wanted.

Obligatory list of workflow engines and libraries because it's such a common need that a lot have rolled their own. [1]

[0] https://docs.dbos.dev/python/tutorials/database-connection

[1] https://github.com/meirwah/awesome-workflow-engines

EGregtoday at 6:03 PM

Files is all you need.

https://xkcd.com/378/

show 2 replies
steveharing1today at 7:03 PM

[dead]