Hacker News

tptacek, today at 5:24 PM (2 replies)

I wish I'd had more space to write about the global orchestrator design, because it's fun.

The Fly Machines orchestrator goes through some trouble to keep the source of truth for each VM decentralized, owned by the physical host it runs on. But there's still global state: apps, organizations, services. That stuff is all on Postgres. Postgres keeps up with it just fine, but I'd be lying if I said we weren't always watching the metrics out of the corner of our eye.

The global state for Sprites is on object storage. Each organization gets a separate SQLite database, and that database is synchronized to object storage with Litestream.io (Litestream is load bearing in a bunch of places here; solid as a rock for us).

I think people really still sleep on the "multiple SQLite databases" backing store design.
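
For anyone who hasn't seen the "one SQLite database per organization" layout, here is a minimal sketch of the idea in Python. It is not Fly's code: the `OrgDatabases` class, the `sprites` table, and the file naming are all assumptions made up for illustration, and replication to object storage (the part Litestream handles in the setup described above) is left out entirely.

```python
import sqlite3
import tempfile
from pathlib import Path


class OrgDatabases:
    """Routes each organization to its own SQLite file (hypothetical layout)."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def path_for(self, org_id: str) -> Path:
        # Only allow simple ids so a caller cannot escape the root directory.
        if not org_id.isalnum():
            raise ValueError(f"invalid org id: {org_id!r}")
        return self.root / f"{org_id}.db"

    def connect(self, org_id: str) -> sqlite3.Connection:
        conn = sqlite3.connect(self.path_for(org_id))
        # Hypothetical schema; each org's file carries its own copy of it.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sprites "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        return conn


def add_sprite(dbs: OrgDatabases, org_id: str, name: str) -> None:
    conn = dbs.connect(org_id)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO sprites (name) VALUES (?)", (name,))
    finally:
        conn.close()


def list_sprites(dbs: OrgDatabases, org_id: str) -> list:
    conn = dbs.connect(org_id)
    try:
        rows = conn.execute("SELECT name FROM sprites ORDER BY id")
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    dbs = OrgDatabases(Path(tempfile.mkdtemp()))
    add_sprite(dbs, "acme", "sprite-a")
    add_sprite(dbs, "globex", "sprite-b")
    print(list_sprites(dbs, "acme"))  # ['sprite-a']
```

The appeal is isolation: one org's writes never contend with another's, and each file can be backed up, moved, or restored on its own.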


Replies

chrismccord, today at 6:05 PM

I've been working on the orchestrator side with Elixir and Phoenix, so I'm happy to continue the discussion for curious minds. One of the coolest things we can do in Elixir is reach a SQLite db across the planet from any node:

# Assumes with_repo passes the org's repo to the callback.
OrgTracker.with_repo(org_id, fn repo ->
  repo.all(from sprite in "sprites", select: ...)
end)

That will find or place an Elixir process on the cluster and RPC the target node with our code. Placements can be sticky, pinning to a machine so we don't have to pull down the db on every start, but we also balance out the load and handle failover of durable processes automatically. Combined with Litestream, the result is distributed SQLite with failover that we can treat essentially like a locally reachable SQLite db. Yes, there is the speed of light to contend with, but by sending the execution across the wire rather than individual queries, we only ever pay a single hop to reach the process and its SQLite db.

underdeserver, today at 5:38 PM

For those who missed it, tptacek wrote TFA.