nakovet • today at 12:57 AM

We are using a service that abstracts Redis away from us and must be treated as a critical dependency (think RDS, Aurora, Postgres): if it is down, the whole site is down. Every job push is a call to this service. Upgrading the service = downtime.

For us this has resulted in a big weak point in our architecture: when the service reboots, both job pushing and job pulling stop, and because the pushing happens on the API side, it brings the API down. With containers we could have multiple instances running at the same time, but the shared reading/writing of the abstracted Redis locks itself up.

We are considering BullMQ, because the architecture is sane:

* job push: API writes to Redis
* job pull: Worker reads from Redis, then writes the completion.

How do you see this issue for Bunqueue? What happens when it goes down for 5 minutes: can jobs still be enqueued? Can you run multiple instances of it, with failover?

Our throughput (jobs/sec) is small, but we do have 100k+ scheduled jobs due anywhere from minutes to months from now.

Replies

kernelvoid • today at 2:22 AM

Transparent answer about bunqueue's architecture.

Current state: bunqueue is single-server with SQLite persistence.

If the server goes down for 5 minutes, clients cannot push/pull during that window. However: the client SDK has automatic reconnection with exponential backoff + jitter, all data is safe on disk (SQLite WAL mode), and on restart active jobs are detected as stalled and re-queued automatically. Delayed/scheduled jobs resume from their run_at timestamps.
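To make "automatic reconnection with exponential backoff + jitter" concrete, here is a minimal sketch of the general technique. It is not bunqueue's SDK code: `BackoffOptions`, `backoffDelay`, and `withReconnect` are hypothetical names, and the "full jitter" variant (a random delay between 0 and the capped exponential ceiling) is an assumption about which jitter scheme is used.

```typescript
// Hypothetical helpers illustrating backoff + jitter; not bunqueue's actual API.
interface BackoffOptions {
  baseMs: number;      // ceiling for the first retry delay
  maxMs: number;       // cap applied to every delay
  maxAttempts: number; // total connection attempts before giving up
}

// Full jitter: a random delay in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retries `connect` until it succeeds or maxAttempts is exhausted.
async function withReconnect<T>(
  connect: () => Promise<T>,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Sleep only if another attempt will follow.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

The jitter spreads reconnect attempts out so that many clients do not hit the server at the same instant when it comes back. Note that this only covers reconnecting: a push attempted while the server is down still fails unless the caller retries or buffers it.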

For your use case (100k+ scheduled jobs, low throughput): well-optimized. We use MinHeap + SQLite indexes for O(k) refresh where k = jobs becoming ready, not O(n) scan.
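As an illustration of why a min-heap keyed on the run time avoids an O(n) scan, here is a small sketch. It is not bunqueue's implementation: `ScheduledJob`, `MinHeap`, and `popReady` are hypothetical names, the SQLite side is left out, and the `runAt` field is an assumed stand-in for the run_at timestamp mentioned above.

```typescript
// Hypothetical in-memory sketch; not bunqueue's actual data structures.
interface ScheduledJob {
  id: string;
  runAt: number; // epoch milliseconds at which the job becomes ready
}

class MinHeap {
  private items: ScheduledJob[] = [];

  get size(): number {
    return this.items.length;
  }

  // The job with the smallest runAt, without removing it.
  peek(): ScheduledJob | undefined {
    return this.items[0];
  }

  push(job: ScheduledJob): void {
    this.items.push(job);
    let i = this.items.length - 1;
    // Sift up until the parent is not later than the new job.
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.items[parent].runAt <= this.items[i].runAt) break;
      [this.items[parent], this.items[i]] = [this.items[i], this.items[parent]];
      i = parent;
    }
  }

  pop(): ScheduledJob | undefined {
    const top = this.items[0];
    const last = this.items.pop();
    if (this.items.length > 0 && last !== undefined) {
      this.items[0] = last;
      let i = 0;
      // Sift down, swapping with the smaller child each step.
      for (;;) {
        const left = 2 * i + 1;
        const right = left + 1;
        let smallest = i;
        if (left < this.items.length && this.items[left].runAt < this.items[smallest].runAt) smallest = left;
        if (right < this.items.length && this.items[right].runAt < this.items[smallest].runAt) smallest = right;
        if (smallest === i) break;
        [this.items[smallest], this.items[i]] = [this.items[i], this.items[smallest]];
        i = smallest;
      }
    }
    return top;
  }
}

// Removes and returns only the jobs whose runAt has passed, earliest first.
function popReady(heap: MinHeap, now: number): ScheduledJob[] {
  const ready: ScheduledJob[] = [];
  for (let next = heap.peek(); next !== undefined && next.runAt <= now; next = heap.peek()) {
    heap.pop();
    ready.push(next);
  }
  return ready;
}
```

In this sketch each refresh touches only the k jobs that are due (each pop costs O(log n)), so 100k jobs scheduled months out add no per-tick cost until they become ready.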

What bunqueue does NOT have today: no clustering, no multi-instance with shared state, no automatic failover, no replication.

What it does have: S3 automated backups (compressed, checksummed) for disaster recovery. A "durable: true" option for zero data loss on critical jobs. Zero external dependencies.

Roadmap: HA is something we're actively working toward: native HA with leader election and replication, plus a managed cloud offering with automatic failover and geographic distribution.

Bottom line: if you need true HA today, BullMQ + Redis Sentinel/Cluster is the safer choice. bunqueue is for when you want simplicity, high performance (~100k jobs/sec), and can tolerate brief downtime with automatic recovery.
