Wow. I’ve used NATS for best-effort in-memory pub/sub, which it has been great for, including getting subtle scaling details right. I never touched their persistence and would have investigated more before I did, but I wouldn’t have expected it to be this bad. Vulnerability to simple single-bit file corruption is embarrassing.
Sort of related. Jepsen and Antithesis recently released a glossary of common terms which is a fantastic reference.
> 3.4 Lazy fsync by Default
Why? Why do some databases do that? To have better performance in benchmarks? It would be fine to offer that if the default were safer, or at least if it were documented prominently. But especially when you run things in a small cluster, you get bitten by this kind of thing.
Curious about the differences between content on aphyr.com/tags/jepsen and jepsen.io/analyses. I recently discovered aphyr.com and was excited about the potential insights!
> By default, NATS only flushes data to disk every two minutes, but acknowledges operations immediately. This approach can lead to the loss of committed writes when several nodes experience a power failure, kernel crash, or hardware fault concurrently—or in rapid succession (#7564).
I am getting strong early MongoDB vibes. "Look how fast it is, it's web-scale!". Well, if you don't fsync, you'll go fast, but you'll go even faster piping customer data to /dev/null, too.
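To make the failure mode in that quote concrete, here is a toy, purely in-memory model (nothing here is NATS code; `SimulatedNode` and `lost_after_cluster_crash` are made-up names). It assumes every node acknowledges a write as soon as it reaches the page cache, and that a crash wipes whatever was not yet fsynced.

```python
class SimulatedNode:
    """Toy model of one replica: 'page_cache' is lost on crash, 'disk' survives."""

    def __init__(self, fsync_on_write):
        self.fsync_on_write = fsync_on_write
        self.page_cache = []  # written but not yet flushed
        self.disk = []        # flushed, survives a crash

    def write(self, msg):
        self.page_cache.append(msg)
        if self.fsync_on_write:
            self.fsync()
        return True  # acknowledge to the client

    def fsync(self):
        # Move everything buffered so far onto "disk".
        self.disk.extend(self.page_cache)
        self.page_cache.clear()

    def crash(self):
        # Power failure or kernel crash: unflushed data is gone.
        self.page_cache.clear()


def lost_after_cluster_crash(fsync_on_write, n_nodes=3, n_msgs=5):
    """Return the acknowledged messages that no node still has after all crash."""
    nodes = [SimulatedNode(fsync_on_write) for _ in range(n_nodes)]
    acked = []
    for i in range(n_msgs):
        msg = f"m{i}"
        # The write counts as committed once every replica has acknowledged it.
        if all(node.write(msg) for node in nodes):
            acked.append(msg)
    for node in nodes:
        node.crash()  # coordinated failure before any periodic flush ran
    survivors = set().union(*(node.disk for node in nodes))
    return [m for m in acked if m not in survivors]
```

In this model, `lost_after_cluster_crash(fsync_on_write=False)` returns every acknowledged message, while `fsync_on_write=True` returns an empty list. Replication alone does not help when all replicas ack from memory and fail together.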
Coordinated failures shouldn't be a novelty or a surprise any longer these days.
I wouldn't trust a product that doesn't default to safest options. It's fine to provide relaxed modes of consistency and durability but just don't make them default. Let the user configure those themselves.
If you are looking for a serverless alternative to JetStream, check out https://s2.dev
Pros: unlimited streams with the durability of object storage – JetStream can only do a few K topics
Cons: no consumer groups yet, it's on the agenda
> > You can force an fsync after each messsage [sic] with always, this will slow down the throughput to a few hundred msg/s.
Is it possible to improve on the performance NATS warns about? Couldn't you still run fsync on an interval, queue up a certain number of writes, and flush them all at once? I could imagine latency suffering, but batched throughput could be preserved to some extent?
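The batching idea in this comment is usually called group commit. A minimal sketch of it, not how NATS does anything: `GroupCommitLog` and its methods are hypothetical names, and the assumption is that a single flusher (a timer or a background thread) calls `flush()` periodically while writers wait on the returned event before acknowledging to clients.

```python
import os
import threading


class GroupCommitLog:
    """Append-only log that acknowledges a batch of writes after one fsync."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._queue_lock = threading.Lock()
        self._flush_lock = threading.Lock()
        self._pending = []    # (payload, event) pairs not yet durable
        self.fsync_calls = 0  # counter, only here to show the amortization

    def append(self, payload):
        """Queue a record. The returned event is set only after it is fsynced."""
        done = threading.Event()
        with self._queue_lock:
            self._pending.append((payload, done))
        return done

    def flush(self):
        """Write all queued records, fsync once, then acknowledge all of them."""
        with self._flush_lock:  # one flush at a time keeps records in order
            with self._queue_lock:
                batch, self._pending = self._pending, []
            if not batch:
                return 0
            view = memoryview(b"".join(payload for payload, _ in batch))
            while view:  # os.write may write fewer bytes than requested
                written = os.write(self._fd, view)
                view = view[written:]
            os.fsync(self._fd)  # a single fsync covers the whole batch
            self.fsync_calls += 1
            for _, done in batch:
                done.set()  # only now is it safe to ack the client
            return len(batch)

    def close(self):
        self.flush()
        os.close(self._fd)
```

A writer would call `log.append(data).wait()` before replying to its client. Latency rises by up to one flush interval, but the cost of each fsync is spread across every record in the batch, which is the trade-off the comment is guessing at.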
Half-expected tbh, but didn’t expect it to be this bad.
Just use redpanda.
NATS is a fantastic piece of software. But the docs are impractical and half-baked. It’s a shame to have to reverse engineer the software from GitHub to learn the auth schemes.
NATS JetStream vs., say, Redis Streams: which one have people found easier to work with?
Thanks, those reports are always a quiet pleasure to read even if one is a bit far from the domain.
Definitely thought this was about aviation for a moment.
Every time someone builds one of these things and skips over "overcomplicated theory", aphyr destroys them. At this point, I wonder if we could train an AI to look over a project's documentation, and predict whether it's likely to lose committed writes just based on the marketing / technical claims. We probably can.