Why do you need non-trivial dependency on the object storage for the database for logs in the first ...

valyala • today at 10:11 AM • 2 replies

Why do you need non-trivial dependency on the object storage for the database for logs in the first place?

Object storage has advantages over regular block storage if it is managed by a cloud provider and has a proven record of durability, availability and "infinite" storage space at low cost, such as S3 at Amazon or GCS at Google.

Object storage has zero advantages over regular block storage if you run it yourself:

- It doesn't provide "infinite" storage space - you need to regularly monitor and manually add new physical storage to the object storage.

- It doesn't provide high durability and availability. It has lower availability than regular locally attached block storage because of the complicated coordination of object storage state between storage nodes over the network. It usually has lower durability than object storage provided by a cloud host. If some data is corrupted or lost on the underlying hardware, chances are low that a DIY object storage recovers it properly and automatically.

- It is more expensive because of higher overhead (and, probably, half-baked replication) compared to locally attached block storage.

- It is slower than locally attached block storage because network latency is much higher than the latency of accessing local storage. The latency difference is 1000x: 100ms at object storage vs 0.1ms at local block storage.

- It is much harder to configure, operate and troubleshoot than block storage.
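To put the latency figures from the list above in perspective, here is a rough back-of-the-envelope sketch. The 100ms and 0.1ms values are the ones quoted in the comment; the workload of 50 dependent reads is a made-up assumption for illustration, not a measurement of any real system.

```python
# Back-of-the-envelope latency comparison using the figures quoted above.
# Integer microseconds avoid floating-point rounding in the ratio.
OBJECT_STORAGE_LATENCY_US = 100_000  # 100 ms per request (figure from the comment)
LOCAL_BLOCK_LATENCY_US = 100         # 0.1 ms per request (figure from the comment)


def total_seconds(requests: int, latency_us: int) -> float:
    """Time for `requests` strictly sequential reads, ignoring transfer time."""
    return requests * latency_us / 1_000_000


ratio = OBJECT_STORAGE_LATENCY_US // LOCAL_BLOCK_LATENCY_US

# HYPOTHETICAL workload: a query that needs 50 dependent (non-parallel) reads.
SEQUENTIAL_READS = 50
object_storage_s = total_seconds(SEQUENTIAL_READS, OBJECT_STORAGE_LATENCY_US)
local_block_s = total_seconds(SEQUENTIAL_READS, LOCAL_BLOCK_LATENCY_US)

print(f"latency ratio: {ratio}x")
print(f"object storage: {object_storage_s} s, local block storage: {local_block_s} s")
```

This only models strictly sequential requests. Issuing reads in parallel hides much of the per-request latency, so the real gap depends heavily on the access pattern.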

So I'd recommend taking a look at other log databases that do not require object storage for large-scale production setups. For example, VictoriaLogs. It scales to hundreds of terabytes of logs on a single node, and it can scale to petabytes of logs in cluster mode. Both modes are open source and free to use.

Disclaimer: I'm the core developer of VictoriaLogs.

Replies

lucideer • today at 12:55 PM

> Object storage has zero advantages over regular block storage if you run it yourself

Worth adding, this depends on what's using your block storage / object storage. For Loki specifically, there are known edge-cases with large object counts on block storage (this isn't related to object size or disk space) - this obviously isn't something I've encountered & I probably never will, but they are documented.

For an application I had written myself, I can see clearly that block storage is going to trump object storage for all self-hosted use cases, but for 3P software I'm merely administering, I have less control over its quirks & those pros -vs- cons are much less clear cut.

lucideer • today at 10:15 AM

Initially I was just following recommendations blindly - I've never run Loki off-cloud before so my typical approach to learning a system would be to start with defaults & tweak/add/remove components as I learn it. Grafana's docs use object storage everywhere, so it's a lot easier when you're aligned: you can rely more heavily on config parity.

While I try to avoid complexity, idiomatic approaches have their advantages; it's always a trade-off.

That said, my first instinct when I saw minio's status was to use file storage, but the rustfs setup has been pretty painless so far. I might still remove it, we'll see.
