You've given us some idea of the volume of your data but there's no mention of what's ingesting it or how.
> during these stress tests the hardware is nowhere close to over-encumbered, and there's consistent headroom on both memory, CPU and disk I/O
This assertion is likely wrong: you're probably overlooking some metric that holds clues to what we need to know. Here are some questions to get the discussion moving.
- Is this PostgreSQL managed or self-hosted?
Your mention of "consistent headroom on both memory, CPU and disk I/O" gives me hope that you're self-hosting, but I've heard the same claim in the past from people running RDS who were wondering the same thing you are, so no assumptions.
- Are you using COPY or multi-row INSERT statements?
- How much RAM does that server have?
- What is the fillfactor, max_wal_size and checkpoint_timeout?
- Is the WAL on NVMe?
- What does iostat show, particularly the wa (I/O wait) column, during the slowdown?
- Are random UUIDs (part of) the index?
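On the COPY question: the difference between per-row INSERTs and a single COPY stream is often the first thing to rule out. Here's a minimal, hedged sketch of preparing a batch for `COPY ... FROM STDIN` using only the standard library; the table name and columns are made up, and the actual psycopg2 call is shown as a comment since it needs a live connection.

```python
# Hypothetical sketch: serialize a batch of rows into an in-memory CSV
# buffer suitable for PostgreSQL's COPY protocol, instead of issuing one
# INSERT per row. Table/column names below are assumptions.
import csv
import io

def rows_to_copy_buffer(rows):
    """Write rows into a StringIO in CSV form, rewound and ready to
    stream to COPY ... FROM STDIN WITH (FORMAT csv)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    buf.seek(0)
    return buf

rows = [(1, "sensor-a", 20.5), (2, "sensor-b", 21.0)]
buf = rows_to_copy_buffer(rows)

# With psycopg2, the whole batch then goes over in one round trip:
#   cur.copy_expert(
#       "COPY readings (id, sensor, value) FROM STDIN WITH (FORMAT csv)",
#       buf,
#   )
print(buf.read())
```

If you're currently doing `executemany` with single-row INSERTs, moving to COPY (or at least multi-row VALUES lists) usually changes the picture enough that the other questions get easier to answer.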
Have you posted this to https://dba.stackexchange.com/?
If I were you, I'd create a GitHub repo with scripts that synthesize the data and reproduce the issue you're seeing.
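If you do build that repro repo, make sure the synthetic data matches your real key distribution, because it ties back to the random-UUID question. A stdlib-only sketch of why it matters (the time-prefixed id scheme here is a stand-in I made up, roughly what UUIDv7 does):

```python
# Hypothetical repro-script fragment: compare insert-order locality of
# random UUIDv4 keys against time-prefixed keys. Random v4 keys land at
# random positions in a b-tree index on every insert; monotonically
# increasing keys append to the right-hand edge, which is far cheaper
# under heavy ingest.
import uuid

def time_prefixed_id(counter):
    # Assumption: an 8-byte counter prefix standing in for a timestamp
    # (as UUIDv7 would use), plus random tail bytes. Not a real UUID.
    return counter.to_bytes(8, "big") + uuid.uuid4().bytes[8:]

random_ids = [uuid.uuid4().bytes for _ in range(1000)]
ordered_ids = [time_prefixed_id(i) for i in range(1000)]

# Time-prefixed keys arrive already sorted; random keys essentially never do.
print(ordered_ids == sorted(ordered_ids))
print(random_ids == sorted(random_ids))
```

If your synthetic data uses sequential ids but production uses random UUIDs (or vice versa), the repro won't show the same index behavior.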