Surely a 50% warning alarm on disk usage covers this without manual intervention?
> Surely a 50% warning alarm on disk usage covers this without manual intervention?
surely you don't need a fire extinguisher in your kitchen, if you have a smoke detector?
a "warning alarm" is a terrible concept, in general. it's a perfect way to lead to alert fatigue.
over time, you're likely to have someone silence the alarm because there's some host sitting at 57% disk usage for totally normal reasons and they're tired of getting spammed about it.
even well-tuned alert rules (ones that predict growth over time rather than only looking at the current value) tend to be targeted towards catching relatively "slow" leaks of disk usage.
there is always the possibility that a "fast" disk space consumer fills the disk faster than your alerting system can bring it to your attention and you can fix it. at the extreme end, a standard EBS volume has a throughput of 125MB/sec, so something that saturates that limit will fill 10GB of free space in 80 seconds.
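To put that arithmetic next to a typical alerting delay, here is a small Python sketch. The 60-second scrape interval and 5-minute `for` duration are hypothetical numbers, not taken from any particular setup. With binary units (10 GiB at 125 MiB/sec) the fill time comes out to about 82 seconds instead of 80.

```python
MB = 10**6  # decimal units, matching the 125MB/sec and 10GB figures
GB = 10**9


def seconds_to_fill(free_bytes: float, write_bytes_per_sec: float) -> float:
    """Time for a writer at a constant rate to consume all of free_bytes."""
    if write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return free_bytes / write_bytes_per_sec


fill_time = seconds_to_fill(10 * GB, 125 * MB)
print(fill_time)  # 80.0

# Hypothetical alert latency: one 60 s scrape interval plus a 5-minute "for" clause
alert_latency = 60 + 5 * 60
print(alert_latency > fill_time)  # True: the disk is full before the alert fires
```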
If the alarms are reliably configured, confirmed to be working, quiet enough to actually be acted on, etc.
And of course there's nothing to say that both of these things can't be done simultaneously.
You don't want an alarm on a usage threshold; you want a linear regression that predicts when utilization will cross a threshold. Then you set the alarm's lead time to "How long does it take me to remediate this condition?"
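A minimal sketch of the regression idea in Python, assuming disk usage is sampled as (timestamp in seconds, fraction used) pairs. The 95% threshold and the remediation time are hypothetical parameters. Real monitoring stacks usually have this built in (Prometheus has `predict_linear`, for example); this only shows the underlying ordinary least-squares fit and the "page when the crossing is closer than the fix time" rule.

```python
from typing import Optional, Sequence


def seconds_until_threshold(
    times: Sequence[float],          # sample timestamps, in seconds
    used_fraction: Sequence[float],  # disk usage at each timestamp, 0.0 to 1.0
    threshold: float = 0.95,         # hypothetical "effectively full" level
) -> Optional[float]:
    """Fit used = slope * t + intercept by least squares and return the seconds
    from the latest sample until the fitted line crosses threshold.
    Returns None when usage is flat or shrinking (no predicted crossing)."""
    n = len(times)
    if n < 2 or n != len(used_fraction):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_u = sum(used_fraction) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, used_fraction))
    slope = cov / var_t
    intercept = mean_u - slope * mean_t
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    # Already past the threshold counts as "zero seconds left"
    return max(0.0, crossing_time - times[-1])


def should_page(times, used_fraction, remediation_seconds, threshold=0.95) -> bool:
    """Page only when the predicted time to threshold is shorter than the fix time."""
    eta = seconds_until_threshold(times, used_fraction, threshold)
    return eta is not None and eta <= remediation_seconds


ts = [0, 3600, 7200, 10800]
growing = [0.50, 0.51, 0.52, 0.53]  # +1% per hour
print(round(seconds_until_threshold(ts, growing) / 3600))  # 42
print(should_page(ts, growing, remediation_seconds=4 * 3600))  # False
```

A host sitting at a steady 57% never pages, because a flat trend predicts no crossing, while a host growing 1% per hour only pages once the projected crossing is closer than the time it takes you to fix it.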
If the alarm works, and it gets acted on rather than snoozed too often or dismissed entirely.
Defence in depth is a good idea: proper alarms, and a secondary measure in case they don't have the intended effect.
Depends. A Kubernetes container might have only a few megabytes of disk space, because it shouldn't need it.
Except for that one time when .NET decided that an incoming POST was over some magic size limit and, instead of processing it in memory like before, wrote it to disk, crashing the whole pod. Fun times.
Also, my Unraid NAS has two drives in the "WARNING! 98% USED" alert state. One has 200GB of free space, the other 330GB. Integer percentages don't work well when the total capacity is that large :)
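One common workaround is to cap the percentage rule with an absolute amount of free space, so a huge disk isn't flagged while it still has hundreds of GB left. A Python sketch follows; the 10% fraction, the 100GB cap, the 10TB drive size, and round-to-nearest display are all hypothetical, chosen to roughly match the example above rather than to reflect Unraid's actual alert logic.

```python
GB = 10**9
TB = 10**12


def used_percent(total_bytes: int, free_bytes: int) -> int:
    """Integer percent used, assuming the dashboard rounds to the nearest integer."""
    return round(100 * (total_bytes - free_bytes) / total_bytes)


def low_space(free_bytes: int, total_bytes: int,
              free_fraction: float = 0.10,
              free_cap_bytes: int = 100 * GB) -> bool:
    """Alert when free space drops below free_fraction of the disk, but never
    require more than free_cap_bytes free, so big disks are judged in absolute terms."""
    required_free = min(free_fraction * total_bytes, free_cap_bytes)
    return free_bytes < required_free


print(used_percent(10 * TB, 200 * GB))  # 98
print(low_space(200 * GB, 10 * TB))     # False: 200GB free is above the 100GB cap
print(low_space(1 * GB, 20 * GB))       # True: below 10% of a small 20GB disk
```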