For anyone not familiar with the meaning of '2' in this context:
The Linux kernel supports the following overcommit handling modes
0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications. Classic example is code using sparse arrays and just relying on the virtual memory consisting almost entirely of zero pages.
2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable amount (default is 50%) of physical RAM. Depending on the amount you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate. Useful for applications that want to guarantee their memory allocations will be available in the future without having to initialize every page.
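To make the difference concrete, here's a minimal sketch (64-bit Linux assumed): under modes 0 and 1 a wildly oversized allocation is typically granted, because nothing is backed until the pages are touched, while under mode 2 it is refused up front. The exact cut-off depends on RAM, swap and `vm.overcommit_ratio`, so treat this as illustrative only.

```c
/* Illustration: how the overcommit mode changes what a wild allocation does.
 * Under vm.overcommit_memory = 0 or 1 this 1 TiB malloc will usually succeed
 * (no pages are backed until touched); under mode 2 it fails with ENOMEM. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t huge = (size_t)1 << 40;          /* 1 TiB of address space */
    void *p = malloc(huge);
    if (p == NULL)
        printf("refused: %s (expected with overcommit_memory=2)\n",
               strerror(errno));
    else
        printf("granted %zu bytes without touching a single page\n", huge);
    free(p);
    return 0;
}
```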
When your system is out of memory, you do not want to return an error to the next process that allocates memory. That might be an important process, it might have nothing to do with the reason the system is out of memory, and it might not be able to gracefully handle allocation failure (realistically, most programs can't).
Instead, you want to kill the process that's hogging all the memory.
The OOM killer heuristic is not perfect, but it will generally avoid killing critical processes and is fairly good at identifying memory hogs.
And if you agree that using the OOM killer is better than returning failure to a random unlucky process, then there's no reason not to use overcommit.
Besides, overcommit is useful. Virtual-memory-based copy-on-write, allocate-on-write, sparse arrays, etc. are all useful and widely-used.
I realize this is mostly tangential to the article, but a word of warning for those who are about to mess with overcommit for the first time: in my experience, the extreme stance of "always do [thing] with overcommit" is just not defensible, because most (yes, also "server") software is simply not written under the assumption that being able to deal with allocation failures in a meaningful way is a necessity. At best, there's a "malloc() or die"-like stanza in the source, and that's that.
You can, and maybe even should, disable overcommit this way when running Postgres on the server (with only a minimum of what you would these days call sidecar processes, i.e. monitoring and backup agents, on the same host/kernel), but once you have a typical zoo of stuff using dynamic languages living there, you WILL blow someone's leg off.
This is completely wrong. First, disabling overcommit is wasteful because of fork and because of the way thread stacks are allocated. Sorry, you don't get exact memory accounting with C; not even Windows does exact accounting of thread stacks.
Secondly, memory is a global resource, so you don't get local failures when it's exhausted: whoever allocates next after memory has been exhausted gets the error, whether or not they are the application responsible for the exhaustion. They might crash on the error, or they might "handle it", keep going, and render the system completely unusable.
No, exact accounting is not a solution. Ulimits and configuring the OOM killer are solutions.
This rules out some extremely useful sparse memory tricks you can pull with massive mmaps that only ever get partially accessed (in unpredictable patterns).
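For illustration, a sketch of the kind of trick meant here (the 64 GiB figure and the access pattern are made up): reserve a huge anonymous mapping and only ever touch scattered pages of it.

```c
/* Sketch of the sparse-mapping trick: reserve a huge anonymous region and
 * touch only a handful of scattered pages. With overcommit enabled, only the
 * touched pages consume memory; with vm.overcommit_memory=2 the whole
 * writable reservation is charged against the commit limit at mmap() time
 * (MAP_NORESERVE is not honoured in strict accounting mode). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
    size_t reserve = (size_t)64 << 30;      /* 64 GiB of address space */
    unsigned char *table = mmap(NULL, reserve, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                                -1, 0);
    if (table == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Touch a few unpredictable slots; only these pages ever get backed. */
    for (int i = 0; i < 1000; i++)
        table[((size_t)rand() << 16) % reserve] = 1;

    munmap(table, reserve);
    return 0;
}
```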
This doesn't address the fact that forking large processes requires either overcommit or a lot of swap. That may be the source of the Redis problem.
For anyone feeling brave enough to disable overcommit after reading this, be mindful that the default `vm.overcommit_ratio` is 50%, which means that with no swap available, a system with 2GB of total RAM can't commit more than about 1GB; further allocations will fail with preemptive OOMs even though physical memory is still free. (e.g. PostgreSQL servers typically disable overcommit)
- https://github.com/torvalds/linux/blob/master/mm/util.c#L753
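If you want to see where your own limit lands, here's a rough sketch that mirrors the linked `vm_commit_limit()` calculation; it ignores the hugetlb adjustment and the alternative `vm.overcommit_kbytes` knob, and the /proc parsing is deliberately naive.

```c
/* Rough recomputation of CommitLimit, as in the linked vm_commit_limit():
 * MemTotal * overcommit_ratio / 100 + SwapTotal (hugetlb ignored here). */
#include <stdio.h>

int main(void) {
    char line[256];
    long mem_kib = 0, swap_kib = 0, ratio = 50, v;

    FILE *mi = fopen("/proc/meminfo", "r");
    if (!mi) { perror("/proc/meminfo"); return 1; }
    while (fgets(line, sizeof line, mi)) {
        if (sscanf(line, "MemTotal: %ld kB", &v) == 1)  mem_kib  = v;
        if (sscanf(line, "SwapTotal: %ld kB", &v) == 1) swap_kib = v;
    }
    fclose(mi);

    FILE *r = fopen("/proc/sys/vm/overcommit_ratio", "r");
    if (r) { fscanf(r, "%ld", &ratio); fclose(r); }   /* default is 50 */

    /* CommitLimit ~ MemTotal * overcommit_ratio / 100 + SwapTotal */
    printf("approx CommitLimit: %ld kB\n", mem_kib * ratio / 100 + swap_kib);
    return 0;
}
```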
Disabling overcommit on V8 servers like Deno will be incredibly inefficient. Your process might only need ~100MB of memory or so but V8's cppgc caged heap requires a 64GB allocation in order to get a 32GB aligned area in which to contain its pointers. This is a security measure to prevent any possibility of out of cage access.
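The underlying trick (a generic sketch, not V8's actual code) is to over-reserve address space and keep only a suitably aligned slice of it; only the pages that are ever used get backed, but the reservation itself is enormous.

```c
/* Sketch of the over-reserve-to-align trick: to get a 32 GiB region aligned
 * to 32 GiB, reserve 64 GiB of address space, keep the aligned slice, and
 * unmap the unaligned slack on either side. */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define CAGE_SIZE ((size_t)32 << 30)      /* 32 GiB, also the alignment */

void *reserve_aligned_cage(void) {
    size_t over = CAGE_SIZE * 2;
    uint8_t *base = mmap(NULL, over, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    uintptr_t aligned = ((uintptr_t)base + CAGE_SIZE - 1)
                        & ~(uintptr_t)(CAGE_SIZE - 1);
    uint8_t *cage = (uint8_t *)aligned;

    /* Trim the slack before and after the aligned cage. */
    if (cage > base)
        munmap(base, cage - base);
    if (cage + CAGE_SIZE < base + over)
        munmap(cage + CAGE_SIZE, (base + over) - (cage + CAGE_SIZE));
    return cage;
}

int main(void) {
    void *cage = reserve_aligned_cage();
    printf("cage at %p (%s)\n", cage, cage ? "reserved" : "failed");
    return 0;
}
```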
Overcommit is a design choice, and it is a design choice that is pretty core to Linux. Basic stuff like fork(), for example, gets wasteful when you don't overcommit. Less obvious stuff like buffer caches also gets less effective. There are certainly places where you would rather fail at allocation time, but that isn't everywhere, and it doesn't belong as a default.
Fwiw you can use pressure stall information to load shed. This is superior to disabling overcommit and then praying the first allocation to fail is in the process you want to actually respond to the resource starvation.
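For anyone who hasn't used PSI before, a minimal sketch (kernel 4.20+ with CONFIG_PSI; the 10% threshold is just an illustrative number, not a recommendation):

```c
/* Sketch: poll memory pressure via PSI and shed load before the OOM killer
 * has to get involved. The first line of /proc/pressure/memory looks like:
 *   some avg10=0.00 avg60=0.00 avg300=0.00 total=0 */
#include <stdio.h>

static double memory_pressure_avg10(void) {
    double avg10 = 0.0;
    FILE *f = fopen("/proc/pressure/memory", "r");
    if (!f)
        return 0.0;                       /* no PSI support: never shed */
    fscanf(f, "some avg10=%lf", &avg10);
    fclose(f);
    return avg10;
}

int main(void) {
    if (memory_pressure_avg10() > 10.0)
        puts("shed: reject new work, drop caches, pause background jobs");
    else
        puts("accept: memory pressure is low");
    return 0;
}
```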
Fact is, by the time small allocations are failing, you're almost no better off handling the NULL than you would be handling segfaults or just getting killed by the OOM killer.
Often for servers performance will fall off a cliff long before the oom killer is needed, too.
>Would you rather debug a crash at the allocation site
The allocation site is not necessarily what is leaking memory. What you actually want in either case is a memory dump where you can tell what is leaking or using the memory.
There are some situations where you can somewhat handle malloc returning NULL.
One would be where you have frequent large mallocs which get freed fast. Another would be where you have written a garbage collected language in C/C++.
When calling free or delete, or letting your GC do that for you, the memory isn't actually given back to the OS immediately; glibc has malloc_trim(0) for that, which tries its best to give back as much unused memory to the OS as possible.
Then you can retry your call to malloc, see if it still fails, and then let your supervisor restart your service/host/whatever (or not).
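Something like this glibc-specific pattern (xmalloc_with_trim is just a made-up name for the sketch):

```c
/* Sketch of the retry pattern described above: on allocation failure, ask
 * glibc to hand unused heap back to the kernel with malloc_trim(0), retry
 * once, and otherwise bail out so a supervisor can restart the service.
 * malloc_trim() is glibc-specific; with overcommit enabled, malloc rarely
 * returns NULL in the first place. */
#include <malloc.h>   /* malloc_trim (glibc) */
#include <stdio.h>
#include <stdlib.h>

void *xmalloc_with_trim(size_t n) {
    void *p = malloc(n);
    if (p != NULL)
        return p;

    malloc_trim(0);            /* return as much free heap to the OS as possible */
    p = malloc(n);
    if (p != NULL)
        return p;

    fprintf(stderr, "out of memory allocating %zu bytes, giving up\n", n);
    exit(EXIT_FAILURE);        /* let the supervisor restart us */
}
```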
There's already a popular OS that disables overcommit by default (Windows). The problem with this is that disallowing overcommit (especially with software that doesn't expect that) can mean you don't get anywhere close to actually using all the RAM that's installed on your system.
Setting 2 is still pretty generous. It means "Kernel does not allow allocations that exceed swap + (RAM × overcommit_ratio / 100)." It's not a "never swap or overcommit" setting. You can still get into thrashing by memory overload.
We may be entering an era when everyone in computing has to get serious about resource consumption. NVidia says GPUs are going to get more expensive for the next five years. DRAM prices are way up, and Samsung says it's not getting better for the next few years. Bulk electricity prices are up due to all those AI data centers. We have to assume for planning purposes that computing gets a little more expensive each year through at least 2030.
Somebody may make a breakthrough, but there's nothing in the fab pipeline likely to pay off before 2030, if then.
This is quite the bold statement to make with RAM prices sky high.
I want to agree with the locality-of-errors argument, and while in simple cases, yes, it holds true, it isn't necessarily true. If we don't overcommit, the allocation that kills us is simply the one that fails. Whether that allocation is the problematic one is a different question: if we have a slow leak where one in every 10k allocations leaks, we're probably (9999/10000, assuming spherical allocations) going to fail on an allocation that isn't the problem. We get about as much info as the oom-killer would have given us anyway: this program is allocating too much.
As I recall, this appeared in the 90’s and it was a real pain debugging then as well. Having errors deferred added a Heisenbug component to what should have been a quick, clean crash.
Has malloc ever returned zero since then? Or has somebody undone this, erm, feature at times?
Redis uses the copy-on-write property of fork() to implement saving, which is elegant and completely legitimate.
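The pattern looks roughly like this (a generic sketch, not Redis's actual code; `dataset`/`dataset_len` stand in for the real in-memory state):

```c
/* Minimal sketch of the fork()-based snapshot pattern: the child gets a
 * copy-on-write view of the in-memory data and writes it out while the
 * parent keeps serving writes. With overcommit disabled, the fork needs
 * commit headroom for the entire dataset even though only the pages the
 * parent later dirties are ever actually copied. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

extern char  *dataset;        /* hypothetical: the big in-memory state */
extern size_t dataset_len;

void snapshot_to_disk(const char *path) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");       /* with strict accounting, this is where it fails */
        return;
    }
    if (pid == 0) {           /* child: sees a frozen COW view of dataset */
        FILE *f = fopen(path, "w");
        if (f) {
            fwrite(dataset, 1, dataset_len, f);
            fclose(f);
        }
        _exit(0);
    }
    /* parent: keeps serving; COW duplicates only the pages it dirties */
    waitpid(pid, NULL, 0);    /* or collect asynchronously via SIGCHLD */
}
```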
Strongly agree with this article. It highlights really well why overcommit is so harmful.
Memory overcommit means that once you run out of physical memory, the OOM killer will forcefully terminate your processes with no way to handle the error. This is fundamentally incompatible with the goal of writing robust and stable software which should handle out-of-memory situations gracefully.
But it feels like a lost cause these days...
So much software breaks once you turn off overcommit, even in situations where you're nowhere close to running out of physical memory.
What's not helping the situation is the fact that the kernel has no good page allocation API that differentiates between reserving and committing memory. Large virtual memory buffers that aren't fully committed can be very useful in certain situations. But it should be something a program has to ask for, not the default behavior.
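The closest approximation available today is to reserve with PROT_NONE and "commit" ranges with mprotect(); a sketch, with sizes picked arbitrarily:

```c
/* Sketch of approximating reserve/commit on Linux: reserve address space
 * with PROT_NONE (generally not charged against the commit limit), then
 * "commit" ranges on demand with mprotect(PROT_READ|PROT_WRITE), which is
 * where the accounting charge lands under strict overcommit settings.
 * This is an emulation, not a real kernel reserve/commit API. */
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t reserved  = (size_t)8 << 30;           /* reserve 8 GiB of address space */
    size_t committed = (size_t)64 << 20;          /* commit the first 64 MiB */

    char *base = mmap(NULL, reserved, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    if (mprotect(base, committed, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");                       /* this is where OOM shows up */
        return 1;
    }
    base[0] = 1;                                  /* safe: committed */

    munmap(base, reserved);
    return 0;
}
```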
This is such an old debate. The real answer, as with all such things, is "it depends".
Two reasons why overcommit is a good idea:
- It lets you reserve memory and use the dirtying of that memory to be the thing that commits it. Some algorithms and data structures rely on this strongly (i.e. you would have to use a significantly different algorithm, which is demonstrably slower or more memory intensive, if you couldn't rely on overcommit).
- Many applications have no story for out-of-memory other than halting. You can scream and yell at them to do better, but that won't help, because those apps that find themselves in that supposedly-bad situation ended up there for complex and well-considered reasons. My favorite: complex OOM error handling paths are the worst kind of attack surface, since it's hard to get test coverage for them. So it's better to just have the program killed, because that nixes the untested code path. For those programs, there's zero value in having the memory allocator report OOM conditions other than by asserting in prod that mmap/madvise always succeed, which means the value of not overcommitting is much smaller.
Are there server apps where the value of gracefully handling out of memory errors outweighs the perf benefits of overcommit and the attack surface mitigation of halting on OOM? Yeah! But I bet that not all server apps fall into that bucket
Sure, if you don't like your stuff to work well. 0 is the default for a reason, and "my specific workload is buggy with 0" is not a problem with the default; it's just the reason there are other options.
Advertising 2 with "but apps should handle it" is utter ignorance, and the Redis example shows that: the database uses the COW fork feature for basically the reason it exists, as do many, many servers. The warning is pretty much tailored for people who think they are clever but don't understand the memory subsystem.
There's a reason nobody does this: RAM is expensive. Disabling overcommit on your typical server workload will waste a great deal of it. TFA completely ignores this.
This is one of those classic money vs idealism things. In my experience, the money always wins this particular argument: nobody is going to buy more RAM for you so you can do this.
An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.
https://lwn.net/Articles/104185/