What is Overcommit? And why is it bad?


There are a lot of misunderstandings of memory management on Linux, leading to a lot of bad software that fails to robustly handle low-memory conditions. This all stems from a basic myth:

On Linux, malloc never fails. It always returns a pointer to allocated memory, but later your application might crash attempting to access that memory if not enough physical memory is available.

Where does this myth come from? Something called overcommit.

When a system such as Linux, utilizing virtual memory, allocates memory to a userspace process (via brk or mmap), there is no fixed correspondence between the virtual memory created in the process's virtual address space and the physical memory of the machine. In fact, before the memory is first used, it's likely that there's no correspondence at all to physical memory; only after the first attempt to access the memory does the kernel have to set up a physical page corresponding to it.
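To make this concrete, here is a minimal sketch in C of requesting anonymous virtual memory with mmap on Linux. Nothing physical needs to exist at the time of the call; the first write is what forces the kernel to find real pages:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1UL << 30; /* 1 GiB of virtual address space */

    /* Ask the kernel for anonymous virtual memory. At this point no
     * physical pages need to back the region at all. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Only now, on first access, must the kernel instantiate a real
     * physical page for each page actually touched. */
    memset(p, 0, 4096);

    munmap(p, len);
    return 0;
}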

Overcommit refers to the practice of giving out virtual memory with no guarantee that physical storage for it exists. To make an analogy, it's like using a credit card and not keeping track of your purchases. A system performing overcommit just keeps giving out virtual memory until the debt collector comes calling — that is, until some program touches a previously-untouched page, and the kernel fails to find any physical memory to instantiate it — and then stuff starts crashing down.

What happens when “stuff starts crashing down”? It can vary, but the Linux approach was to design an elaborate heuristic “OOM killer” in the kernel that judges the behavior of each process and decides who’s “guilty” of making the machine run out of memory, then kills the guilty parties. In practice this works fairly well from a standpoint of avoiding killing critical system processes and killing the process that’s “hogging” memory, but the problem is that no process is really “guilty” of using more memory than was available, because everyone was (incorrectly) told that the memory was available.

Suppose you don’t want this kind of uncertainty and danger when it comes to memory allocation. The naive solution would be to immediately and statically allocate physical memory corresponding to all virtual memory. To extend the credit card analogy, this would be like using cash for all your purchases, or like using a debit card. You get safety from overspending, but you also lose a lot of fluidity. Thankfully, there’s a better way to manage memory.

The approach taken in reality when you want to avoid committing too much memory is to account for all the memory that’s allocated. In our credit card analogy, this corresponds to using a credit card, but keeping track of all the purchases on it, and never purchasing more than you have funds to pay off. This turns out to be the Right Thing when it comes to managing virtual memory, and in fact it’s what Linux does when you set the vm.overcommit_memory sysctl parameter to the value 2. In this mode, all virtual memory that could potentially be modified (i.e. has read-write permissions) or lacks backing (i.e. an original copy on disk or other device that it could be restored from if it needs to be discarded) is accounted for as “commit charge”, the amount of memory the kernel has committed/promised to applications. When a new virtual memory allocation would cause the commit charge to exceed a configurable limit (by default, the size of swap plus half the size of physical RAM), the allocation fails.
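You can watch this accounting directly: /proc/meminfo exposes the outstanding commit charge as Committed_AS and the ceiling as CommitLimit. A small sketch that prints both:

#include <stdio.h>
#include <string.h>

/* Print the kernel's commit accounting as exposed in /proc/meminfo.
 * CommitLimit is the ceiling (swap plus overcommit_ratio percent of
 * RAM by default); Committed_AS is the commit charge currently
 * outstanding. */
int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof line, f)) {
        if (!strncmp(line, "CommitLimit:", 12) ||
            !strncmp(line, "Committed_AS:", 13))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}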

Unfortunately, a lot of application developers like overcommit, presumably for two reasons:

  1. It allows you to allocate a ridiculous amount of memory as long as you know you’ll only make sparse use of it. In our credit card analogy, this is like a contractor going to a building supply store and buying twice the amount of materials they expect to need using a credit card, knowing they’ll be safe as long as they go back and return the unused materials for a refund before the credit card bill is due. (See the sketch after this list.)
  2. It gives you an excuse to be lazy about handling errors. You can rationalize ignoring the return value of malloc on the basis that, due to overcommit, even if you check the return value you can’t be sure to avoid crashing later when the kernel doesn’t have enough physical memory to instantiate your virtual memory.
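For a concrete picture of reason 1, here is a sketch of the overcommit-reliant pattern using Linux's MAP_NORESERVE flag, which explicitly asks the kernel not to reserve backing for the mapping. Note that under vm.overcommit_memory=2 the kernel does not honor this flag, so the charge is taken, and may fail, up front:

#include <stdio.h>
#include <sys/mman.h>

/* The overcommit-reliant pattern: grab far more virtual memory than
 * will ever be touched, counting on the kernel not charging for it. */
int main(void)
{
    size_t len = (size_t)16 << 30; /* 16 GiB; assumes a 64-bit system */

    char *table = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                       -1, 0);
    if (table == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Touch only a few scattered pages; only these get physical
     * memory. But nothing guarantees even these writes are safe if
     * the system later runs out of memory. */
    table[0] = 1;
    table[len / 2] = 1;

    munmap(table, len);
    return 0;
}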

The first reason is actually fairly legitimate, but overcommit is not the right solution. Instead, applications which want to use large amounts of virtual memory without getting charged for it should use the right protections to ensure that memory that won’t be written is mapped read-only. This is robust and portable, unlike relying on overcommit.
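One way to apply this discipline under strict accounting (a sketch of the general pattern, not a prescription from the article) is to reserve address space with PROT_NONE, which carries no commit charge because it cannot be written, and commit regions explicitly with mprotect, which is exactly where any failure gets reported, cleanly, as ENOMEM:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t reserve = 1UL << 30; /* 1 GiB of address space, uncharged */

    /* A PROT_NONE private mapping is not writable, so strict
     * accounting takes no commit charge for it. */
    char *base = mmap(NULL, reserve, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Commit just the first 1 MiB. The commit charge is taken here,
     * so this call, not some later page fault, is where failure
     * shows up. */
    if (mprotect(base, 1UL << 20, PROT_READ | PROT_WRITE) < 0) {
        perror("mprotect");
        munmap(base, reserve);
        return 1;
    }

    base[0] = 1; /* safe: this page is committed */
    munmap(base, reserve);
    return 0;
}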

The second reason is pure laziness and foolishness. The fact that you can’t detect all errors on a system that’s configured in a non-robust way (the Linux default) is not an excuse for failing to detect other errors (like exhausting virtual address space) and crashing on systems that were intentionally configured for maximum robustness. In fact, even in the default configuration, Linux tries to avoid severe overcommit; it just doesn’t do detailed accounting.

Overcommit is harmful because it encourages, and provides a wrong but plausible argument for, writing bad software. While the number of applications that completely ignore the failure of malloc seems to be shrinking, plenty of applications and even libraries intended for use in serious software use “xmalloc” wrappers that abort (!!) the whole program when malloc returns a null pointer, and the justification is almost always that, since the program could OOM-crash anyway if allocation fails, it’s no worse to abort. And of course this line of reasoning completely neglects systems that were intentionally configured to be robust under memory exhaustion.

Correct software, especially library code which may be used in applications requiring maximum reliability, should be written to assume the operating system does not overcommit, and to handle failure of malloc by backing out of whatever operation was in progress and reporting failure. This approach will ensure safety on systems configured to do commit accounting, and will give behavior no worse than the operating system’s default on systems that do overcommit.
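As a sketch of what “backing out and reporting failure” looks like in practice (the names here are invented for illustration):

#include <stdlib.h>

/* Every allocation is checked; on failure the operation undoes what
 * it already did and reports the error to its caller instead of
 * aborting. */
struct buffer_pair {
    char *in;
    char *out;
};

/* Returns 0 on success, -1 on allocation failure (nothing leaked,
 * the struct is left in a consistent state). */
int buffer_pair_init(struct buffer_pair *bp, size_t n)
{
    bp->in = malloc(n);
    if (!bp->in)
        return -1;

    bp->out = malloc(n);
    if (!bp->out) {
        free(bp->in); /* back out the first allocation */
        bp->in = NULL;
        return -1;
    }
    return 0;
}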



Addendum: the kernel documentation's description of the vm.overcommit_memory sysctl:

overcommit_memory

Controls overcommit of system memory, possibly allowing processes to allocate (but not use) more memory than is actually available.

  • 0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
  • 1 - Always overcommit. Appropriate for some scientific applications.
  • 2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap plus a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while attempting to use already-allocated memory but will receive errors on memory allocation as appropriate.
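These values live under /proc/sys/vm/, so you can inspect or set them without any special tooling; strict accounting is enabled with `sysctl -w vm.overcommit_memory=2` as root (or by writing 2 to the corresponding /proc file). A small sketch that prints the current policy and ratio:

#include <stdio.h>

/* Print one sysctl value by reading it through /proc. */
static void show(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f && fgets(buf, sizeof buf, f))
        printf("%s: %s", path, buf);
    if (f)
        fclose(f);
}

int main(void)
{
    show("/proc/sys/vm/overcommit_memory");
    show("/proc/sys/vm/overcommit_ratio");
    return 0;
}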
