Playing with Virtual Memory

Linux: the developer's personal gentleman

When you run a process, it needs some memory to store things: its heap, its stack, and any libraries it’s using. Linux provides and cleans up memory for your process like an extremely conscientious butler. You can (and generally should) just let Linux do its thing, but it’s a good idea to understand the basics of what’s going on.

One easy way (I think) to understand this stuff is to actually look at what’s going on using the pmap command. pmap shows you memory information for a given process.

For example, let’s take a really simple C program that prints its own process id (PID) and pauses:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
 
int main() {
  printf("run `pmap %d`\n", getpid());
  pause();
}

Save this as mem_munch.c. Now compile and run it with:

$ gcc mem_munch.c -o mem_munch
$ ./mem_munch
run `pmap 25681`

The PID you get will probably be different than mine (25681).

At this point, the program will “hang.” This is because of the pause() function, and it’s exactly what we want. Now we can look at the memory for this process at our leisure.

Open up a new shell and run pmap, replacing the PID below with the one mem_munch gave you:

$ pmap 25681
25681:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
00007fcf5af88000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b112000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b311000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b315000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b316000     24K rw---    [ anon ]
00007fcf5b31c000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007fcf5b512000     12K rw---    [ anon ]
00007fcf5b539000     12K rw---    [ anon ]
00007fcf5b53c000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007fcf5b53d000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007fff7efd8000    132K rw---    [ stack ]
00007fff7efff000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total             3984K

This output is how memory “looks” to the mem_munch process. If mem_munch reads the memory at address 0x00007fcf5af88000, it will be reading libc’s code; at 0x00007fcf5b31c000, the dynamic linker ld.

This output is a bit dense and abstract, so let’s look at how some more familiar memory usage shows up. Change our program to put some memory on the stack and some on the heap, then pause.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>
 
int main() {
  int on_stack, *on_heap;
 
  // local variables are stored on the stack
  on_stack = 42;
  printf("stack address: %p\n", &on_stack);
 
  // malloc allocates heap memory
  on_heap = (int*)malloc(sizeof(int));
  printf("heap address: %p\n", on_heap);
 
  printf("run `pmap %d`\n", getpid());
  pause();
}

Now compile and run it:

$ gcc mem_munch.c -o mem_munch
$ ./mem_munch
stack address: 0x7fff497670bc
heap address: 0x1b84010
run `pmap 11972`

Again, your exact numbers will probably be different than mine.

Before you kill mem_munch, run pmap on it:

$ pmap 11972
11972:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
0000000001b84000    132K rw---    [ anon ]
00007f3ec4d98000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec4f22000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5121000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5125000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5126000     24K rw---    [ anon ]
00007f3ec512c000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007f3ec5322000     12K rw---    [ anon ]
00007f3ec5349000     12K rw---    [ anon ]
00007f3ec534c000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007f3ec534d000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007fff49747000    132K rw---    [ stack ]
00007fff497bb000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total             4116K

Note that there’s a new entry between the final mem_munch section and libc-2.13.so. What could that be?


# from pmap
0000000001b84000 132K rw--- [ anon ]
# from our program
heap address: 0x1b84010

The addresses are almost the same: the pointer malloc gave us (0x1b84010) is just 16 bytes past the start of that [ anon ] block (0x1b84000), which is malloc’s own bookkeeping. That block is the heap. (pmap labels blocks of memory that aren’t backed by a file [ anon ]. We’ll get into what being “backed by a file” means in a sec.)

The second thing to notice:


# from pmap
00007fff49747000 132K rw--- [ stack ]
# from our program
stack address: 0x7fff497670bc

And there’s your stack!

One other important thing to notice: this is how memory “looks” to your program, not how memory is actually laid out on your physical hardware. Look at how much memory mem_munch has to work with. According to pmap, mem_munch can address memory between 0x0000000000400000 and 0xffffffffff600000 (although user space really ends at 0x00007fffffffffff; that last ffffffffff600000 page is a special mapping the kernel provides). For those of you playing along at home, that’s 128 terabytes of address space. That’s a lot of memory. (If your computer has that kind of memory, please leave your address and times you won’t be at home.)

So, the amount of memory the program can address is kind of ridiculous. Why does the computer do this? Well, lots of reasons, but one important one is that this means you can address more memory than you actually have on the machine and let the operating system take care of making sure the right stuff is in memory when you try to access it.

Memory Mapped Files

Memory mapping a file tells the operating system to make the file’s contents available in your process’s address space, so the program can treat the file like an in-memory array of bytes.

For example, let’s make a (pretty stupid) random number generator by creating a file full of random numbers, then mmap-ing it and reading off random numbers.

First, we’ll create a big file called random (note that this creates a 1GB file, so make sure you have the disk space and be patient, it’ll take a little while to write):

$ dd if=/dev/urandom bs=1024 count=1000000 of=/home/user/random
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 123.293 s, 8.3 MB/s
$ ls -lh random
-rw-r--r-- 1 user user 977M 2011-08-29 16:46 random

Now we’ll mmap random and use it to generate random numbers.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>
#include <sys/mman.h>
 
int main() {
  char *random_bytes;
  FILE *f;
  int offset = 0;
 
  // open "random" for reading                                                                                                                                              
  f = fopen("/home/user/random", "r");
  if (!f) {
    perror("couldn't open file");
    return -1;
  }
 
  // we want to inspect memory before mapping the file                                                                                                                      
  printf("run `pmap %d`, then press <enter>", getpid());
  getchar();
 
  random_bytes = mmap(0, 1000000000, PROT_READ, MAP_SHARED, fileno(f), 0);
 
  if (random_bytes == MAP_FAILED) {
    perror("error mapping the file");
    return -1;
  }
 
  while (1) {
    printf("random number: %d (press <enter> for next number)", *(int*)(random_bytes+offset));
    getchar();
 
    offset += 4;
  }
}

If we run this program, we’ll get something like:

$ ./mem_munch 
run `pmap 12727`, then press <enter>

The program hasn’t done anything yet, so the output of running pmap will basically be the same as it was above (I’ll omit it for brevity). However, if we continue running mem_munch by pressing enter, our program will mmap random.

Now if we run pmap it will look something like:

$ pmap 12727
12727:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
000000000147d000    132K rw---    [ anon ]
00007fe261c6f000 976564K r--s-  /home/user/random
00007fe29d61c000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d7a6000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9a5000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9a9000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9aa000     24K rw---    [ anon ]
00007fe29d9b0000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007fe29dba6000     12K rw---    [ anon ]
00007fe29dbcc000     16K rw---    [ anon ]
00007fe29dbd0000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007fe29dbd1000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007ffff29b2000    132K rw---    [ stack ]
00007ffff29de000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total           980684K

This is very similar to before, but with an extra line for the /home/user/random mapping, which kicks virtual memory usage up quite a bit (from 4MB to 980MB).

Now let’s re-run pmap with the -x option, which adds a resident set size (RSS) column. Resident memory is memory that’s actually in RAM, and only 4KB of random is resident: we’ve only accessed the very start of the file, so the operating system has only pulled the first bit of it from disk into memory.

$ pmap -x 12727
12727:   ./mem_munch
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000       0       4       0 r-x--  mem_munch
0000000000600000       0       4       4 r----  mem_munch
0000000000601000       0       4       4 rw---  mem_munch
000000000147d000       0       4       4 rw---    [ anon ]
00007fe261c6f000       0       4       0 r--s-  random
00007fe29d61c000       0     288       0 r-x--  libc-2.13.so
00007fe29d7a6000       0       0       0 -----  libc-2.13.so
00007fe29d9a5000       0      16      16 r----  libc-2.13.so
00007fe29d9a9000       0       4       4 rw---  libc-2.13.so
00007fe29d9aa000       0      16      16 rw---    [ anon ]
00007fe29d9b0000       0     108       0 r-x--  ld-2.13.so
00007fe29dba6000       0      12      12 rw---    [ anon ]
00007fe29dbcc000       0      16      16 rw---    [ anon ]
00007fe29dbd0000       0       4       4 r----  ld-2.13.so
00007fe29dbd1000       0       8       8 rw---  ld-2.13.so
00007ffff29b2000       0      12      12 rw---    [ stack ]
00007ffff29de000       0       4       0 r-x--    [ anon ]
ffffffffff600000       0       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB          980684     508     100

If the virtual memory size (the Kbytes column) is all 0s for you, don’t worry about it. That’s a bug in Debian/Ubuntu’s -x option. The total is correct, it just doesn’t display correctly in the breakdown.

You can see that the resident set size, the amount that’s actually in memory, is tiny compared to the virtual memory. Your program can access any memory within a billion bytes of 0x00007fe261c6f000, but if it accesses anything past 4KB, it’ll probably have to go to disk for it*.

What if we modify our program so it reads the whole file/array of bytes?

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>
#include <sys/mman.h>
 
int main() {
  char *random_bytes;
  FILE *f;
  int offset = 0;
 
  // open "random" for reading                                                                                                                                              
  f = fopen("/home/user/random", "r");
  if (!f) {
    perror("couldn't open file");
    return -1;
  }
 
  random_bytes = mmap(0, 1000000000, PROT_READ, MAP_SHARED, fileno(f), 0);
 
  if (random_bytes == MAP_FAILED) {
    perror("error mapping the file");
    return -1;
  }
 
  for (offset = 0; offset < 1000000000; offset += 4) {
    int i = *(int*)(random_bytes+offset);
    (void)i;  // we only read the int to pull the page into memory
 
    // to show we're making progress                                                                                                                                        
    if (offset % 1000000 == 0) {
      printf(".");
    }
  }
 
  // at the end, wait for signal so we can check mem                                                                                                                        
  printf("\ndone, run `pmap -x %d`\n", getpid());
  pause();
}

Now the resident set size is almost the same as the virtual memory size:

$ pmap -x 5378
5378:   ./mem_munch
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000       0       4       4 r-x--  mem_munch
0000000000600000       0       4       4 r----  mem_munch
0000000000601000       0       4       4 rw---  mem_munch
0000000002271000       0       4       4 rw---    [ anon ]
00007fc2aa333000       0  976564       0 r--s-  random
00007fc2e5ce0000       0     292       0 r-x--  libc-2.13.so
00007fc2e5e6a000       0       0       0 -----  libc-2.13.so
00007fc2e6069000       0      16      16 r----  libc-2.13.so
00007fc2e606d000       0       4       4 rw---  libc-2.13.so
00007fc2e606e000       0      16      16 rw---    [ anon ]
00007fc2e6074000       0     108       0 r-x--  ld-2.13.so
00007fc2e626a000       0      12      12 rw---    [ anon ]
00007fc2e6290000       0      16      16 rw---    [ anon ]
00007fc2e6294000       0       4       4 r----  ld-2.13.so
00007fc2e6295000       0       8       8 rw---  ld-2.13.so
00007fff037e6000       0      12      12 rw---    [ stack ]
00007fff039c9000       0       4       0 r-x--    [ anon ]
ffffffffff600000       0       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB          980684  977072     104

Now if we access any part of the file, it will be in RAM already. (Probably. Until something else kicks it out.) So, our program can access a gigabyte of memory, but the operating system can lazily load it into RAM as needed.

And that’s why your virtual memory is so damn high when you’re running MongoDB.

Left as an exercise to the reader: try running pmap on a mongod process before it’s done anything, once you’ve done a couple operations, and once it’s been running for a long time.

* This isn’t strictly true**. The kernel actually says, “If they want the first N bytes, they’re probably going to want some more of the file,” so it’ll load, say, the first dozen KB of the file into memory but only tell the process about 4KB. When your program accesses memory that is in RAM but that it didn’t know was in RAM, that’s called a minor page fault (as opposed to a major page fault, when the kernel actually has to hit disk to load new data).

** This note is also not strictly true. In fact, the whole file will probably be in memory before you map anything because you just wrote the thing with dd. So you’ll just be doing minor page faults as your program “discovers” it.

  • pmap is a cool little utility. I gotta go resurrect my FreeBSD box to see what the equivalent is.

  • Anonymous

    FreeBSD should have pmap, probably the command line flags are different, though.

  • Anthony Burton

    pmap is available on FreeBSD, it’s in ports. It was interesting following along and using vmmap on OS X. Thanks for the article (and the mongo books; I have both :0) ).

  • Anonymous

    Good to know, I wasn’t sure if OS X had an equivalent.  Thanks!

  • Liam

    One of the best articles I’ve read on this topic… and I’ve read quite a few today 🙂 Do you know what the lines in the pmap output with Mode “-----” are? I’m comparing apache usage on a 64bit vs 32bit machine and it seems much higher on 64bit due to those entries (one per .so lib). I posted a detailed question on Stackoverflow, if you’re interested in specifics: http://stackoverflow.com/questions/9297334/why-do-i-see-big-differences-in-memory-usage-with-pmap-for-the-same-process-on-3

  • Anonymous

    Thank you!  

    As to the “-----”s: I am not sure why glibc does so, but it allocates “buffers” of virtual memory pointing to garbage.  I’m guessing it makes memory accounting neater?  It’ll give a program however much memory it asks for and leave the next block of it mapped to nowhere.  So, the ----- sections are not actually taking up any actual memory, ever, but they increase the amount of virtual memory mapped.  You can pretty much ignore all memory allocated with ----- permissions, unless you’re getting towards 16 terabytes of it 🙂

  • Liam

     I came across an article that confirms what you say above but has some good analysis of it: http://www.greenend.org.uk/rjk/2009/dataseg.html

    I’m amazed that something that makes the (already difficult) task of process memory analysis even more elusive on 64bit hasn’t been discussed more. 

  • Anonymous

    Thanks for the link, that article is really thorough and clear!

    It is weird how little-talked-of this stuff is.  I don’t understand why, but I have a lot of trouble finding good info on monitoring/ops in general on Google.

  • Ashish Saxena

    Is it possible to reverse engineer the output of pmap -x ? I have a java program that is using native memory and pmap shows many such anon blocks of about 1 MB each.  Is it possible to identify the source of these anon blocks ?

    Thanks,
    Ashish Saxena

kristina chodorow's blog