gperftools/doc/heapprofile.html

383 lines
13 KiB
HTML
Raw Normal View History

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<link rel="stylesheet" href="designstyle.css">
<title>Gperftools Heap Profiler</title>
</HEAD>
<BODY>
<p align=right>
<i>Last modified
<script type=text/javascript>
var lm = new Date(document.lastModified);
document.write(lm.toDateString());
</script></i>
</p>
<p>This is the heap profiler we use at Google, to explore how C++
programs manage memory. This facility can be useful for</p>
<ul>
<li> Figuring out what is in the program heap at any given time
<li> Locating memory leaks
<li> Finding places that do a lot of allocation
</ul>
<p>The profiling system instruments all allocations and frees. It
keeps track of various pieces of information per allocation site. An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or,
<code>new</code>.</p>
<p>There are three parts to using it: linking the library into an
application, running the code, and analyzing the output.</p>
<h1>Linking in the Library</h1>
<p>To install the heap profiler into your executable, add
<code>-ltcmalloc</code> to the link-time step for your executable.
Also, while we don't necessarily recommend this form of usage, it's
possible to add in the profiler at run-time using
<code>LD_PRELOAD</code>:
<pre>% env LD_PRELOAD="/usr/lib/libtcmalloc.so" &lt;binary&gt;</pre>
<p>This does <i>not</i> turn on heap profiling; it just inserts the
code. For that reason, it's practical to just always link
<code>-ltcmalloc</code> into a binary while developing; that's what we
do at Google. (However, since any user can turn on the profiler by
setting an environment variable, it's not necessarily recommended to
install profiler-linked binaries into a production, running
system.) Note that if you wish to use the heap profiler, you must
also use the tcmalloc memory-allocation library. There is no way
currently to use the heap profiler separate from tcmalloc.</p>
<h1>Running the Code</h1>
<p>There are several alternatives to actually turn on heap profiling
for a given run of an executable:</p>
<ol>
<li> <p>Define the environment variable HEAPPROFILE to the filename
to dump the profile to. For instance, to profile
<code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p>
<pre>% env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre>
<li> <p>In your code, bracket the code you want profiled in calls to
<code>HeapProfilerStart()</code> and <code>HeapProfilerStop()</code>.
(These functions are declared in <code>&lt;gperftools/heap-profiler.h&gt;</code>.)
<code>HeapProfilerStart()</code> will take the
profile-filename-prefix as an argument. Then, as often as
you'd like before calling <code>HeapProfilerStop()</code>, you
can use <code>HeapProfilerDump()</code> or
<code>GetHeapProfile()</code> to examine the profile. In case
it's useful, <code>IsHeapProfilerRunning()</code> will tell you
whether you've already called HeapProfilerStart() or not.</p>
</ol>
<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>
<H2>Modifying Runtime Behavior</H2>
<p>You can more finely control the behavior of the heap profiler via
environment variables.</p>
<table frame=box rules=sides cellpadding=5 width=100%>
<tr valign=top>
<td><code>HEAP_PROFILE_ALLOCATION_INTERVAL</code></td>
<td>default: 1073741824 (1 Gb)</td>
<td>
Dump heap profiling information each time the specified number of
bytes has been allocated by the program.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_INUSE_INTERVAL</code></td>
<td>default: 104857600 (100 Mb)</td>
<td>
Dump heap profiling information whenever the high-water memory
usage mark increases by the specified number of bytes.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_TIME_INTERVAL</code></td>
<td>default: 104857600 (100 Mb)</td>
<td>
Dump heap profiling information each time the specified
number of seconds has elapsed.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_MMAP</code></td>
<td>default: false</td>
<td>
Profile <code>mmap</code>, <code>mremap</code> and <code>sbrk</code>
calls in addition
to <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
and <code>new</code>. <b>NOTE:</b> this causes the profiler to
profile calls internal to tcmalloc, since tcmalloc and friends use
mmap and sbrk internally for allocations. One partial solution is
to filter these allocations out when running <code>pprof</code>,
with something like
<code>pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc</code>.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_ONLY_MMAP</code></td>
<td>default: false</td>
<td>
Only profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
calls; do not profile
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
or <code>new</code>.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_MMAP_LOG</code></td>
<td>default: false</td>
<td>
Log <code>mmap</code>/<code>munmap</code> calls.
</td>
</tr>
</table>
<H2>Checking for Leaks</H2>
<p>You can use the heap profiler to manually check for leaks, for
instance by reading the profiler output and looking for large
allocations. However, for that task, it's easier to use the <A
HREF="heap_checker.html">automatic heap-checking facility</A> built
into tcmalloc.</p>
<h1><a name="pprof">Analyzing the Output</a></h1>
<p>If heap-profiling is turned on in a program, the program will
periodically write profiles to the filesystem. The sequence of
profiles will be named:</p>
<pre>
&lt;prefix&gt;.0000.heap
&lt;prefix&gt;.0001.heap
&lt;prefix&gt;.0002.heap
...
</pre>
<p>where <code>&lt;prefix&gt;</code> is the filename-prefix supplied
when running the code (e.g. via the <code>HEAPPROFILE</code>
environment variable). Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.</p>
<p>The profile output can be viewed by passing it to the
<code>pprof</code> tool -- the same tool that's used to analyze <A
HREF="cpuprofile.html">CPU profiles</A>.
<p>Here are some examples. These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:</p>
<pre>
/tmp/profile.0001.heap
/tmp/profile.0002.heap
...
/tmp/profile.0100.heap
</pre>
<h3>Why is a process so big</h3>
<pre>
% pprof --gv gfs_master /tmp/profile.0100.heap
</pre>
<p>This command will pop-up a <code>gv</code> window that displays
the profile information as a directed graph. Here is a portion
of the resulting output:</p>
<p><center>
<img src="heap-example1.png">
</center></p>
A few explanations:
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
of the live memory, which is 25% of the total live memory.
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
accountable for 176.2 MB of the live memory (i.e., it directly
allocated 176.2 MB that has not been freed yet). Furthermore,
it and its callees are responsible for 729.9 MB. The
labels on the outgoing edges give a good indication of the
amount allocated by each callee.
</ul>
<h3>Comparing Profiles</h3>
<p>You often want to skip allocations during the initialization phase
of a program so you can find gradual memory leaks. One simple way to
do this is to compare two profiles -- both collected after the program
has been running for a while. Specify the name of the first profile
using the <code>--base</code> option. For example:</p>
<pre>
% pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap
</pre>
<p>The memory-usage in <code>/tmp/profile.0004.heap</code> will be
subtracted from the memory-usage in
<code>/tmp/profile.0100.heap</code> and the result will be
displayed.</p>
<h3>Text display</h3>
<pre>
% pprof --text gfs_master /tmp/profile.0100.heap
255.6 24.7% 24.7% 255.6 24.7% GFS_MasterChunk::AddServer
184.6 17.8% 42.5% 298.8 28.8% GFS_MasterChunkTable::Create
176.2 17.0% 59.5% 729.9 70.5% GFS_MasterChunkTable::UpdateState
169.8 16.4% 75.9% 169.8 16.4% PendingClone::PendingClone
76.3 7.4% 83.3% 76.3 7.4% __default_alloc_template::_S_chunk_alloc
49.5 4.8% 88.0% 49.5 4.8% hashtable::resize
...
</pre>
<p>
<ul>
<li> The first column contains the direct memory use in MB.
<li> The fourth column contains memory use by the procedure
and all of its callees.
<li> The second and fifth columns are just percentage
representations of the numbers in the first and fourth columns.
<li> The third column is a cumulative sum of the second column
(i.e., the <code>k</code>th entry in the third column is the
sum of the first <code>k</code> entries in the second column.)
</ul>
<h3>Ignoring or focusing on specific regions</h3>
<p>The following command will give a graphical display of a subset of
the call-graph. Only paths in the call-graph that match the regular
expression <code>DataBuffer</code> are included:</p>
<pre>
% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>
<p>Similarly, the following command will omit all paths subset of the
call-graph. All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:</p>
<pre>
% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>
<h3>Total allocations + object-level information</h3>
<p>All of the previous examples have displayed the amount of in-use
space. I.e., the number of bytes that have been allocated but not
freed. You can also get other types of information by supplying a
flag to <code>pprof</code>:</p>
<center>
<table frame=box rules=sides cellpadding=5 width=100%>
<tr valign=top>
<td><code>--inuse_space</code></td>
<td>
Display the number of in-use megabytes (i.e. space that has
been allocated but not freed). This is the default.
</td>
</tr>
<tr valign=top>
<td><code>--inuse_objects</code></td>
<td>
Display the number of in-use objects (i.e. number of
objects that have been allocated but not freed).
</td>
</tr>
<tr valign=top>
<td><code>--alloc_space</code></td>
<td>
Display the number of allocated megabytes. This includes
the space that has since been de-allocated. Use this
if you want to find the main allocation sites in the
program.
</td>
</tr>
<tr valign=top>
<td><code>--alloc_objects</code></td>
<td>
Display the number of allocated objects. This includes
the objects that have since been de-allocated. Use this
if you want to find the main allocation sites in the
program.
</td>
</table>
</center>
<h3>Interactive mode</a></h3>
<p>By default -- if you don't specify any flags to the contrary --
pprof runs in interactive mode. At the <code>(pprof)</code> prompt,
you can run many of the commands described above. You can type
<code>help</code> for a list of what commands are available in
interactive mode.</p>
<h1>Caveats</h1>
<ul>
<li> Heap profiling requires the use of libtcmalloc. This
requirement may be removed in a future version of the heap
profiler, and the heap profiler separated out into its own
library.
<li> If the program linked in a library that was not compiled
with enough symbolic information, all samples associated
with the library may be charged to the last symbol found
in the program before the library. This will artificially
inflate the count for that symbol.
<li> If you run the program on one machine, and profile it on
another, and the shared libraries are different on the two
machines, the profiling output may be confusing: samples that
fall within the shared libaries may be assigned to arbitrary
procedures.
<li> Several libraries, such as some STL implementations, do their
own memory management. This may cause strange profiling
results. We have code in libtcmalloc to cause STL to use
tcmalloc for memory management (which in our tests is better
than STL's internal management), though it only works for some
STL implementations.
<li> If your program forks, the children will also be profiled
(since they inherit the same HEAPPROFILE setting). Each
process is profiled separately; to distinguish the child
profiles from the parent profile and from each other, all
children will have their process-id attached to the HEAPPROFILE
name.
<li> Due to a hack we make to work around a possible gcc bug, your
profiles may end up named strangely if the first character of
your HEAPPROFILE variable has ascii value greater than 127.
This should be exceedingly rare, but if you need to use such a
name, just set prepend <code>./</code> to your filename:
<code>HEAPPROFILE=./&Auml;gypten</code>.
</ul>
<hr>
Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> * google-perftools: version 1.8 release * PORTING: (Disabled) support for patching mmap on freebsd (chapp...) * PORTING: Support volatile __malloc_hook for glibc 2.14 (csilvers) * PORTING: Use _asm rdtsc and __rdtsc to get cycleclock in windows (koda) * PORTING: Fix fd vs. HANDLE compiler error on cygwin (csilvers) * PORTING: Do not test memalign or double-linking on OS X (csilvers) * PORTING: Actually enable TLS on windows (jontra) * PORTING: Some work to compile under Native Client (krasin) * PORTING: deal with pthread_once w/o -pthread on freebsd (csilvers) * Rearrange libc-overriding to make it easier to port (csilvers) * Display source locations in pprof disassembly (sanjay) * BUGFIX: Actually initialize allocator name (mec) * BUGFIX: Keep track of 'overhead' bytes in malloc reporting (csilvers) * Allow ignoring one object twice in the leak checker (glider) * BUGFIX: top10 in pprof should print 10 lines, not 11 (rsc) * Refactor vdso source files (tipp) * Some documentation cleanups * Document MAX_TOTAL_THREAD_CACHE_SIZE <= 1Gb (nsethi) * Add MallocExtension::GetOwnership(ptr) (csilvers) * BUGFIX: We were leaving out a needed $(top_srcdir) in the Makefile * PORTING: Support getting argv0 on OS X * Add 'weblist' command to pprof: like 'list' but html (sanjay) * Improve source listing in pprof (sanjay) * Cap cache sizes to reduce fragmentation (ruemmler) * Improve performance by capping or increasing sizes (ruemmler) * Add M{,un}mapReplacmenet hooks into MallocHook (ribrdb) * Refactored system allocator logic (gangren) * Include cleanups (csilvers) * Add TCMALLOC_SMALL_BUT_SLOW support (ruemmler) * Clarify that tcmalloc stats are MiB (robinson) * Remove support for non-tcmalloc debugallocation (blount) * Add a new test: malloc_hook_test (csilvers) * Change the configure script to be more crosstool-friendly (mcgrathr) * PORTING: leading-underscore changes to support win64 (csilvers) * Improve debugallocation tc_malloc_size (csilvers) * Extend atomicops.h and cyceclock to use ARM V6+ optimized code (sanek) * Change malloc-hook to use a list-like structure (llib) * Add flag to use MAP_PRIVATE in memfs_malloc (gangren) * Windows support for pprof: nul and /usr/bin/file (csilvers) * TESTING: add test on strdup to tcmalloc_test (csilvers) * Augment heap-checker to deal with no-inode maps (csilvers) * Count .dll/.dylib as shared libs in heap-checker (csilvers) * Disable sys_futex for arm; it's not always reliable (sanek) * PORTING: change lots of windows/port.h macros to functions * BUGFIX: Generate correct version# in tcmalloc.h on windows (csilvers) * PORTING: Some casting to make solaris happier about types (csilvers) * TESTING: Disable debugallocation_test in 'minimal' mode (csilvers) * Rewrite debugallocation to be more modular (csilvers) * Don't try to run the heap-checker under valgrind (ppluzhnikov) * BUGFIX: Make focused stat %'s relative, not absolute (sanjay) * BUGFIX: Don't use '//' comments in a C file (csilvers) * Quiet new-gcc compiler warnings via -Wno-unused-result, etc (csilvers) git-svn-id: http://gperftools.googlecode.com/svn/trunk@110 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
2011-07-16 01:07:10 +00:00
<address>Sanjay Ghemawat
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
</address>
</body>
</html>