<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Google Heap Profiler</title>
</head>
<body>
<h1>Profiling heap usage</h1>
This document describes how to profile the heap usage of a C++
program. This facility can be useful for
<ul>
<li> Figuring out what is in the program heap at any given time
<li> Locating memory leaks
<li> Finding places that do a lot of allocation
</ul>
<h2>Linking in the Heap Profiler</h2>
<p>
You can profile any program that has the tcmalloc library linked
in. No recompilation is necessary to use the heap profiler.
</p>
<p>
It's safe to link in tcmalloc even if you don't expect to
heap-profile your program. Your program will not run any slower
as long as you don't use any of the heap-profiler features.
</p>
<p>
You can run the heap profiler on applications you didn't compile
yourself, by using LD_PRELOAD:
</p>
<pre>
$ LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=... &lt;binary&gt;
</pre>
<p>
We don't necessarily recommend this mode of usage.
</p>
<h2>Turning On Heap Profiling</h2>
<p>
Set the environment variable HEAPPROFILE to the filename to dump the
profile to. For instance, to profile /usr/local/netscape:
</p>
<pre>
$ HEAPPROFILE=/tmp/profile /usr/local/netscape # sh
% setenv HEAPPROFILE /tmp/profile; /usr/local/netscape # csh
</pre>
<p>Profiling also works correctly with sub-processes: each child
process gets its own profile with its own name (generated by combining
HEAPPROFILE with the child's process id).</p>
<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>
<h2>Extracting a profile</h2>
<p>
If heap-profiling is turned on in a program, the program will periodically
write profiles to the filesystem. The sequence of profiles will be named:
</p>
<pre>
&lt;prefix&gt;.0000.heap
&lt;prefix&gt;.0001.heap
&lt;prefix&gt;.0002.heap
...
</pre>
<p>
where <code>&lt;prefix&gt;</code> is the value supplied in
<code>HEAPPROFILE</code>. Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.
</p>
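<p>
For example (the prefix name here is just illustrative), the following
invocation uses a relative prefix, so the profiles
<code>mybinary.hprof.0000.heap</code>,
<code>mybinary.hprof.0001.heap</code>, ... are written to the directory
the program was started from:
</p>
<pre>
$ HEAPPROFILE=mybinary.hprof ./mybinary
</pre>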
<p>
By default, a new profile file is written after every 1GB of
allocation. The profile-writing interval can be adjusted by calling
<code>HeapProfilerSetAllocationInterval()</code> from your program. It
takes one argument: the number of bytes of allocation between
consecutive profile dumps.
</p>
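<p>
Here is a minimal sketch of such a call. The declaration below follows
the style of the <code>HeapProfile()</code> example later in this
document and is illustrative only; check the heap-profiler header that
ships with your release for the exact signature.
</p>
<pre>
extern void HeapProfilerSetAllocationInterval(size_t bytes);  // illustrative declaration

int main(int argc, char** argv) {
  // Dump a new profile after every 100 MB of allocation instead of
  // the default 1 GB.
  HeapProfilerSetAllocationInterval(100 * 1024 * 1024);
  // ... rest of the program ...
  return 0;
}
</pre>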
<p>
You can also generate profiles from specific points in the program
by inserting a call to <code>HeapProfile()</code>. Example:
</p>
<pre>
extern const char* HeapProfile();        // returns the current profile as a string

const char* profile = HeapProfile();
fputs(profile, stdout);                  // write it wherever you like
free(const_cast&lt;char*&gt;(profile));  // the caller must free the returned string
</pre>
<h2>What is profiled</h2>
The profiling system instruments all allocations and frees. It keeps
track of various pieces of information per allocation site. An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.
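<p>
For example, in the hypothetical code below the two callers of
<code>MakeBuffer</code> correspond to two distinct allocation sites,
because the stack traces active at the <code>new</code> differ even
though the <code>new</code> itself appears only once:
</p>
<pre>
#include &lt;cstddef&gt;

// Hypothetical helper: every call allocates with new.
static char* MakeBuffer(size_t n) { return new char[n]; }

void HandleRequest() { char* b = MakeBuffer(1024); /* ... */ delete[] b; }
void LoadConfig()    { char* b = MakeBuffer(4096); /* ... */ delete[] b; }
</pre>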
<h2>Interpreting the profile</h2>
The profile output can be viewed by passing it to the
<code>pprof</code> tool. The <code>pprof</code> tool can print both
CPU usage and heap usage information. It is documented in detail
on the <a href="cpu_profiler.html">CPU Profiling</a> page.
Heap-profile-specific flags and usage are explained below.
<p>
Here are some examples. These examples assume the binary is named
<code>gfs_master</code>, and that the sequence of heap profiles is in
files named:
</p>
<pre>
profile.0001.heap
profile.0002.heap
...
profile.0100.heap
</pre>
<h3>Why is a process so big</h3>
<pre>
% pprof --gv gfs_master profile.0100.heap
</pre>
This command will pop up a <code>gv</code> window that displays
the profile information as a directed graph. Here is a portion
of the resulting output:
<p>
<center>
<img src="heap-example1.png">
</center>
</p>
A few explanations:
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
of the live memory, which is 25% of the total live memory.
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
accountable for 176.2 MB of the live memory (i.e., it directly
allocated 176.2 MB that has not been freed yet). Furthermore,
it and its callees are responsible for 729.9 MB. The
labels on the outgoing edges give a good indication of the
amount allocated by each callee.
</ul>
<h3>Comparing Profiles</h3>
<p>
You often want to skip allocations during the initialization phase of
a program so you can find gradual memory leaks. One simple way to do
this is to compare two profiles -- both collected after the program
has been running for a while. Specify the name of the first profile
using the <code>--base</code> option. Example:
</p>
<pre>
% pprof --base=profile.0004.heap gfs_master profile.0100.heap
</pre>
<p>
The memory-usage in <code>profile.0004.heap</code> will be subtracted from
the memory-usage in <code>profile.0100.heap</code> and the result will
be displayed.
</p>
<h3>Text display</h3>
<pre>
% pprof gfs_master profile.0100.heap
255.6 24.7% 24.7% 255.6 24.7% GFS_MasterChunk::AddServer
184.6 17.8% 42.5% 298.8 28.8% GFS_MasterChunkTable::Create
176.2 17.0% 59.5% 729.9 70.5% GFS_MasterChunkTable::UpdateState
169.8 16.4% 75.9% 169.8 16.4% PendingClone::PendingClone
76.3 7.4% 83.3% 76.3 7.4% __default_alloc_template::_S_chunk_alloc
49.5 4.8% 88.0% 49.5 4.8% hashtable::resize
...
</pre>
<p>
<ul>
<li> The first column contains the direct memory use in MB.
<li> The fourth column contains memory use by the procedure
and all of its callees.
<li> The second and fifth columns are just percentage representations
     of the numbers in the first and fourth columns.
<li> The third column is a cumulative sum of the second column
     (i.e., the <code>k</code>th entry in the third column is the
     sum of the first <code>k</code> entries in the second column;
     for example, the 42.5% on the second line above is 24.7% + 17.8%).
</ul>
<h3>Ignoring or focusing on specific regions</h3>
The following command will give a graphical display of a subset of
the call-graph. Only paths in the call-graph that match the
regular expression <code>DataBuffer</code> are included:
<pre>
% pprof --gv --focus=DataBuffer gfs_master profile.0100.heap
</pre>
Similarly, the following command will omit a subset of the
call-graph. All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:
<pre>
% pprof --gv --ignore=DataBuffer gfs_master profile.0100.heap
</pre>
<h3>Total allocations + object-level information</h3>
<p>
All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed. You can also get other types of information by supplying
a flag to <code>pprof</code>:
</p>
<center>
<table frame=box rules=sides cellpadding=5 width=100%>
<tr valign=top>
<td><code>--inuse_space</code></td>
<td>
Display the number of in-use megabytes (i.e. space that has
been allocated but not freed). This is the default.
</td>
</tr>
<tr valign=top>
<td><code>--inuse_objects</code></td>
<td>
Display the number of in-use objects (i.e. number of
objects that have been allocated but not freed).
</td>
</tr>
<tr valign=top>
<td><code>--alloc_space</code></td>
<td>
Display the number of allocated megabytes. This includes
the space that has since been de-allocated. Use this
if you want to find the main allocation sites in the
program.
</td>
</tr>
<tr valign=top>
<td><code>--alloc_objects</code></td>
<td>
Display the number of allocated objects. This includes
the objects that have since been de-allocated. Use this
if you want to find the main allocation sites in the
program.
</td>
</tr>
</table>
</center>
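<p>
For example, combining these flags with the earlier examples, the
following command shows the program's main allocation sites as a
directed graph, counting all allocations rather than just live ones:
</p>
<pre>
% pprof --gv --alloc_space gfs_master profile.0100.heap
</pre>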
<h2>Caveats</h2>
<ul>
<li> <p>
Heap profiling requires the use of libtcmalloc. This requirement
may be removed in a future version of the heap profiler, and the
heap profiler separated out into its own library.
</p>
<li> <p>
If the program is linked against a library that was not compiled
with enough symbolic information, all samples associated
with the library may be charged to the last symbol found
in the program before the library. This will artificially
inflate the count for that symbol.
</p>
<li> <p>
If you run the program on one machine, and profile it on another,
and the shared libraries are different on the two machines, the
profiling output may be confusing: samples that fall within
the shared libraries may be assigned to arbitrary procedures.
</p>
<li> <p>
Several libraries, such as some STL implementations, do their own
memory management. This may cause strange profiling results. We
have code in libtcmalloc to cause STL to use tcmalloc for memory
management (which in our tests is better than STL's internal
management), though it only works for some STL implementations.
</p>
<li> <p>
If your program forks, the children will also be profiled (since
they inherit the same HEAPPROFILE setting). Each process is
profiled separately; to distinguish the child profiles from the
parent profile and from each other, all children will have their
process-id attached to the HEAPPROFILE name.
</p>
<li> <p>
Due to a hack we make to work around a possible gcc bug, your
profiles may end up named strangely if the first character of
your HEAPPROFILE variable has an ASCII value greater than 127. This
should be exceedingly rare, but if you need to use such a name,
just prepend <code>./</code> to your filename:
<code>HEAPPROFILE=./&Auml;gypten</code>.
</p>
</ul>
<hr>
<address><a href="mailto:opensource@google.com">Sanjay Ghemawat</a></address>
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
<!-- hhmts start -->
Last modified: Wed Apr 20 05:46:16 PDT 2005
<!-- hhmts end -->
</body>
</html>