<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Google Heap Profiler</title>
</head>

<body>
<h1>Profiling heap usage</h1>

This document describes how to profile the heap usage of a C++
program.  This facility can be useful for
<ul>
<li> Figuring out what is in the program heap at any given time
<li> Locating memory leaks
<li> Finding places that do a lot of allocation
</ul>

<h2>Linking in the Heap Profiler</h2>

<p>
You can profile any program that has the tcmalloc library linked
in.  No recompilation is necessary to use the heap profiler.
</p>

<p>
It's safe to link in tcmalloc even if you don't expect to
heap-profile your program.  Your programs will not run any slower
as long as you don't use any of the heap-profiler features.
</p>

<p>
You can run the heap profiler on applications you didn't compile
yourself, by using LD_PRELOAD:
</p>
<pre>
   $ LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=... &lt;binary&gt;
</pre>
<p>
We don't necessarily recommend this mode of usage.
</p>

<h2>Turning On Heap Profiling</h2>

<p>
Set the environment variable HEAPPROFILE to the filename to dump the
profile to.  For instance, to profile /usr/local/netscape:
</p>
<pre>
  $ HEAPPROFILE=/tmp/profile /usr/local/netscape           # sh
  % setenv HEAPPROFILE /tmp/profile; /usr/local/netscape   # csh
</pre>

<p>Profiling also works correctly with sub-processes: each child
process gets its own profile with its own name (generated by combining
HEAPPROFILE with the child's process id).</p>

<p>For security reasons, heap profiling will not write to a file,
and is thus not usable, for setuid programs.</p>

<h2>Extracting a profile</h2>

<p>
If heap-profiling is turned on in a program, the program will periodically
write profiles to the filesystem.  The sequence of profiles will be named:
</p>
<pre>
           &lt;prefix&gt;.0000.heap
           &lt;prefix&gt;.0001.heap
           &lt;prefix&gt;.0002.heap
           ...
</pre>
<p>
where <code>&lt;prefix&gt;</code> is the value supplied in
<code>HEAPPROFILE</code>.  Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.
</p>
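<p>
The naming pattern can be sketched in a few lines of C++. The helper
<code>ProfileFileName</code> below is hypothetical and exists only to
illustrate the <code>&lt;prefix&gt;.NNNN.heap</code> pattern; it is not
part of the profiler.
</p>

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the profiler): builds the name of the
// dump with the given sequence number, following "<prefix>.NNNN.heap".
std::string ProfileFileName(const std::string& prefix, int sequence) {
  char suffix[32];
  // %04d zero-pads the sequence number to four digits.
  std::snprintf(suffix, sizeof(suffix), ".%04d.heap", sequence);
  return prefix + suffix;
}
```

<p>
For example, a prefix of <code>/tmp/profile</code> and sequence number 2
would give <code>/tmp/profile.0002.heap</code>.
</p>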

<p>
By default, a new profile file is written after every 1 GB of
allocation.  The profile-writing interval can be adjusted by calling
<code>HeapProfilerSetAllocationInterval()</code> from your program.
This takes one argument: a numeric value that indicates the number of
bytes of allocation between successive profile dumps.
</p>
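<p>
To make the interval rule concrete, here is a minimal model of
"dump after every N bytes of allocation". The class
<code>DumpTrigger</code> is a hypothetical illustration, not the
profiler's implementation; in particular, resetting the counter to zero
after each dump is an assumption.
</p>

```cpp
#include <cstdint>

// Hypothetical model of interval-based dumping; not the profiler's code.
class DumpTrigger {
 public:
  explicit DumpTrigger(std::uint64_t interval_bytes)
      : interval_bytes_(interval_bytes) {}

  // Records an allocation of `bytes`. Returns true when the bytes allocated
  // since the last dump have reached the interval, i.e. a dump would be due.
  bool OnAllocation(std::uint64_t bytes) {
    allocated_since_dump_ += bytes;
    if (allocated_since_dump_ < interval_bytes_) return false;
    allocated_since_dump_ = 0;  // assumption: counting restarts after a dump
    return true;
  }

 private:
  std::uint64_t interval_bytes_;
  std::uint64_t allocated_since_dump_ = 0;
};
```

<p>
Note that only allocations advance the counter in this model; frees do
not, which matches the wording "bytes of allocation" above.
</p>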

<p>
You can also generate profiles from specific points in the program
by inserting a call to <code>HeapProfile()</code>.  Example:
</p>
<pre>
    #include &lt;cstdio&gt;   // fputs
    #include &lt;cstdlib&gt;  // free

    extern const char* HeapProfile();

    const char* profile = HeapProfile();
    fputs(profile, stdout);
    free(const_cast&lt;char*&gt;(profile));  // release the returned buffer
</pre>

<h2>What is profiled</h2>

The profiling system instruments all allocations and frees.  It keeps
track of various pieces of information per allocation site.  An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.
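<p>
One way to picture "per allocation site" bookkeeping is a table keyed by
the stack trace. The sketch below is an assumption made for illustration:
the type names are hypothetical, and frames are stored as strings for
readability, whereas a real profiler would record return addresses.
</p>

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
using StackTrace = std::vector<std::string>;  // outermost caller first

struct SiteStats {
  long long allocs = 0;       // number of allocations seen at this site
  long long alloc_bytes = 0;  // total bytes requested at this site
};

// Two allocations belong to the same site only if their whole stack
// traces are equal, so the trace itself serves as the key.
using SiteTable = std::map<StackTrace, SiteStats>;

// Charges one allocation of `bytes` to the site identified by `stack`.
void RecordAllocation(SiteTable& table, const StackTrace& stack,
                      std::size_t bytes) {
  SiteStats& stats = table[stack];
  stats.allocs += 1;
  stats.alloc_bytes += static_cast<long long>(bytes);
}
```

<p>
Under this model, the same function reached through two different call
paths counts as two distinct allocation sites.
</p>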

<h2>Interpreting the profile</h2>

The profile output can be viewed by passing it to the
<code>pprof</code> tool.  The <code>pprof</code> tool can print both
CPU usage and heap usage information.  It is documented in detail
on the <a href="cpu_profiler.html">CPU Profiling</a> page.
Heap-profile-specific flags and usage are explained below.

<p>
Here are some examples.  These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:
</p>
<pre>
  profile.0001.heap
  profile.0002.heap
  ...
  profile.0100.heap
</pre>

<h3>Why is a process so big</h3>

<pre>
    % pprof --gv gfs_master profile.0100.heap
</pre>

This command will pop up a <code>gv</code> window that displays
the profile information as a directed graph.  Here is a portion
of the resulting output:

<p>
<center>
<img src="heap-example1.png">
</center>
</p>

A few explanations:
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
     of the live memory, which is 25% of the total live memory.
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
     accountable for 176.2 MB of the live memory (i.e., it directly
     allocated 176.2 MB that has not been freed yet).  Furthermore,
     it and its callees are responsible for 729.9 MB.  The
     labels on the outgoing edges give a good indication of the
     amount allocated by each callee.
</ul>

<h3>Comparing Profiles</h3>

<p>
You often want to skip allocations during the initialization phase of
a program so you can find gradual memory leaks.  One simple way to do
this is to compare two profiles -- both collected after the program
has been running for a while.  Specify the name of the first profile
using the <code>--base</code> option.  Example:
</p>
<pre>
   % pprof --base=profile.0004.heap gfs_master profile.0100.heap
</pre>

<p>
The memory-usage in <code>profile.0004.heap</code> will be subtracted from
the memory-usage in <code>profile.0100.heap</code> and the result will
be displayed.
</p>
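<p>
The subtraction can be sketched as follows. This is a simplified model,
not <code>pprof</code>'s implementation: the <code>Profile</code> type and
<code>SubtractBase</code> function are hypothetical, and a profile is
reduced to in-use megabytes per named allocation site.
</p>

```cpp
#include <map>
#include <string>

// Hypothetical representation: in-use megabytes keyed by allocation site.
using Profile = std::map<std::string, double>;

// Returns `later` minus `base`, site by site. A site missing from one of
// the profiles is treated as using 0 MB in that profile.
Profile SubtractBase(const Profile& later, const Profile& base) {
  Profile diff = later;
  for (const auto& entry : base) {
    diff[entry.first] -= entry.second;
  }
  return diff;
}
```

<p>
Memory allocated during initialization appears in both profiles and
cancels out, so sites that keep growing between the two dumps stand out
in the difference.
</p>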

<h3>Text display</h3>

<pre>
% pprof gfs_master profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
   ...
</pre>

<p>
<ul>
<li> The first column contains the direct memory use in MB.
<li> The fourth column contains memory use by the procedure
     and all of its callees.
<li> The second and fifth columns are just percentage representations
     of the numbers in the first and fourth columns.
<li> The third column is a cumulative sum of the second column
     (i.e., the <code>k</code>th entry in the third column is the
     sum of the first <code>k</code> entries in the second column).
</ul>

<h3>Ignoring or focusing on specific regions</h3>

The following command will give a graphical display of a subset of
the call-graph.  Only paths in the call-graph that match the
regular expression <code>DataBuffer</code> are included:
<pre>
% pprof --gv --focus=DataBuffer gfs_master profile.0100.heap
</pre>

Similarly, the following command will omit a subset of the
call-graph.  All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:
<pre>
% pprof --gv --ignore=DataBuffer gfs_master profile.0100.heap
</pre>
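<p>
The two options are complementary filters over call paths. The sketch
below models that idea and is not <code>pprof</code>'s implementation:
<code>Path</code>, <code>PathMatches</code> and <code>FilterPaths</code>
are hypothetical names, and treating a path as matching when any function
name on it matches the expression is an assumption.
</p>

```cpp
#include <regex>
#include <string>
#include <vector>

using Path = std::vector<std::string>;  // function names along one call path

// True if any function name on the path matches the pattern (assumption:
// a single matching frame is enough for the whole path to match).
bool PathMatches(const Path& path, const std::regex& pattern) {
  for (const std::string& frame : path) {
    if (std::regex_search(frame, pattern)) return true;
  }
  return false;
}

// keep_matching = true models --focus (keep only matching paths);
// keep_matching = false models --ignore (drop matching paths).
std::vector<Path> FilterPaths(const std::vector<Path>& paths,
                              const std::string& regex_text,
                              bool keep_matching) {
  const std::regex pattern(regex_text);
  std::vector<Path> kept;
  for (const Path& path : paths) {
    if (PathMatches(path, pattern) == keep_matching) kept.push_back(path);
  }
  return kept;
}
```

<p>
Applied to the same set of paths with the same expression, the two modes
partition the paths: every path is kept by exactly one of them.
</p>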

<h3>Total allocations + object-level information</h3>

<p>
All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed.  You can also get other types of information by supplying
a flag to <code>pprof</code>:
</p>

<center>
<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>--inuse_space</code></td>
  <td>
     Display the number of in-use megabytes (i.e. space that has
     been allocated but not freed).  This is the default.
  </td>
</tr>

<tr valign=top>
  <td><code>--inuse_objects</code></td>
  <td>
     Display the number of in-use objects (i.e. number of
     objects that have been allocated but not freed).
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_space</code></td>
  <td>
     Display the number of allocated megabytes.  This includes
     the space that has since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_objects</code></td>
  <td>
     Display the number of allocated objects.  This includes
     the objects that have since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

</table>
</center>
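<p>
The four views differ only in which counters they report. A minimal
sketch, assuming each allocation site keeps running totals of allocations
and frees; the <code>SiteCounters</code> struct is hypothetical and counts
bytes rather than the megabytes that <code>pprof</code> displays.
</p>

```cpp
// Hypothetical per-site counters; names are illustrative only.
struct SiteCounters {
  long long alloc_objects = 0;  // objects ever allocated (--alloc_objects)
  long long alloc_bytes = 0;    // bytes ever allocated (--alloc_space)
  long long freed_objects = 0;  // objects freed so far
  long long freed_bytes = 0;    // bytes freed so far

  void OnAlloc(long long bytes) {
    alloc_objects += 1;
    alloc_bytes += bytes;
  }

  void OnFree(long long bytes) {
    freed_objects += 1;
    freed_bytes += bytes;
  }

  // Allocated but not yet freed (--inuse_objects, --inuse_space).
  long long inuse_objects() const { return alloc_objects - freed_objects; }
  long long inuse_bytes() const { return alloc_bytes - freed_bytes; }
};
```

<p>
A site that allocates and frees heavily in a loop shows large
<code>alloc</code> figures but small <code>inuse</code> figures, which is
why the <code>alloc</code> views are the ones to use when looking for the
main allocation sites.
</p>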

<h2>Caveats</h2>

<ul>
  <li> <p>
       Heap profiling requires the use of libtcmalloc.  This requirement
       may be removed in a future version of the heap profiler, and the
       heap profiler separated out into its own library.
       </p>

  <li> <p>
       If the program linked in a library that was not compiled
       with enough symbolic information, all samples associated
       with the library may be charged to the last symbol found
       in the program before the library.  This will artificially
       inflate the count for that symbol.
       </p>

  <li> <p>
       If you run the program on one machine, and profile it on another,
       and the shared libraries are different on the two machines, the
       profiling output may be confusing: samples that fall within
       the shared libraries may be assigned to arbitrary procedures.
       </p>

  <li> <p>
       Several libraries, such as some STL implementations, do their own
       memory management.  This may cause strange profiling results.  We
       have code in libtcmalloc to cause STL to use tcmalloc for memory
       management (which in our tests is better than STL's internal
       management), though it only works for some STL implementations.
       </p>

  <li> <p>
       If your program forks, the children will also be profiled (since
       they inherit the same HEAPPROFILE setting).  Each process is
       profiled separately; to distinguish the child profiles from the
       parent profile and from each other, all children will have their
       process-id attached to the HEAPPROFILE name.
       </p>

  <li> <p>
       Due to a hack we make to work around a possible gcc bug, your
       profiles may end up named strangely if the first character of
       your HEAPPROFILE variable has ASCII value greater than 127.  This
       should be exceedingly rare, but if you need to use such a name,
       just prepend <code>./</code> to your filename:
       <code>HEAPPROFILE=./Ägypten</code>.
       </p>

</ul>

<hr>
<address><a href="mailto:opensource@google.com">Sanjay Ghemawat</a></address>
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
<!-- hhmts start -->
Last modified: Wed Apr 20 05:46:16 PDT 2005
<!-- hhmts end -->
</body>
</html>