<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html><head><title>Google Heap Profiler</title></head><body>
<h1>Profiling heap usage</h1>

This document describes how to profile the heap usage of a C++
program. This facility can be useful for
<ul>
<li> Figuring out what is in the program heap at any given time
</li><li> Locating memory leaks
</li><li> Finding places that do a lot of allocation
</li></ul>

<h2>Linking in the Heap Profiler</h2>

<p>
You can profile any program that has the tcmalloc library linked
in. No recompilation is necessary to use the heap profiler.
</p>

<p>
It's safe to link in tcmalloc even if you don't expect to
heap-profile your program. Your programs will not run any slower
as long as you don't use any of the heap-profiler features.
</p>

<p>
You can run the heap profiler on applications you didn't compile
yourself, by using LD_PRELOAD:
</p>
<pre>   $ LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=... &lt;binary&gt;
</pre>
<p>
We don't necessarily recommend this mode of usage.
</p>


<h2>Turning On Heap Profiling</h2>

<p>
Set the environment variable HEAPPROFILE to the filename to dump the
profile to. For instance, to profile /usr/local/netscape:
</p>
<pre>   $ HEAPPROFILE=/tmp/profile /usr/local/netscape           # sh
   % setenv HEAPPROFILE /tmp/profile; /usr/local/netscape   # csh
</pre>

<p>Profiling also works correctly with sub-processes: each child
process gets its own profile with its own name (generated by combining
HEAPPROFILE with the child's process id).</p>

<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>


<h2>Extracting a profile</h2>

<p>
If heap-profiling is turned on in a program, the program will periodically
write profiles to the filesystem. The sequence of profiles will be named:
</p>
<pre>           &lt;prefix&gt;.0000.heap
           &lt;prefix&gt;.0001.heap
           &lt;prefix&gt;.0002.heap
           ...
</pre>
<p>
where <code>&lt;prefix&gt;</code> is the value supplied in
<code>HEAPPROFILE</code>. Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.
</p>
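<p>
The naming scheme above can be sketched in a few lines of C++. The helper
<code>HeapProfileName</code> is a hypothetical name introduced here for
illustration; it is not part of tcmalloc, and the four-digit zero padding is
read off the example names above rather than taken from the profiler source.
</p>

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Illustrative helper (not part of tcmalloc): builds the name of the n-th
// profile in the sequence <prefix>.0000.heap, <prefix>.0001.heap, ...
std::string HeapProfileName(const std::string& prefix, int sequence) {
  assert(sequence >= 0);
  char suffix[32];
  // Zero-pad the sequence number to at least four digits.
  std::snprintf(suffix, sizeof(suffix), ".%04d.heap", sequence);
  return prefix + suffix;
}
```

<p>
Calling <code>HeapProfileName("/tmp/profile", 2)</code> yields
<code>/tmp/profile.0002.heap</code>; a prefix without a leading
<code>/</code> produces a name relative to the working directory.
</p>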

<p>
By default, a new profile file is written after every 1GB of
allocation. The profile-writing interval can be adjusted by calling
<code>HeapProfilerSetAllocationInterval()</code> from your program. This takes one
argument: a numeric value that indicates the number of bytes of allocation
between each profile dump.
</p>
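<p>
To make the idea of an allocation interval concrete, here is a minimal model of
interval-based dumping. The class <code>DumpScheduler</code> is hypothetical and
is not how tcmalloc implements the feature; it only assumes that a dump becomes
due once the bytes allocated since the previous dump reach the interval.
</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of interval-based dumping; not tcmalloc's code.
class DumpScheduler {
 public:
  explicit DumpScheduler(std::size_t interval_bytes)
      : interval_bytes_(interval_bytes) {
    assert(interval_bytes > 0);
  }

  // Records an allocation of `bytes`. Returns true when a dump is due.
  bool OnAllocation(std::size_t bytes) {
    bytes_since_dump_ += bytes;
    if (bytes_since_dump_ < interval_bytes_) return false;
    bytes_since_dump_ = 0;  // start counting toward the next dump
    ++dumps_;
    return true;
  }

  int dumps() const { return dumps_; }

 private:
  std::size_t interval_bytes_;
  std::size_t bytes_since_dump_ = 0;
  int dumps_ = 0;
};
```

<p>
With an interval of 100 bytes, allocations of 60, 60, 30 and 200 bytes trigger
a dump on the second and fourth calls. Note that frees do not reduce the
counter in this model: the interval is measured in bytes allocated, not bytes
in use.
</p>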

<p>
You can also generate profiles from specific points in the program
by inserting a call to <code>HeapProfile()</code>. Example:
</p>
<pre>    #include &lt;stdio.h&gt;   // fputs
    #include &lt;stdlib.h&gt;  // free

    extern const char* HeapProfile();

    const char* profile = HeapProfile();  // text of the current heap profile
    fputs(profile, stdout);
    free(const_cast&lt;char*&gt;(profile));     // the caller releases the buffer
</pre>

<h2>What is profiled</h2>

The profiling system instruments all allocations and frees. It keeps
track of various pieces of information per allocation site. An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.
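<p>
As a rough illustration of per-site bookkeeping, the sketch below keys a table
of counters by stack trace. All names (<code>StackTrace</code>,
<code>SiteStats</code>, <code>SiteTable</code>) are hypothetical, and frames are
stored as strings for readability; this is an assumption-laden model, not the
profiler's actual data structure.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A call stack, outermost frame first. Strings keep the sketch readable.
using StackTrace = std::vector<std::string>;

struct SiteStats {
  long allocs = 0;       // allocations made at this site
  long alloc_bytes = 0;  // total bytes allocated at this site
  long frees = 0;        // how many of those allocations were freed
  long free_bytes = 0;   // bytes released by those frees
};

class SiteTable {
 public:
  void RecordAlloc(const StackTrace& site, long bytes) {
    SiteStats& s = sites_[site];
    ++s.allocs;
    s.alloc_bytes += bytes;
  }

  // `site` is the stack that originally allocated the freed block.
  void RecordFree(const StackTrace& site, long bytes) {
    SiteStats& s = sites_[site];
    ++s.frees;
    s.free_bytes += bytes;
  }

  // Throws std::out_of_range if the site was never recorded.
  const SiteStats& Get(const StackTrace& site) const { return sites_.at(site); }
  std::size_t site_count() const { return sites_.size(); }

 private:
  std::map<StackTrace, SiteStats> sites_;  // one entry per distinct stack
};
```

<p>
Because the key is the whole stack, two calls to <code>malloc</code> reached
through different callers count as two separate sites even though the innermost
frame is the same.
</p>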

<h2>Interpreting the profile</h2>

The profile output can be viewed by passing it to the
<code>pprof</code> tool. The <code>pprof</code> tool can print both
CPU usage and heap usage information. It is documented in detail
on the <a href="http://goog-perftools.sourceforge.net/doc/cpu_profiler.html">CPU Profiling</a> page.
Heap-profile-specific flags and usage are explained below.

<p>
Here are some examples. These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:
</p>
<pre>  profile.0001.heap
  profile.0002.heap
  ...
  profile.0100.heap
</pre>

<h3>Why is a process so big</h3>

<pre>   % pprof --gv gfs_master profile.0100.heap
</pre>

This command will pop up a <code>gv</code> window that displays
the profile information as a directed graph. Here is a portion
of the resulting output:

<p>
</p><center>
<img src="../images/heap-example1.png">
</center>
<p></p>

A few explanations:
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
     of the live memory, which is 25% of the total live memory.
</li><li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
     accountable for 176.2 MB of the live memory (i.e., it directly
     allocated 176.2 MB that has not been freed yet). Furthermore,
     it and its callees are responsible for 729.9 MB. The
     labels on the outgoing edges give a good indication of the
     amount allocated by each callee.
</li></ul>

<h3>Comparing Profiles</h3>

<p>
You often want to skip allocations during the initialization phase of
a program so you can find gradual memory leaks. One simple way to do
this is to compare two profiles -- both collected after the program
has been running for a while. Specify the name of the first profile
using the <code>--base</code> option. Example:
</p>
<pre>   % pprof --base=profile.0004.heap gfs_master profile.0100.heap
</pre>

<p>
The memory usage in <code>profile.0004.heap</code> will be subtracted from
the memory usage in <code>profile.0100.heap</code> and the result will
be displayed.
</p>
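<p>
The subtraction can be pictured as a site-by-site difference of two tables.
The sketch below is an illustration only: the <code>Profile</code> type and
<code>SubtractBase</code> function are hypothetical names, and it assumes each
profile has already been reduced to one number per allocation site.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// In-use megabytes per allocation site, keyed by a site label.
using Profile = std::map<std::string, double>;

// Returns `profile - base`, site by site. A site that is missing from one
// of the two inputs is treated as zero on that side.
Profile SubtractBase(const Profile& profile, const Profile& base) {
  Profile diff = profile;
  for (const auto& entry : base) {
    diff[entry.first] -= entry.second;
  }
  return diff;
}
```

<p>
A site that keeps growing between the two snapshots shows up with a large
positive difference, while memory allocated once during startup cancels out.
</p>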

<h3>Text display</h3>

<pre>% pprof gfs_master profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
   ...
</pre>

<p>
</p><ul>
<li> The first column contains the direct memory use in MB.
</li><li> The fourth column contains memory use by the procedure
     and all of its callees.
</li><li> The second and fifth columns are just percentage representations
     of the numbers in the first and fourth columns.
</li><li> The third column is a cumulative sum of the second column
     (i.e., the <code>k</code>th entry in the third column is the
     sum of the first <code>k</code> entries in the second column.)
</li></ul>
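<p>
The relationship between the five numeric columns can be sketched as follows.
<code>Entry</code>, <code>Row</code> and <code>BuildRows</code> are hypothetical
names, and the total is passed in by the caller; this shows the arithmetic only
and is not <code>pprof</code>'s implementation.
</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Entry {
  std::string name;
  double direct_mb;        // memory allocated directly by the procedure
  double with_callees_mb;  // memory allocated by it and all of its callees
};

struct Row {
  double direct_mb;         // column 1
  double direct_pct;        // column 2: column 1 as a share of the total
  double running_pct;       // column 3: running sum of column 2
  double with_callees_mb;   // column 4
  double with_callees_pct;  // column 5: column 4 as a share of the total
  std::string name;
};

std::vector<Row> BuildRows(const std::vector<Entry>& entries, double total_mb) {
  assert(total_mb > 0.0);
  std::vector<Row> rows;
  double running_pct = 0.0;
  for (const Entry& e : entries) {
    Row r;
    r.direct_mb = e.direct_mb;
    r.direct_pct = 100.0 * e.direct_mb / total_mb;
    running_pct += r.direct_pct;
    r.running_pct = running_pct;
    r.with_callees_mb = e.with_callees_mb;
    r.with_callees_pct = 100.0 * e.with_callees_mb / total_mb;
    r.name = e.name;
    rows.push_back(r);
  }
  return rows;
}
```

<p>
Feeding in the first three rows of the listing with a total of 1035 MB
reproduces the printed percentages to one decimal place. That total is an
assumption inferred from the percentages shown; the listing itself does not
state it.
</p>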

<h3>Ignoring or focusing on specific regions</h3>

The following command will give a graphical display of a subset of
the call-graph. Only paths in the call-graph that match the
regular expression <code>DataBuffer</code> are included:
<pre>% pprof --gv --focus=DataBuffer gfs_master profile.0100.heap
</pre>

Similarly, the following command will omit a subset of the
call-graph. All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:
<pre>% pprof --gv --ignore=DataBuffer gfs_master profile.0100.heap
</pre>
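<p>
Conceptually, both options filter whole call paths by regular expression. The
sketch below models that with <code>std::regex</code>; the function names are
hypothetical, a path is assumed to match when any of its frames matches, and
the regex dialect here is the C++ default rather than whatever
<code>pprof</code> itself uses.
</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// One call path through the call-graph, outermost frame first.
using Path = std::vector<std::string>;

// True if any frame on the path contains a match for `re`.
bool PathMatches(const Path& path, const std::regex& re) {
  for (const std::string& frame : path) {
    if (std::regex_search(frame, re)) return true;
  }
  return false;
}

// Keeps only the paths that match `pattern` (focus behaviour).
std::vector<Path> Focus(const std::vector<Path>& paths,
                        const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<Path> kept;
  for (const Path& p : paths) {
    if (PathMatches(p, re)) kept.push_back(p);
  }
  return kept;
}

// Drops every path that matches `pattern` (ignore behaviour).
std::vector<Path> Ignore(const std::vector<Path>& paths,
                         const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<Path> kept;
  for (const Path& p : paths) {
    if (!PathMatches(p, re)) kept.push_back(p);
  }
  return kept;
}
```

<p>
For any pattern, the two results partition the input: every path lands in
exactly one of <code>Focus(paths, pattern)</code> and
<code>Ignore(paths, pattern)</code>.
</p>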

<h3>Total allocations + object-level information</h3>

<p>
All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed. You can also get other types of information by supplying
a flag to <code>pprof</code>:
</p>

<center>
<table cellpadding="5" frame="box" rules="sides" width="100%">

<tbody><tr valign="top">
  <td><code>--inuse_space</code></td>
  <td>
     Display the number of in-use megabytes (i.e. space that has
     been allocated but not freed). This is the default.
  </td>
</tr>

<tr valign="top">
  <td><code>--inuse_objects</code></td>
  <td>
     Display the number of in-use objects (i.e. number of
     objects that have been allocated but not freed).
  </td>
</tr>

<tr valign="top">
  <td><code>--alloc_space</code></td>
  <td>
     Display the number of allocated megabytes. This includes
     the space that has since been de-allocated. Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

<tr valign="top">
  <td><code>--alloc_objects</code></td>
  <td>
     Display the number of allocated objects. This includes
     the objects that have since been de-allocated. Use this
     if you want to find the main allocation sites in the
     program.
  </td>

</tr></tbody></table>
</center>
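<p>
The four views differ only in which counters are read and whether frees are
subtracted. The sketch below derives each one from a single set of per-site
counters; the <code>Counters</code> struct, the <code>View</code> enum and the
<code>Value</code> function are hypothetical, and results are returned in bytes
or object counts rather than the megabytes that <code>pprof</code> prints.
</p>

```cpp
#include <cassert>

// Hypothetical per-site counters.
struct Counters {
  long allocs;       // objects ever allocated
  long alloc_bytes;  // bytes ever allocated
  long frees;        // objects freed so far
  long free_bytes;   // bytes freed so far
};

enum class View { kInuseSpace, kInuseObjects, kAllocSpace, kAllocObjects };

// Returns the quantity that the corresponding pprof flag would display.
long Value(const Counters& c, View view) {
  switch (view) {
    case View::kInuseSpace:   return c.alloc_bytes - c.free_bytes;
    case View::kInuseObjects: return c.allocs - c.frees;
    case View::kAllocSpace:   return c.alloc_bytes;
    case View::kAllocObjects: return c.allocs;
  }
  assert(false && "unknown view");
  return 0;
}
```

<p>
A site that allocates and frees heavily can rank near the top under
<code>--alloc_space</code> while contributing almost nothing under
<code>--inuse_space</code>, which is why the allocation views are the ones to
use when looking for allocation hot spots rather than leaks.
</p>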

<h2>Caveats</h2>

<ul>
  <li> <p>
       Heap profiling requires the use of libtcmalloc. This requirement
       may be removed in a future version of the heap profiler, and the
       heap profiler separated out into its own library.
       </p>

  </li><li> <p>
       If the program linked in a library that was not compiled
       with enough symbolic information, all samples associated
       with the library may be charged to the last symbol found
       in the program before the library. This will artificially
       inflate the count for that symbol.
       </p>

  </li><li> <p>
       If you run the program on one machine, and profile it on another,
       and the shared libraries are different on the two machines, the
       profiling output may be confusing: samples that fall within
       the shared libraries may be assigned to arbitrary procedures.
       </p>

  </li><li> <p>
       Several libraries, such as some STL implementations, do their own
       memory management. This may cause strange profiling results. We
       have code in libtcmalloc to cause STL to use tcmalloc for memory
       management (which in our tests is better than STL's internal
       management), though it only works for some STL implementations.
       </p>

  </li><li> <p>
       If your program forks, the children will also be profiled (since
       they inherit the same HEAPPROFILE setting). Each process is
       profiled separately; to distinguish the child profiles from the
       parent profile and from each other, all children will have their
       process-id attached to the HEAPPROFILE name.
       </p>

  </li><li> <p>
       Due to a hack we make to work around a possible gcc bug, your
       profiles may end up named strangely if the first character of
       your HEAPPROFILE variable has an ASCII value greater than 127. This
       should be exceedingly rare, but if you need to use such a name,
       just prepend <code>./</code> to your filename:
       <code>HEAPPROFILE=./&Auml;gypten</code>.
       </p>

</li></ul>

<hr>
<address><a href="mailto:opensource@google.com">Sanjay Ghemawat</a></address>
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
<!-- hhmts start -->
Last modified: Wed Apr 20 05:46:16 PDT 2005
<!-- hhmts end -->

</body></html>