mirror of
https://github.com/gperftools/gperftools
synced 2024-12-21 06:50:03 +00:00
74ad5d57ec
* google-perftools: version 0.90 release * (As the version-number jump hints, this is a major new release: almost every piece of functionality was rewritten. I can't do justice to all the changes, but will concentrate on highlights.) *** USER-VISIBLE CHANGES: * Ability to "release" unused memory added to tcmalloc * Exposed more tweaking knobs via environment variables (see docs) * pprof tries harder to map addresses to functions * tcmalloc_minimal compiles and runs on FreeBSD 6.0 and Solaris 10 *** INTERNAL CHANGES: * Much better 64-bit support * Better multiple-processor support (e.g. multicore contention tweaks) * Support for recent kernel ABI changes (e.g. new arg to mremap) * Addition of spinlocks to tcmalloc to reduce contention cost * Speed up tcmalloc by using __thread on systems that support TLS * Total redesign of heap-checker to improve liveness checking * More portable stack-frame analysis -- no more hard-coded constants! * Disentangled heap-profiler code and heap-checker code * Several new unittests to test, e.g., thread-contention costs * Lots of small (but important!) bug fixes: e.g., fixing GetPC on amd64 *** KNOWN PROBLEMS: * CPU-profiling may crash on x86_64 (64-bit) systems. See the README * Profiling/heap-checking may deadlock on x86_64 systems. See README git-svn-id: http://gperftools.googlecode.com/svn/trunk@28 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
477 lines
16 KiB
HTML
477 lines
16 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
|
|
<HTML>
|
|
|
|
<HEAD>
|
|
<link rel="stylesheet" href="designstyle.css">
|
|
<title>Google CPU Profiler</title>
|
|
</HEAD>
|
|
|
|
<BODY>
|
|
|
|
<p align=right>
|
|
<i>Last modified
|
|
<script type=text/javascript>
|
|
var lm = new Date(document.lastModified);
|
|
document.write(lm.toDateString());
|
|
</script></i>
|
|
</p>
|
|
|
|
<p>This is the CPU profiler we use at Google. There are three parts
|
|
to using it: linking the library into an application, running the
|
|
code, and analyzing the output.</p>
|
|
|
|
|
|
<H1>Linking in the Library</H1>
|
|
|
|
<p>To install the CPU profiler into your executable, add
|
|
<code>-lprofiler</code> to the link-time step for your executable.
|
|
(It's also probably possible to add in the profiler at run-time using
|
|
<code>LD_PRELOAD</code>, but this isn't necessarily recommended.)</p>
|
|
|
|
<p>This does <i>not</i> turn on CPU profiling; it just inserts the
|
|
code. For that reason, it's practical to just always link
|
|
<code>-lprofiler</code> into a binary while developing; that's what we
|
|
do at Google. (However, since any user can turn on the profiler by
|
|
setting an environment variable, it's not necessarily recommended to
|
|
install profiler-linked binaries into a production, running
|
|
system.)</p>
|
|
|
|
|
|
<H1>Running the Code</H1>
|
|
|
|
<p>There are several alternatives to actually turn on CPU profiling
|
|
for a given run of an executable:</p>
|
|
|
|
<ol>
|
|
<li> <p>Define the environment variable CPUPROFILE to the filename
|
|
to dump the profile to. For instance, to profile
|
|
<code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p>
|
|
<pre>% env CPUPROFILE=/tmp/mybin.prof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre>
|
|
|
|
<li> <p>In your code, bracket the code you want profiled in calls to
|
|
<code>ProfilerStart()</code> and <code>ProfilerStop()</code>.
|
|
<code>ProfilerStart()</code> will take
|
|
the profile-filename as an argument.</p>
|
|
</ol>
|
|
|
|
<p>In Linux 2.6 and above, profiling works correctly with threads,
|
|
automatically profiling all threads. In Linux 2.4, profiling only
|
|
profiles the main thread (due to a kernel bug involving itimers and
|
|
threads). Profiling works correctly with sub-processes: each child
|
|
process gets its own profile with its own name (generated by combining
|
|
CPUPROFILE with the child's process id).</p>
|
|
|
|
<p>For security reasons, CPU profiling will not write to a file -- and
|
|
is thus not usable -- for setuid programs.</p>
|
|
|
|
<H2>Modifying Runtime Behavior</H2>
|
|
|
|
<p>You can more finely control the behavior of the CPU profiler via
|
|
environment variables.</p>
|
|
|
|
<table frame=box rules=sides cellpadding=5 width=100%>
|
|
|
|
<tr valign=top>
|
|
<td><code>CPUPROFILE_FREQUENCY=<i>x</i></code></td>
|
|
<td>
|
|
How many interrupts/second the cpu-profiler samples.
|
|
</td>
|
|
</tr>
|
|
|
|
</table>
|
|
|
|
|
|
<h1><a name="pprof">Analyzing the Output</a></h1>
|
|
|
|
<p><code>pprof</code> is the script used to analyze a profile. It has
|
|
many output modes, both textual and graphical. Some give just raw
|
|
numbers, much like the <code>-pg</code> output of <code>gcc</code>,
|
|
and others show the data in the form of a dependency graph.</p>
|
|
|
|
<p>pprof <b>requires</b> <code>perl5</code> to be installed to run.
|
|
It also requires <code>dot</code> to be installed for any of the
|
|
graphical output routines, and <code>gv</code> to be installed for
|
|
<code>--gv</code> mode (described below).
|
|
</p>
|
|
|
|
<p>Here are some ways to call pprof. These are described in more
|
|
detail below.</p>
|
|
|
|
<pre>
|
|
% pprof /bin/ls ls.prof
|
|
Enters "interactive" mode
|
|
% pprof --text /bin/ls ls.prof
|
|
Outputs one line per procedure
|
|
% pprof --gv /bin/ls ls.prof
|
|
Displays annotated call-graph via 'gv'
|
|
% pprof --gv --focus=Mutex /bin/ls ls.prof
|
|
Restricts to code paths including a .*Mutex.* entry
|
|
% pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
|
|
Code paths including Mutex but not string
|
|
% pprof --list=getdir /bin/ls ls.prof
|
|
(Per-line) annotated source listing for getdir()
|
|
% pprof --disasm=getdir /bin/ls ls.prof
|
|
(Per-PC) annotated disassembly for getdir()
|
|
% pprof --text localhost:1234
|
|
Outputs one line per procedure for localhost:1234
|
|
</pre>
|
|
|
|
|
|
<h3>Analyzing Text Output</h3>
|
|
|
|
<p>Text mode has lines of output that look like this:</p>
|
|
<pre>
|
|
14 2.1% 17.2% 58 8.7% std::_Rb_tree::find
|
|
</pre>
|
|
|
|
<p>Here is how to interpret the columns:</p>
|
|
<ol>
|
|
<li> Number of profiling samples in this function
|
|
<li> Percentage of profiling samples in this function
|
|
<li> Percentage of profiling samples in the functions printed so far
|
|
<li> Number of profiling samples in this function and its callees
|
|
<li> Percentage of profiling samples in this function and its callees
|
|
<li> Function name
|
|
</ol>
|
|
|
|
|
|
<h3>Node Information</h3>
|
|
|
|
<p>In the various graphical modes of pprof, the output is a call graph
|
|
annotated with timing information, like so:</p>
|
|
|
|
<A HREF="pprof-test-big.gif">
|
|
<center><table><tr><td>
|
|
<img src="pprof-test.gif">
|
|
</td></tr></table></center>
|
|
</A>
|
|
|
|
<p>Each node represents a procedure. The directed edges indicate
|
|
caller to callee relations. Each node is formatted as follows:</p>
|
|
|
|
<center><pre>
|
|
Class Name
|
|
Method Name
|
|
local (percentage)
|
|
<b>of</b> cumulative (percentage)
|
|
</pre></center>
|
|
|
|
<p>The last one or two lines contains the timing information. (The
|
|
profiling is done via a sampling method, where by default we take 100
|
|
samples a second. Therefor one unit of time in the output corresponds
|
|
to about 10 milliseconds of execution time.) The "local" time is the
|
|
time spent executing the instructions directly contained in the
|
|
procedure (and in any other procedures that were inlined into the
|
|
procedure). The "cumulative" time is the sum of the "local" time and
|
|
the time spent in any callees. If the cumulative time is the same as
|
|
the local time, it is not printed.</p>
|
|
|
|
<p>For instance, the timing information for test_main_thread()
|
|
indicates that 155 units (about 1.55 seconds) were spent executing the
|
|
code in <code>test_main_thread()</code> and 200 units were spent while
|
|
executing <code>test_main_thread()</code> and its callees such as
|
|
<code>snprintf()</code>.</p>
|
|
|
|
<p>The size of the node is proportional to the local count. The
|
|
percentage displayed in the node corresponds to the count divided by
|
|
the total run time of the program (that is, the cumulative count for
|
|
<code>main()</code>).</p>
|
|
|
|
<h3>Edge Information</h3>
|
|
|
|
<p>An edge from one node to another indicates a caller to callee
|
|
relationship. Each edge is labelled with the time spent by the callee
|
|
on behalf of the caller. E.g, the edge from
|
|
<code>test_main_thread()</code> to <code>snprintf()</code> indicates
|
|
that of the 200 samples in <code>test_main_thread()</code>, 37 are
|
|
because of calls to <code>snprintf()</code>.</p>
|
|
|
|
<p>Note that <code>test_main_thread()</code> has an edge to
|
|
<code>vsnprintf()</code>, even though <code>test_main_thread()</code>
|
|
doesn't call that function directly. This is because the code was
|
|
compiled with <code>-O2</code>; the profile reflects the optimized
|
|
control flow.</p>
|
|
|
|
<h3>Meta Information</h3>
|
|
|
|
<p>The top of the display should contain some meta information
|
|
like:</p>
|
|
<pre>
|
|
/tmp/profiler2_unittest
|
|
Total samples: 202
|
|
Focusing on: 202
|
|
Dropped nodes with <= 1 abs(samples)
|
|
Dropped edges with <= 0 samples
|
|
</pre>
|
|
|
|
<p>This section contains the name of the program, and the total
|
|
samples collected during the profiling run. If the
|
|
<code>--focus</code> option is on (see the <a href="#focus">Focus</a>
|
|
section below), the legend also contains the number of samples being
|
|
shown in the focused display. Furthermore, some unimportant nodes and
|
|
edges are dropped to reduce clutter. The characteristics of the
|
|
dropped nodes and edges are also displayed in the legend.</p>
|
|
|
|
<h3><a name=focus>Focus and Ignore</a></h3>
|
|
|
|
<p>You can ask pprof to generate a display focused on a particular
|
|
piece of the program. You specify a regular expression. Any portion
|
|
of the call-graph that is on a path which contains at least one node
|
|
matching the regular expression is preserved. The rest of the
|
|
call-graph is dropped on the floor. For example, you can focus on the
|
|
<code>vsnprintf()</code> libc call in <code>profiler2_unittest</code>
|
|
as follows:</p>
|
|
|
|
<pre>
|
|
% pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof
|
|
</pre>
|
|
<A HREF="pprof-vsnprintf-big.gif">
|
|
<center><table><tr><td>
|
|
<img src="pprof-vsnprintf.gif">
|
|
</td></tr></table></center>
|
|
</A>
|
|
|
|
<p>Similarly, you can supply the <code>--ignore</code> option to
|
|
ignore samples that match a specified regular expression. E.g., if
|
|
you are interested in everything except calls to
|
|
<code>snprintf()</code>, you can say:</p>
|
|
<pre>
|
|
% pprof --gv --ignore=snprintf /tmp/profiler2_unittest test.prof
|
|
</pre>
|
|
|
|
|
|
<h3>Interactive mode</a></h3>
|
|
|
|
<p>By default -- if you don't specify any flags to the contrary --
|
|
pprof runs in interactive mode. At the <code>(pprof)</code> prompt,
|
|
you can run many of the commands described above. You can type
|
|
<code>help</code> for a list of what commands are available in
|
|
interactive mode.</p>
|
|
|
|
<h3><a name=options>pprof Options</a></h3>
|
|
|
|
For a complete list of pprof options, you can run <code>pprof
|
|
--help</code>.
|
|
|
|
<h4>Output Type</h4>
|
|
|
|
<p>
|
|
<center>
|
|
<table frame=box rules=sides cellpadding=5 width=100%>
|
|
<tr valign=top>
|
|
<td><code>--text</code></td>
|
|
<td>
|
|
Produces a textual listing. (Note: If you have an X display, and
|
|
<code>dot</code> and <code>gv</code> installed, you will probably
|
|
be happier with the <code>--gv</code> output.)
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--gv</code></td>
|
|
<td>
|
|
Generates annotated call-graph, converts to postscript, and
|
|
displays via gv (requres <code>dot</code> and <code>gv</code> be
|
|
installed).
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--dot</code></td>
|
|
<td>
|
|
Generates the annotated call-graph in dot format and
|
|
emits to stdout (requres <code>dot</code> be installed).
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--ps</code></td>
|
|
<td>
|
|
Generates the annotated call-graph in Postscript format and
|
|
emits to stdout (requres <code>dot</code> be installed).
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--pdf</code></td>
|
|
<td>
|
|
Generates the annotated call-graph in PDF format and emits to
|
|
stdout (requires <code>dot</code> and <code>ps2pdf</code> be
|
|
installed).
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--gif</code></td>
|
|
<td>
|
|
Generates the annotated call-graph in GIF format and
|
|
emits to stdout (requres <code>dot</code> be installed).
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--list=<<i>regexp</i>></code></td>
|
|
<td>
|
|
<p>Outputs source-code listing of routines whose
|
|
name matches <regexp>. Each line
|
|
in the listing is annotated with flat and cumulative
|
|
sample counts.</p>
|
|
|
|
<p>In the presence of inlined calls, the samples
|
|
associated with inlined code tend to get assigned
|
|
to a line that follows the location of the
|
|
inlined call. A more precise accounting can be
|
|
obtained by disassembling the routine using the
|
|
--disasm flag.</p>
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--disasm=<<i>regexp</i>></code></td>
|
|
<td>
|
|
Generates disassembly of routines that match
|
|
<regexp>, annotated with flat and
|
|
cumulative sample counts and emits to stdout.
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
|
|
<h4>Reporting Granularity</h4>
|
|
|
|
<p>By default, pprof produces one entry per procedure. However you can
|
|
use one of the following options to change the granularity of the
|
|
output. The <code>--files</code> option seems to be particularly
|
|
useless, and may be removed eventually.</p>
|
|
|
|
<center>
|
|
<table frame=box rules=sides cellpadding=5 width=100%>
|
|
<tr valign=top>
|
|
<td><code>--addresses</code></td>
|
|
<td>
|
|
Produce one node per program address.
|
|
</td>
|
|
</tr>
|
|
<td><code>--lines</code></td>
|
|
<td>
|
|
Produce one node per source line.
|
|
</td>
|
|
</tr>
|
|
<td><code>--functions</code></td>
|
|
<td>
|
|
Produce one node per function (this is the default).
|
|
</td>
|
|
</tr>
|
|
<td><code>--files</code></td>
|
|
<td>
|
|
Produce one node per source file.
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
|
|
<h4>Controlling the Call Graph Display</h4>
|
|
|
|
<p>Some nodes and edges are dropped to reduce clutter in the output
|
|
display. The following options control this effect:</p>
|
|
|
|
<center>
|
|
<table frame=box rules=sides cellpadding=5 width=100%>
|
|
<tr valign=top>
|
|
<td><code>--nodecount=<n></code></td>
|
|
<td>
|
|
This option controls the number of displayed nodes. The nodes
|
|
are first sorted by decreasing cumulative count, and then only
|
|
the top N nodes are kept. The default value is 80.
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--nodefraction=<f></code></td>
|
|
<td>
|
|
This option provides another mechanism for discarding nodes
|
|
from the display. If the cumulative count for a node is
|
|
less than this option's value multiplied by the total count
|
|
for the profile, the node is dropped. The default value
|
|
is 0.005; i.e. nodes that account for less than
|
|
half a percent of the total time are dropped. A node
|
|
is dropped if either this condition is satisfied, or the
|
|
--nodecount condition is satisfied.
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--edgefraction=<f></code></td>
|
|
<td>
|
|
This option controls the number of displayed edges. First of all,
|
|
an edge is dropped if either its source or destination node is
|
|
dropped. Otherwise, the edge is dropped if the sample
|
|
count along the edge is less than this option's value multiplied
|
|
by the total count for the profile. The default value is
|
|
0.001; i.e., edges that account for less than
|
|
0.1% of the total time are dropped.
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--focus=<re></code></td>
|
|
<td>
|
|
This option controls what region of the graph is displayed
|
|
based on the regular expression supplied with the option.
|
|
For any path in the callgraph, we check all nodes in the path
|
|
against the supplied regular expression. If none of the nodes
|
|
match, the path is dropped from the output.
|
|
</td>
|
|
</tr>
|
|
<tr valign=top>
|
|
<td><code>--ignore=<re></code></td>
|
|
<td>
|
|
This option controls what region of the graph is displayed
|
|
based on the regular expression supplied with the option.
|
|
For any path in the callgraph, we check all nodes in the path
|
|
against the supplied regular expression. If any of the nodes
|
|
match, the path is dropped from the output.
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
|
|
<p>The dropped edges and nodes account for some count mismatches in
|
|
the display. For example, the cumulative count for
|
|
<code>snprintf()</code> in the first diagram above was 41. However
|
|
the local count (1) and the count along the outgoing edges (12+1+20+6)
|
|
add up to only 40.</p>
|
|
|
|
|
|
<h1>Caveats</h1>
|
|
|
|
<ul>
|
|
<li> If the program exits because of a signal, the generated profile
|
|
will be <font color=red>incomplete, and may perhaps be
|
|
completely empty</font>.
|
|
<li> The displayed graph may have disconnected regions because
|
|
of the edge-dropping heuristics described above.
|
|
<li> If the program linked in a library that was not compiled
|
|
with enough symbolic information, all samples associated
|
|
with the library may be charged to the last symbol found
|
|
in the program before the library. This will artificially
|
|
inflate the count for that symbol.
|
|
<li> If you run the program on one machine, and profile it on
|
|
another, and the shared libraries are different on the two
|
|
machines, the profiling output may be confusing: samples that
|
|
fall within shared libaries may be assigned to arbitrary
|
|
procedures.
|
|
<li> If your program forks, the children will also be profiled
|
|
(since they inherit the same CPUPROFILE setting). Each process
|
|
is profiled separately; to distinguish the child profiles from
|
|
the parent profile and from each other, all children will have
|
|
their process-id appended to the CPUPROFILE name.
|
|
<li> Due to a hack we make to work around a possible gcc bug, your
|
|
profiles may end up named strangely if the first character of
|
|
your CPUPROFILE variable has ascii value greater than 127.
|
|
This should be exceedingly rare, but if you need to use such a
|
|
name, just set prepend <code>./</code> to your filename:
|
|
<code>CPUPROFILE=./Ägypten</code>.
|
|
</ul>
|
|
|
|
|
|
<hr>
|
|
<address>Sanjay Ghemawat<br>
|
|
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
|
|
<!-- hhmts start -->
|
|
Last modified: Sat Feb 24 13:11:38 PST 2007 (csilvers)
|
|
<!-- hhmts end -->
|
|
</address>
|
|
</BODY>
|
|
</HTML>
|