2016-03-01 04:09:39 +00:00
|
|
|
|
gperftools
|
|
|
|
|
----------
|
|
|
|
|
(originally Google Performance Tools)
|
|
|
|
|
|
|
|
|
|
The fastest malloc we’ve seen; works particularly well with threads
|
|
|
|
|
and STL. Also: thread-friendly heap-checker, heap-profiler, and
|
|
|
|
|
cpu-profiler.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
OVERVIEW
|
|
|
|
|
---------
|
|
|
|
|
|
|
|
|
|
gperftools is a collection of a high-performance multi-threaded
|
|
|
|
|
malloc() implementation, plus some pretty nifty performance analysis
|
|
|
|
|
tools.
|
|
|
|
|
|
|
|
|
|
gperftools is distributed under the terms of the BSD License. Join our
|
2016-05-14 15:50:34 +00:00
|
|
|
|
mailing list at gperftools@googlegroups.com for updates:
|
|
|
|
|
https://groups.google.com/forum/#!forum/gperftools
|
2016-03-01 04:09:39 +00:00
|
|
|
|
|
|
|
|
|
gperftools was original home for pprof program. But do note that
|
2016-03-31 18:27:21 +00:00
|
|
|
|
original pprof (which is still included with gperftools) is now
|
2021-04-20 09:11:46 +00:00
|
|
|
|
deprecated in favor of Go version at https://github.com/google/pprof
|
2007-04-16 20:49:32 +00:00
|
|
|
|
|
|
|
|
|
|
2007-03-22 03:28:56 +00:00
|
|
|
|
TCMALLOC
|
|
|
|
|
--------
|
2007-08-17 20:56:15 +00:00
|
|
|
|
Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
|
|
|
|
|
tcmalloc -- a replacement for malloc and new. See below for some
|
|
|
|
|
environment variables you can use with tcmalloc, as well.
|
|
|
|
|
|
|
|
|
|
tcmalloc functionality is available on all systems we've tested; see
|
2010-08-05 20:36:47 +00:00
|
|
|
|
INSTALL for more details. See README_windows.txt for instructions on
|
2007-08-17 20:56:15 +00:00
|
|
|
|
using tcmalloc on Windows.
|
2007-03-22 03:28:56 +00:00
|
|
|
|
|
2010-06-21 15:59:56 +00:00
|
|
|
|
when compiling. gcc makes some optimizations assuming it is using its
|
|
|
|
|
own, built-in malloc; that assumption obviously isn't true with
|
|
|
|
|
tcmalloc. In practice, we haven't seen any problems with this, but
|
|
|
|
|
the expected risk is highest for users who register their own malloc
|
2012-02-04 00:07:36 +00:00
|
|
|
|
hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
|
|
|
|
|
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
|
|
|
|
|
the above flags :-) ).
|
2010-06-21 15:59:56 +00:00
|
|
|
|
|
2007-03-22 03:28:56 +00:00
|
|
|
|
|
|
|
|
|
HEAP PROFILER
|
|
|
|
|
-------------
|
2017-03-21 13:07:16 +00:00
|
|
|
|
See docs/heapprofile.html for information about how to use tcmalloc's
|
2007-03-22 03:28:56 +00:00
|
|
|
|
heap profiler and analyze its output.
|
|
|
|
|
|
|
|
|
|
As a quick-start, do the following after installing this package:
|
|
|
|
|
|
|
|
|
|
1) Link your executable with -ltcmalloc
|
|
|
|
|
2) Run your executable with the HEAPPROFILE environment var set:
|
2010-11-18 01:07:25 +00:00
|
|
|
|
$ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
|
2007-03-22 03:28:56 +00:00
|
|
|
|
3) Run pprof to analyze the heap usage
|
|
|
|
|
$ pprof <path/to/binary> /tmp/heapprof.0045.heap # run 'ls' to see options
|
|
|
|
|
$ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap
|
|
|
|
|
|
|
|
|
|
You can also use LD_PRELOAD to heap-profile an executable that you
|
|
|
|
|
didn't compile.
|
|
|
|
|
|
|
|
|
|
There are other environment variables, besides HEAPPROFILE, you can
|
2007-08-17 20:56:15 +00:00
|
|
|
|
set to adjust the heap-profiler behavior; c.f. "ENVIRONMENT VARIABLES"
|
2007-03-22 03:28:56 +00:00
|
|
|
|
below.
|
|
|
|
|
|
2007-08-17 20:56:15 +00:00
|
|
|
|
The heap profiler is available on all unix-based systems we've tested;
|
|
|
|
|
see INSTALL for more details. It is not currently available on Windows.
|
|
|
|
|
|
2007-03-22 03:28:56 +00:00
|
|
|
|
|
2007-03-22 03:00:33 +00:00
|
|
|
|
HEAP CHECKER
|
|
|
|
|
------------
|
2017-03-21 13:07:16 +00:00
|
|
|
|
See docs/heap_checker.html for information about how to use tcmalloc's
|
2007-03-22 03:28:56 +00:00
|
|
|
|
heap checker.
|
2007-03-22 03:00:33 +00:00
|
|
|
|
|
2007-03-22 03:28:56 +00:00
|
|
|
|
In order to catch all heap leaks, tcmalloc must be linked *last* into
|
|
|
|
|
your executable. The heap checker may mischaracterize some memory
|
|
|
|
|
accesses in libraries listed after it on the link line. For instance,
|
|
|
|
|
it may report these libraries as leaking memory when they're not.
|
|
|
|
|
(See the source code for more details.)
|
2007-03-22 03:00:33 +00:00
|
|
|
|
|
2007-03-22 03:28:56 +00:00
|
|
|
|
Here's a quick-start for how to use:
|
|
|
|
|
|
|
|
|
|
As a quick-start, do the following after installing this package:
|
|
|
|
|
|
|
|
|
|
1) Link your executable with -ltcmalloc
|
|
|
|
|
2) Run your executable with the HEAPCHECK environment var set:
|
|
|
|
|
$ HEAPCHECK=1 <path/to/binary> [binary args]
|
|
|
|
|
|
|
|
|
|
Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian
|
|
|
|
|
|
|
|
|
|
You can also use LD_PRELOAD to heap-check an executable that you
|
|
|
|
|
didn't compile.
|
|
|
|
|
|
2007-08-17 20:56:15 +00:00
|
|
|
|
The heap checker is only available on Linux at this time; see INSTALL
|
|
|
|
|
for more details.
|
|
|
|
|
|
2007-03-22 03:00:33 +00:00
|
|
|
|
|
2010-06-21 15:59:56 +00:00
|
|
|
|
CPU PROFILER
|
|
|
|
|
------------
|
2017-03-21 13:07:16 +00:00
|
|
|
|
See docs/cpuprofile.html for information about how to use the CPU
|
2010-06-21 15:59:56 +00:00
|
|
|
|
profiler and analyze its output.
|
|
|
|
|
|
|
|
|
|
As a quick-start, do the following after installing this package:
|
|
|
|
|
|
|
|
|
|
1) Link your executable with -lprofiler
|
|
|
|
|
2) Run your executable with the CPUPROFILE environment var set:
|
|
|
|
|
$ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
|
|
|
|
|
3) Run pprof to analyze the CPU usage
|
|
|
|
|
$ pprof <path/to/binary> /tmp/prof.out # -pg-like text output
|
|
|
|
|
$ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output
|
|
|
|
|
|
|
|
|
|
There are other environment variables, besides CPUPROFILE, you can set
|
|
|
|
|
to adjust the cpu-profiler behavior; cf "ENVIRONMENT VARIABLES" below.
|
|
|
|
|
|
|
|
|
|
The CPU profiler is available on all unix-based systems we've tested;
|
|
|
|
|
see INSTALL for more details. It is not currently available on Windows.
|
|
|
|
|
|
|
|
|
|
NOTE: CPU profiling doesn't work after fork (unless you immediately
|
|
|
|
|
do an exec()-like call afterwards). Furthermore, if you do
|
|
|
|
|
fork, and the child calls exit(), it may corrupt the profile
|
|
|
|
|
data. You can use _exit() to work around this. We hope to have
|
|
|
|
|
a fix for both problems in the next release of perftools
|
|
|
|
|
(hopefully perftools 1.2).
|
|
|
|
|
|
|
|
|
|
|
2009-06-10 02:04:26 +00:00
|
|
|
|
EVERYTHING IN ONE
|
|
|
|
|
-----------------
|
|
|
|
|
If you want the CPU profiler, heap profiler, and heap leak-checker to
|
|
|
|
|
all be available for your application, you can do:
|
|
|
|
|
gcc -o myapp ... -lprofiler -ltcmalloc
|
|
|
|
|
|
|
|
|
|
However, if you have a reason to use the static versions of the
|
|
|
|
|
library, this two-library linking won't work:
|
|
|
|
|
gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a # errors!
|
|
|
|
|
|
|
|
|
|
Instead, use the special libtcmalloc_and_profiler library, which we
|
|
|
|
|
make for just this purpose:
|
|
|
|
|
gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a
|
|
|
|
|
|
|
|
|
|
|
2011-05-19 21:37:12 +00:00
|
|
|
|
CONFIGURATION OPTIONS
|
|
|
|
|
---------------------
|
|
|
|
|
For advanced users, there are several flags you can pass to
|
2019-04-03 07:50:40 +00:00
|
|
|
|
'./configure' that tweak tcmalloc performance. (These are in addition
|
2011-05-19 21:37:12 +00:00
|
|
|
|
to the environment variables you can set at runtime to affect
|
|
|
|
|
tcmalloc, described below.) See the INSTALL file for details.
|
|
|
|
|
|
|
|
|
|
|
2007-03-22 03:00:33 +00:00
|
|
|
|
ENVIRONMENT VARIABLES
|
|
|
|
|
---------------------
|
2007-03-22 03:28:56 +00:00
|
|
|
|
The cpu profiler, heap checker, and heap profiler will lie dormant,
|
|
|
|
|
using no memory or CPU, until you turn them on. (Thus, there's no
|
|
|
|
|
harm in linking -lprofiler into every application, and also -ltcmalloc
|
|
|
|
|
assuming you're ok using the non-libc malloc library.)
|
|
|
|
|
|
|
|
|
|
The easiest way to turn them on is by setting the appropriate
|
|
|
|
|
environment variables. We have several variables that let you
|
|
|
|
|
enable/disable features as well as tweak parameters.
|
2007-03-22 03:00:33 +00:00
|
|
|
|
|
2007-04-16 20:49:32 +00:00
|
|
|
|
Here are some of the most important variables:
|
|
|
|
|
|
2009-06-10 02:04:26 +00:00
|
|
|
|
HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix
|
|
|
|
|
HEAPCHECK=<type> -- turns on heap checking with strictness 'type'
|
2007-03-22 03:00:33 +00:00
|
|
|
|
CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file.
|
|
|
|
|
PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code
|
|
|
|
|
surrounded with ProfilerEnable()/ProfilerDisable().
|
2013-07-06 20:48:18 +00:00
|
|
|
|
CPUPROFILE_FREQUENCY=x-- how many interrupts/second the cpu-profiler samples.
|
2007-03-22 03:28:56 +00:00
|
|
|
|
|
2016-08-25 00:47:28 +00:00
|
|
|
|
PERFTOOLS_VERBOSE=<level> -- the higher level, the more messages malloc emits
|
2007-03-22 03:00:33 +00:00
|
|
|
|
MALLOCSTATS=<level> -- prints memory-use stats at program-exit
|
|
|
|
|
|
2007-04-16 20:49:32 +00:00
|
|
|
|
For a full list of variables, see the documentation pages:
|
2016-11-15 08:58:11 +00:00
|
|
|
|
docs/cpuprofile.html
|
|
|
|
|
docs/heapprofile.html
|
|
|
|
|
docs/heap_checker.html
|
2007-04-16 20:49:32 +00:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
COMPILING ON NON-LINUX SYSTEMS
|
|
|
|
|
------------------------------
|
|
|
|
|
|
|
|
|
|
Perftools was developed and tested on x86 Linux systems, and it works
|
|
|
|
|
in its full generality only on those systems. However, we've
|
2007-08-17 20:56:15 +00:00
|
|
|
|
successfully ported much of the tcmalloc library to FreeBSD, Solaris
|
|
|
|
|
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
|
|
|
|
|
functionality in tcmalloc_minimal to Windows. See INSTALL for details.
|
2010-08-05 20:36:47 +00:00
|
|
|
|
See README_windows.txt for details on the Windows port.
|
2007-04-16 20:49:32 +00:00
|
|
|
|
|
|
|
|
|
|
2007-04-19 00:53:22 +00:00
|
|
|
|
PERFORMANCE
|
|
|
|
|
-----------
|
|
|
|
|
|
|
|
|
|
If you're interested in some third-party comparisons of tcmalloc to
|
|
|
|
|
other malloc libraries, here are a few web pages that have been
|
|
|
|
|
brought to our attention. The first discusses the effect of using
|
|
|
|
|
various malloc libraries on OpenLDAP. The second compares tcmalloc to
|
|
|
|
|
win32's malloc.
|
|
|
|
|
http://www.highlandsun.com/hyc/malloc/
|
|
|
|
|
http://gaiacrtn.free.fr/articles/win32perftools.html
|
|
|
|
|
|
2010-08-05 20:36:47 +00:00
|
|
|
|
It's possible to build tcmalloc in a way that trades off faster
|
|
|
|
|
performance (particularly for deletes) at the cost of more memory
|
|
|
|
|
fragmentation (that is, more unusable memory on your system). See the
|
|
|
|
|
INSTALL file for details.
|
|
|
|
|
|
2007-04-19 00:53:22 +00:00
|
|
|
|
|
2008-09-19 20:06:40 +00:00
|
|
|
|
OLD SYSTEM ISSUES
|
|
|
|
|
-----------------
|
|
|
|
|
|
|
|
|
|
When compiling perftools on some old systems, like RedHat 8, you may
|
|
|
|
|
get an error like this:
|
|
|
|
|
___tls_get_addr: symbol not found
|
|
|
|
|
|
|
|
|
|
This means that you have a system where some parts are updated enough
|
|
|
|
|
to support Thread Local Storage, but others are not. The perftools
|
|
|
|
|
configure script can't always detect this kind of case, leading to
|
|
|
|
|
that error. To fix it, just comment out (or delete) the line
|
|
|
|
|
#define HAVE_TLS 1
|
|
|
|
|
in your config.h file before building.
|
|
|
|
|
|
|
|
|
|
|
2007-04-16 20:49:32 +00:00
|
|
|
|
64-BIT ISSUES
|
|
|
|
|
-------------
|
|
|
|
|
|
|
|
|
|
There are two issues that can cause program hangs or crashes on x86_64
|
|
|
|
|
64-bit systems, which use the libunwind library to get stack-traces.
|
|
|
|
|
Neither issue should affect the core tcmalloc library; they both
|
|
|
|
|
affect the perftools tools such as cpu-profiler, heap-checker, and
|
|
|
|
|
heap-profiler.
|
|
|
|
|
|
|
|
|
|
1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
|
|
|
|
|
libc function dl_iterate_phdr() acquires its locks in the wrong
|
|
|
|
|
order. This bug should not affect tcmalloc, but may cause occasional
|
|
|
|
|
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
|
|
|
|
|
Its likeliness increases the more dlopen() commands an executable has.
|
|
|
|
|
Most executables don't have any, though several library routines like
|
2007-08-17 20:56:15 +00:00
|
|
|
|
getgrgid() call dlopen() behind the scenes.
|
2007-04-16 20:49:32 +00:00
|
|
|
|
|
|
|
|
|
2) On x86-64 64-bit systems, while tcmalloc itself works fine, the
|
|
|
|
|
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
|
|
|
|
|
cause a segfault. I'll explain the problem first, and then some
|
|
|
|
|
workarounds.
|
|
|
|
|
|
|
|
|
|
Note that this only affects the cpu-profiler, which is a
|
2012-02-04 00:07:36 +00:00
|
|
|
|
gperftools feature you must turn on manually by setting the
|
2007-04-16 20:49:32 +00:00
|
|
|
|
CPUPROFILE environment variable. If you do not turn on cpu-profiling,
|
|
|
|
|
you shouldn't see any crashes due to perftools.
|
|
|
|
|
|
|
|
|
|
The gory details: The underlying problem is in the backtrace()
|
2009-03-11 20:50:03 +00:00
|
|
|
|
function, which is a built-in function in libc.
|
2007-04-16 20:49:32 +00:00
|
|
|
|
Backtracing is fairly straightforward in the normal case, but can run
|
|
|
|
|
into problems when having to backtrace across a signal frame.
|
|
|
|
|
Unfortunately, the cpu-profiler uses signals in order to register a
|
|
|
|
|
profiling event, so every backtrace that the profiler does crosses a
|
|
|
|
|
signal frame.
|
|
|
|
|
|
|
|
|
|
In our experience, the only time there is trouble is when the signal
|
|
|
|
|
fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
|
2009-03-11 20:50:03 +00:00
|
|
|
|
called quite a bit from system libraries, particularly at program
|
2007-04-16 20:49:32 +00:00
|
|
|
|
startup and when creating a new thread.
|
|
|
|
|
|
|
|
|
|
The solution: The dwarf debugging format has support for 'cfi
|
|
|
|
|
annotations', which make it easy to recognize a signal frame. Some OS
|
|
|
|
|
distributions, such as Fedora and gentoo 2007.0, already have added
|
|
|
|
|
cfi annotations to their libc. A future version of libunwind should
|
|
|
|
|
recognize these annotations; these systems should not see any
|
2019-04-03 07:50:40 +00:00
|
|
|
|
crashes.
|
2007-04-16 20:49:32 +00:00
|
|
|
|
|
|
|
|
|
Workarounds: If you see problems with crashes when running the
|
|
|
|
|
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
|
|
|
|
|
your code, rather than setting CPUPROFILE. This will profile only
|
|
|
|
|
those sections of the codebase. Though we haven't done much testing,
|
|
|
|
|
in theory this should reduce the chance of crashes by limiting the
|
|
|
|
|
signal generation to only a small part of the codebase. Ideally, you
|
|
|
|
|
would not use ProfilerStart()/ProfilerStop() around code that spawns
|
|
|
|
|
new threads, or is otherwise likely to cause a call to
|
|
|
|
|
pthread_mutex_lock!
|
|
|
|
|
|
2007-03-22 03:28:56 +00:00
|
|
|
|
---
|
2011-05-19 21:37:12 +00:00
|
|
|
|
17 May 2011
|