csilvers 100c38c1a2 Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com>
* google-perftools: version 1.8 release
	* PORTING: (Disabled) support for patching mmap on freebsd (chapp...)
	* PORTING: Support volatile __malloc_hook for glibc 2.14 (csilvers)
	* PORTING: Use _asm rdtsc and __rdtsc to get cycleclock in windows (koda)
	* PORTING: Fix fd vs. HANDLE compiler error on cygwin (csilvers)
	* PORTING: Do not test memalign or double-linking on OS X (csilvers)
	* PORTING: Actually enable TLS on windows (jontra)
	* PORTING: Some work to compile under Native Client (krasin)
	* PORTING: deal with pthread_once w/o -pthread on freebsd (csilvers)
	* Rearrange libc-overriding to make it easier to port (csilvers)
	* Display source locations in pprof disassembly (sanjay)
	* BUGFIX: Actually initialize allocator name (mec)
	* BUGFIX: Keep track of 'overhead' bytes in malloc reporting (csilvers)
	* Allow ignoring one object twice in the leak checker (glider)
	* BUGFIX: top10 in pprof should print 10 lines, not 11 (rsc)
	* Refactor vdso source files (tipp)
	* Some documentation cleanups
	* Document MAX_TOTAL_THREAD_CACHE_SIZE <= 1Gb (nsethi)
	* Add MallocExtension::GetOwnership(ptr) (csilvers)
	* BUGFIX: We were leaving out a needed $(top_srcdir) in the Makefile
	* PORTING: Support getting argv0 on OS X
	* Add 'weblist' command to pprof: like 'list' but html (sanjay)
	* Improve source listing in pprof (sanjay)
	* Cap cache sizes to reduce fragmentation (ruemmler)
	* Improve performance by capping or increasing sizes (ruemmler)
	* Add M{,un}mapReplacement hooks into MallocHook (ribrdb)
	* Refactored system allocator logic (gangren)
	* Include cleanups (csilvers)
	* Add TCMALLOC_SMALL_BUT_SLOW support (ruemmler)
	* Clarify that tcmalloc stats are MiB (robinson)
	* Remove support for non-tcmalloc debugallocation (blount)
	* Add a new test: malloc_hook_test (csilvers)
	* Change the configure script to be more crosstool-friendly (mcgrathr)
	* PORTING: leading-underscore changes to support win64 (csilvers)
	* Improve debugallocation tc_malloc_size (csilvers)
	* Extend atomicops.h and cycleclock to use ARM V6+ optimized code (sanek)
	* Change malloc-hook to use a list-like structure (llib)
	* Add flag to use MAP_PRIVATE in memfs_malloc (gangren)
	* Windows support for pprof: nul and /usr/bin/file (csilvers)
	* TESTING: add test on strdup to tcmalloc_test (csilvers)
	* Augment heap-checker to deal with no-inode maps (csilvers)
	* Count .dll/.dylib as shared libs in heap-checker (csilvers)
	* Disable sys_futex for arm; it's not always reliable (sanek)
	* PORTING: change lots of windows/port.h macros to functions
	* BUGFIX: Generate correct version# in tcmalloc.h on windows (csilvers)
	* PORTING: Some casting to make solaris happier about types (csilvers)
	* TESTING: Disable debugallocation_test in 'minimal' mode (csilvers)
	* Rewrite debugallocation to be more modular (csilvers)
	* Don't try to run the heap-checker under valgrind (ppluzhnikov)
	* BUGFIX: Make focused stat %'s relative, not absolute (sanjay)
	* BUGFIX: Don't use '//' comments in a C file (csilvers)
	* Quiet new-gcc compiler warnings via -Wno-unused-result, etc (csilvers)


git-svn-id: http://gperftools.googlecode.com/svn/trunk@110 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
2011-07-16 01:07:10 +00:00
doc Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
m4 Fri Feb 04 15:54:31 2011 Google Inc. <opensource@google.com> 2011-02-05 00:19:37 +00:00
packages Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
src Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
vsprojects Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
AUTHORS Tue Feb 8 09:57:17 2005 El Goog <opensource@google.com> 2007-03-22 03:00:33 +00:00
COPYING Wed Jun 14 15:11:14 2006 Google Inc. <opensource@google.com> 2007-03-22 04:55:49 +00:00
ChangeLog Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
INSTALL * Fix typos in comment in profiler.h (nrhodes) 2011-05-19 21:37:12 +00:00
Makefile.am Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
Makefile.in Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
NEWS Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
README Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
README_windows.txt * Fix typos in comment in profiler.h (nrhodes) 2011-05-19 21:37:12 +00:00
TODO Tue Mar 18 14:30:44 2008 Google Inc. <opensource@google.com> 2008-03-19 23:35:27 +00:00
aclocal.m4 * Suppress all large allocs when report threshold==0 2010-11-18 01:07:25 +00:00
autogen.sh * google-perftools: version 1.5 release 2010-01-20 22:47:29 +00:00
compile Tue Jul 17 22:26:27 2007 Google Inc. <opensource@google.com> 2007-07-18 18:30:50 +00:00
config.guess Fri Apr 17 16:40:48 2009 Google Inc. <opensource@google.com> 2009-04-18 00:02:25 +00:00
config.sub Fri Apr 17 16:40:48 2009 Google Inc. <opensource@google.com> 2009-04-18 00:02:25 +00:00
configure Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
configure.ac Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
depcomp Tue Jul 17 22:26:27 2007 Google Inc. <opensource@google.com> 2007-07-18 18:30:50 +00:00
google-perftools.sln Fri Jul 15 16:10:51 2011 Google Inc. <opensource@google.com> 2011-07-16 01:07:10 +00:00
install-sh Tue Jul 17 22:26:27 2007 Google Inc. <opensource@google.com> 2007-07-18 18:30:50 +00:00
ltmain.sh * Suppress all large allocs when report threshold==0 2010-11-18 01:07:25 +00:00
missing Tue Jul 17 22:26:27 2007 Google Inc. <opensource@google.com> 2007-07-18 18:30:50 +00:00
mkinstalldirs Tue Jul 17 22:26:27 2007 Google Inc. <opensource@google.com> 2007-07-18 18:30:50 +00:00

README

IMPORTANT NOTE FOR 64-BIT USERS
-------------------------------
There are known issues with some perftools functionality on x86_64
systems.  See 64-BIT ISSUES, below.


TCMALLOC
--------
Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc -- a replacement for malloc and new.  See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details.  See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling.  gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc.  In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using google/malloc_hook.h).  The risk is lowest
for folks who use tcmalloc_minimal (or, of course, who pass in the
above flags :-) ).
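
To make the builtin-malloc concern concrete, here is a minimal sketch of
the kind of code where it can matter.  The function name is hypothetical
and not part of perftools.  With gcc's builtin malloc/free enabled and
optimization turned on, the compiler is allowed to remove the allocation
below entirely, in which case no allocator (and no malloc hook) ever sees
it.  Whether that actually happens depends on the gcc version and
optimization level.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical example function, not part of perftools.
// Allocates n bytes, checks the result, and frees the block at once.
// With builtin malloc/free and optimization on, gcc may elide the
// malloc/free pair, so a registered malloc hook would never fire here.
// Compiling with -fno-builtin-malloc -fno-builtin-free keeps the calls.
bool AllocateAndDiscard(std::size_t n) {
  void* p = std::malloc(n);
  bool ok = (p != nullptr);
  std::free(p);  // free(NULL) is a no-op, so this is safe either way
  return ok;
}
```

The return value is the same whether or not the calls are elided; the
only visible difference is whether the allocator is actually invoked.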


HEAP PROFILER
-------------
See doc/heapprofile.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment var set:
     $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage
     $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
     $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap

You can also use LD_PRELOAD to heap-profile an executable that you
didn't compile.

There are other environment variables, besides HEAPPROFILE, you can
set to adjust the heap-profiler behavior; see "ENVIRONMENT VARIABLES"
below.

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details.  It is not currently available on Windows.


HEAP CHECKER
------------
See doc/heap_checker.html for information about how to use tcmalloc's
heap checker.

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable.  The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line.  For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment var set:
     $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian

You can also use LD_PRELOAD to heap-check an executable that you
didn't compile.

The heap checker is only available on Linux at this time; see INSTALL
for more details.


CPU PROFILER
------------
See doc/cpuprofile.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
     $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
     $ pprof <path/to/binary> /tmp/prof.out      # -pg-like text output
     $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output

There are other environment variables, besides CPUPROFILE, you can set
to adjust the cpu-profiler behavior; see "ENVIRONMENT VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details.  It is not currently available on Windows.

NOTE: CPU profiling doesn't work after fork (unless you immediately
      do an exec()-like call afterwards).  Furthermore, if you do
      fork, and the child calls exit(), it may corrupt the profile
      data.  You can use _exit() to work around this.  We hope to have
      a fix for both problems in the next release of perftools
      (hopefully perftools 1.2).
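
As a rough sketch of the _exit() workaround mentioned in the note, the
helper below forks and has the child terminate with _exit() rather than
exit().  The function name is hypothetical; only fork(), waitpid(), and
_exit() are real POSIX calls.  _exit() ends the process without running
atexit handlers or static destructors, so the child does not execute the
parent's exit-time cleanup code.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not part of perftools.
// Forks a child that leaves immediately via _exit(child_status).
// Returns the exit status seen by the parent, or -1 on any error.
int RunChildAndWait(int child_status) {
  pid_t pid = fork();
  if (pid < 0) return -1;  // fork failed

  if (pid == 0) {
    // Child: do the child's work here (or call an exec()-like function).
    // Leave with _exit(), not exit(), so no exit-time handlers run.
    _exit(child_status);
  }

  // Parent: wait for the child and report how it exited.
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, RunChildAndWait(0) returns 0 when the child exits cleanly.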


EVERYTHING IN ONE
-----------------
If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:
   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:
   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:
   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a


CONFIGURATION OPTIONS
---------------------
For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance.  (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.)  See the INSTALL file for details.


ENVIRONMENT VARIABLES
---------------------
The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on.  (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc
assuming you're ok using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables.  We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix
HEAPCHECK=<type>  -- turns on heap checking with strictness 'type'
CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file.
PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code
                     surrounded with ProfilerEnable()/ProfilerDisable().
PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples.

TCMALLOC_DEBUG=<level> -- the higher level, the more messages malloc emits
MALLOCSTATS=<level>    -- prints memory-use stats at program-exit

For a full list of variables, see the documentation pages:
   doc/cpuprofile.html
   doc/heapprofile.html
   doc/heap_checker.html


COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems.  However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows.  See INSTALL for details.
See README_windows.txt for details on the Windows port.


PERFORMANCE
-----------

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention.  The first discusses the effect of using
various malloc libraries on OpenLDAP.  The second compares tcmalloc to
win32's malloc.
  http://www.highlandsun.com/hyc/malloc/
  http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that gains faster
performance (particularly for deletes) at the cost of more memory
fragmentation (that is, more unusable memory on your system).  See the
INSTALL file for details.


OLD SYSTEM ISSUES
-----------------

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
    ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not.  The perftools
configure script can't always detect this kind of case, leading to
that error.  To fix it, just comment out (or delete) the line
   #define HAVE_TLS 1
in your config.h file before building.


64-BIT ISSUES
-------------

There are two issues that can cause program hangs or crashes on x86_64
64-bit systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and
heap-profiler.

1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
libc function dl_iterate_phdr() acquires its locks in the wrong
order.  This bug should not affect tcmalloc, but may cause occasional
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
The more dlopen() calls an executable makes, the more likely the deadlock.
Most executables don't have any, though several library routines like
getgrgid() call dlopen() behind the scenes.

2) On x86_64 64-bit systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault.  I'll explain the problem first, and then some
workarounds.

Note that this only affects the cpu-profiler, which is a
google-perftools feature you must turn on manually by setting the
CPUPROFILE environment variable.  If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc.
Backtracing is fairly straightforward in the normal case, but can run
into problems when having to backtrace across a signal frame.
Unfortunately, the cpu-profiler uses signals in order to register a
profiling event, so every backtrace that the profiler does crosses a
signal frame.

In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock.  pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.

The solution: The dwarf debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame.  Some OS
distributions, such as Fedora and Gentoo 2007.0, already have added
cfi annotations to their libc.  A future version of libunwind should
recognize these annotations; these systems should not see any
crashes.

Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE.  This will profile only
those sections of the codebase.  Though we haven't done much testing,
in theory this should reduce the chance of crashes by limiting the
signal generation to only a small part of the codebase.  Ideally, you
would not use ProfilerStart()/ProfilerStop() around code that spawns
new threads, or is otherwise likely to cause a call to
pthread_mutex_lock!
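
A minimal sketch of this workaround follows.  It assumes the profiler
header lives at google/profiler.h (by analogy with google/malloc_hook.h
above) and that ProfilerStart() takes the output file name.  The
USE_PERFTOOLS macro, the stub functions, and the HotLoop/ProfiledHotLoop
names are all hypothetical; the stubs exist only so the sketch compiles
without the library installed.

```cpp
#include <cstdint>

#ifdef USE_PERFTOOLS
#include <google/profiler.h>  // assumed header path; link with -lprofiler
#else
// Stand-in stubs, NOT part of perftools, used when the library is absent.
static int ProfilerStart(const char* /*fname*/) { return 1; }
static void ProfilerStop() {}
#endif

// Hypothetical CPU-bound work: sum of squares of 0..n-1.
// It spawns no threads and takes no locks.
std::uint64_t HotLoop(std::uint64_t n) {
  std::uint64_t sum = 0;
  for (std::uint64_t i = 0; i < n; ++i) sum += i * i;
  return sum;
}

// Profiles only the call to HotLoop(), so profiling signals are
// generated only while this narrow region is running.
std::uint64_t ProfiledHotLoop(const char* profile_path, std::uint64_t n) {
  ProfilerStart(profile_path);
  std::uint64_t result = HotLoop(n);
  ProfilerStop();
  return result;
}
```

To try it against the real library, build with -DUSE_PERFTOOLS and
-lprofiler, call ProfiledHotLoop("/tmp/prof.out", ...) from main(), and
then run pprof on /tmp/prof.out as in the CPU PROFILER section.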

---
17 May 2011