Mirror of https://github.com/gperftools/gperftools (synced 2024-12-17 21:14:30 +00:00)

refresh INSTALL

Parent: e3de2e3242
Commit: bef6592746

293 INSTALL
@@ -8,6 +8,9 @@ unlimited permission to copy, distribute and modify it.
 Perftools-Specific Install Notes
 ================================
 
+See generic autotool-provided installation notes at the
+end. Immediately below you can see gperftools-specific details.
+
 *** Building from source repository
 
 As of 2.1 gperftools does not have configure and other autotools
@@ -31,68 +34,63 @@ dist (or, preferably, make distcheck) and it'll produce .tar.gz or
 build our software without having autotools.
 
 
-*** NOTE FOR 64-BIT LINUX SYSTEMS
+*** Stacktrace capturing details
 
-The glibc built-in stack-unwinder on 64-bit systems has some problems
-with the perftools libraries. (In particular, the cpu/heap profiler
-may be in the middle of malloc, holding some malloc-related locks when
-they invoke the stack unwinder. The built-in stack unwinder may call
-malloc recursively, which may require the thread to acquire a lock it
-already holds: deadlock.)
+A number of gperftools facilities capture stack traces. And
+occasionally this happens in 'tricky' locations, like in SIGPROF
+handler. So some platforms and library versions occasionally cause
+troubles (crashes or hangs, or truncated stack traces).
 
-For that reason, if you use a 64-bit system, we strongly recommend you
-install libunwind before trying to configure or install gperftools.
-libunwind can be found at
+So we do provide several implementations that our users are able to
+select at runtime. Pass TCMALLOC_STACKTRACE_METHOD_VERBOSE=t as
+environment variable to ./stacktrace_unittest to see options.
 
-http://download.savannah.gnu.org/releases/libunwind/libunwind-0.99-beta.tar.gz
+* frame-pointer-based stacktracing is fully supported on x86 (all 3
+kinds: i386, x32 and x86-64 are suppored), aarch64 and riscv. But
+all modern architectures and ABIs by default build code without
+frame pointers (even on i386). So in order to get anything useful
+out of this option, you need to build your code with frame
+pointers. It adds some performance overhead (usually people quote
+order of 2%-3%, but it can really vary based on workloads). Also it
+is worth mentioning, that it is fairly common for various asm
+routines not to have frame pointers, so you'll have somewhat
+imperfect profiles out of typical asm bits like memcpy. This stack
+trace capuring method is also fastest (like 2-3 orders of magnitude
+faster), which will matter when stacktrace capturing is done a lot
+(e.g. heap profiler).
 
-Even if you already have libunwind installed, you should check the
-version. Versions older than this will not work properly; too-new
-versions introduce new code that does not work well with perftools
-(because libunwind can call malloc, which will lead to deadlock).
+* libgcc-based stacktracing works particularly great on modern
+GNU/Linux systems with glibc 2.34 or later and libgcc from gcc 12 or
+later. Thanks to usage of dl_find_object API introduced in recent
+glibc-s this implementation seems to be truly async-signal safe and
+it is reasonably fast too. On Linux and other ELF platforms it uses
+eh_frame facility (which is very similar to dwarf unwind info). It
+was originally introduced for exception handling. On most modern
+platforms this unwind info is automatically added by compilers. On
+others you might need to add -fexceptions and/or
+-fasynchrnous-unwind-tables to your compiler flags. To make this
+option default, pass --enable-libgcc-unwinder-by-default to
+configure. When used without dl_find_object it will occasionally
+deadlock especially when used in cpuprofiler.
 
-There have been reports of crashes with libunwind 0.99 (see
-http://code.google.com/p/gperftools/issues/detail?id=374).
-Alternately, you can use a more recent libunwind (e.g. 1.0.1) at the
-cost of adding a bit of boilerplate to your code. For details, see
-http://groups.google.com/group/google-perftools/msg/2686d9f24ac4365f
+* libunwind is another supported mechanism and is default when
+available. It also depends on eh_frame stuff (or dwarf or some
+arm-specific thingy when available). When using it, be sure to use
+latest available libunwind version. As with libgcc some people
+occasionally had trouble with it on codes with broken or missing
+unwind info. If you encounter something like that, first make sure
+to file tickets against your compiler vender. Second, libunwind has
+configure option to check accesses more thoroughly, so consider
+that.
 
-CAUTION: if you install libunwind from the url above, be aware that
-you may have trouble if you try to statically link your binary with
-perftools: that is, if you link with 'gcc -static -lgcc_eh ...'.
-This is because both libunwind and libgcc implement the same C++
-exception handling APIs, but they implement them differently on
-some platforms. This is not likely to be a problem on ia64, but
-may be on x86-64.
-
-Also, if you link binaries statically, make sure that you add
--Wl,--eh-frame-hdr to your linker options. This is required so that
-libunwind can find the information generated by the compiler
-required for stack unwinding.
-
-Using -static is rare, though, so unless you know this will affect
-you it probably won't.
-
-If you cannot or do not wish to install libunwind, you can still try
-to use the built-in stack unwinder. The built-in stack unwinder
-requires that your application, the tcmalloc library, and system
-libraries like libc, all be compiled with a frame pointer. This is
-*not* the default for x86-64.
-
-If you are on x86-64 system, know that you have a set of system
-libraries with frame-pointers enabled, and compile all your
-applications with -fno-omit-frame-pointer, then you can enable the
-built-in perftools stack unwinder by passing the
---enable-frame-pointers flag to configure.
-
-Even with the use of libunwind, there are still known problems with
-stack unwinding on 64-bit systems, particularly x86-64. See the
-"64-BIT ISSUES" section in README.
-
-If you encounter problems, try compiling perftools with './configure
---enable-frame-pointers'. Note you will need to compile your
-application with frame pointers (via 'gcc -fno-omit-frame-pointer
-...') in this case.
+* many systems provide backtrace() function either as part of their
+libc or in -lexecinfo. On most systems, including GNU/Linux, it is
+not built by default, so pass --enable-stacktrace-via-backtrace to
+configure to enable it. Occasionally this implementation will call
+malloc when capturing backtrace, but we should automagically handle
+it via our "emergency malloc" facility which is now built by default
+on most systems (but it currently doesn't handle being used by
+cpuprofiler).
 
 
 *** TCMALLOC LARGE PAGES: TRADING TIME FOR SPACE
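The new stack-trace notes above describe libgcc-based unwinding and the libc backtrace() function. Below is a minimal C sketch that calls each mechanism directly to capture the current call stack, which shows what those mechanisms look like at the API level.

Assumptions:

* It targets GNU/Linux with glibc and gcc or clang.
* Code is built with unwind tables. gcc's -fasynchronous-unwind-tables is the default on x86-64 Linux.
* The helper names capture_with_backtrace() and capture_with_libgcc() are hypothetical.

This is not how gperftools wires these mechanisms up internally. In particular, it makes no attempt at the async-signal safety the notes discuss: glibc's backtrace() loads libgcc dynamically on first use, which can call malloc.

```c
/* Minimal sketch (not gperftools code): capture the current stack with
 * glibc's backtrace() and with libgcc's _Unwind_Backtrace().
 * Assumes GNU/Linux, glibc, gcc or clang. Helper names are hypothetical. */
#include <execinfo.h> /* backtrace() */
#include <stdint.h>
#include <unwind.h>   /* _Unwind_Backtrace(), _Unwind_GetIP() */

/* Hypothetical helper: libc (or -lexecinfo) backtrace().
 * Stores up to max_frames return addresses and returns how many it stored. */
int capture_with_backtrace(void **frames, int max_frames) {
    return backtrace(frames, max_frames);
}

/* State threaded through the libgcc unwinder callback. */
struct unwind_state {
    void **frames;
    int count;
    int max;
};

/* Invoked by _Unwind_Backtrace() once per frame, innermost frame first. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg) {
    struct unwind_state *st = arg;
    if (st->count >= st->max)
        return _URC_END_OF_STACK; /* any value other than _URC_NO_REASON stops the walk */
    st->frames[st->count++] = (void *)(uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;        /* continue to the caller's frame */
}

/* Hypothetical helper: eh_frame-based unwinding through libgcc.
 * Returns the number of instruction pointers stored in frames. */
int capture_with_libgcc(void **frames, int max_frames) {
    struct unwind_state st = { frames, 0, max_frames };
    _Unwind_Backtrace(record_frame, &st);
    return st.count;
}
```

Called from ordinary (non-signal) code, both helpers should return a positive frame count no larger than the buffer size. backtrace_symbols_fd() from execinfo.h is one way to print the captured addresses.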
@@ -138,20 +136,6 @@ flag yet. To build libtcmalloc with smaller internal caches, run
 (or add -DTCMALLOC_SMALL_BUT_SLOW to your existing CXXFLAGS argument).
 
 
-*** NOTE FOR ___tls_get_addr ERROR
-
-When compiling perftools on some old systems, like RedHat 8, you may
-get an error like this:
-___tls_get_addr: symbol not found
-
-This means that you have a system where some parts are updated enough
-to support Thread Local Storage, but others are not. The perftools
-configure script can't always detect this kind of case, leading to
-that error. To fix it, just comment out the line
-#define HAVE_TLS 1
-in your config.h file before building.
-
-
 *** TCMALLOC AND DLOPEN
 
 To improve performance, we use the "initial exec" model of Thread
@@ -159,132 +143,37 @@ Local Storage in tcmalloc. The price for this is the library will not
 work correctly if it is loaded via dlopen(). This should not be a
 problem, since loading a malloc-replacement library via dlopen is
 asking for trouble in any case: some data will be allocated with one
-malloc, some with another. If, for some reason, you *do* need to use
-dlopen on tcmalloc, the easiest way is to use a version of tcmalloc
-with TLS turned off; see the ___tls_get_addr note above.
+malloc, some with another.
 
 
 *** COMPILING ON NON-LINUX SYSTEMS
 
-Perftools has been tested on the following systems:
-FreeBSD 6.0 (x86)
-FreeBSD 8.1 (x86_64)
-Linux CentOS 5.5 (x86_64)
-Linux Debian 4.0 (PPC)
-Linux Debian 5.0 (x86)
-Linux Fedora Core 3 (x86)
-Linux Fedora Core 4 (x86)
-Linux Fedora Core 5 (x86)
-Linux Fedora Core 6 (x86)
-Linux Fedora Core 13 (x86_64)
-Linux Fedora Core 14 (x86_64)
-Linux RedHat 9 (x86)
-Linux Slackware 13 (x86_64)
-Linux Ubuntu 6.06.1 (x86)
-Linux Ubuntu 6.06.1 (x86_64)
-Linux Ubuntu 10.04 (x86)
-Linux Ubuntu 10.10 (x86_64)
-Mac OS X 10.3.9 (Panther) (PowerPC)
-Mac OS X 10.4.8 (Tiger) (PowerPC)
-Mac OS X 10.4.8 (Tiger) (x86)
-Mac OS X 10.5 (Leopard) (x86)
-Mac OS X 10.6 (Snow Leopard) (x86)
-Solaris 10 (x86_64)
-Windows XP, Visual Studio 2003 (VC++ 7.1) (x86)
-Windows XP, Visual Studio 2005 (VC++ 8) (x86)
-Windows XP, Visual Studio 2005 (VC++ 9) (x86)
-Windows XP, Visual Studio 2005 (VC++ 10) (x86)
-Windows XP, MinGW 5.1.3 (x86)
-Windows XP, Cygwin 5.1 (x86)
+We regularly build and test on typical modern GNU/Linux systems. You
+should expect all tests to pass on modern Linux distros and x86,
+aarch64 and riscv machines. Other machine types may fail some tests,
+but you should expect at least malloc to be fully functional.
 
-It works in its full generality on the Linux systems
-tested (though see 64-bit notes above). Portions of perftools work on
-the other systems. The basic memory-allocation library,
-tcmalloc_minimal, works on all systems. The cpu-profiler also works
-fairly widely. However, the heap-profiler and heap-checker are not
-yet as widely supported. In general, the 'configure' script will
-detect what OS you are building for, and only build the components
-that work on that OS.
+Perftools has been tested on the following non-Linux systems:
+Various recent versions of FreeBSD (x86-64 mostly)
+Recent version of NetBSD (x86-64)
+Recent versions of OSX (aarch64, x86 and ppc hasn't been tested for some time)
+Solaris 10 (x86_64), but not recently
+Windows using both MSVC (starting from MSVC 2015 and later) and mingw toolchains
+Windows XP and other obsolete versions have not been tested recently
+Windows XP, Cygwin 5.1 (x86), but not recently
+
+Portions of gperftools work on those other systems. The basic
+memory-allocation library, tcmalloc_minimal, works on all systems.
+The cpu-profiler also works fairly widely. However, the heap-profiler
+and heap-checker are not yet as widely supported. Heap checker is now
+deprecated. In general, the 'configure' script will detect what OS you
+are building for, and only build the components that work on that OS.
 
 Note that tcmalloc_minimal is perfectly usable as a malloc/new
 replacement, so it is possible to use tcmalloc on all the systems
 above, by linking in libtcmalloc_minimal.
 
-** FreeBSD:
-
-The following binaries build and run successfully (creating
-libtcmalloc_minimal.so and libprofile.so in the process):
-% ./configure
-% make tcmalloc_minimal_unittest tcmalloc_minimal_large_unittest \
-addressmap_unittest atomicops_unittest frag_unittest \
-low_level_alloc_unittest markidle_unittest memalign_unittest \
-packed_cache_test stacktrace_unittest system_alloc_unittest \
-thread_dealloc_unittest profiler_unittest.sh
-% ./tcmalloc_minimal_unittest # to run this test
-% [etc] # to run other tests
-
-Three caveats: first, frag_unittest tries to allocate 400M of memory,
-and if you have less virtual memory on your system, the test may
-fail with a bad_alloc exception.
-
-Second, profiler_unittest.sh sometimes fails in the "fork" test.
-This is because stray SIGPROF signals from the parent process are
-making their way into the child process. (This may be a kernel
-bug that only exists in older kernels.) The profiling code itself
-is working fine. This only affects programs that call fork(); for
-most programs, the cpu profiler is entirely safe to use.
-
-Third, perftools depends on /proc to get shared library
-information. If you are running a FreeBSD system without proc,
-perftools will not be able to map addresses to functions. Some
-unittests will fail as a result.
-
-Finally, the new test introduced in perftools-1.2,
-profile_handler_unittest, fails on FreeBSD. It has something to do
-with how the itimer works. The cpu profiler test passes, so I
-believe the functionality is correct and the issue is with the test
-somehow. If anybody is an expert on itimers and SIGPROF in
-FreeBSD, and would like to debug this, I'd be glad to hear the
-results!
-
-libtcmalloc.so successfully builds, and the "advanced" tcmalloc
-functionality all works except for the leak-checker, which has
-Linux-specific code:
-% make heap-profiler_unittest.sh \
-tcmalloc_unittest tcmalloc_both_unittest \
-tcmalloc_large_unittest # THESE WORK
-% make -k heap-checker_unittest.sh \
-heap-checker-death_unittest.sh # THESE DO NOT
-
-Note that unless you specify --enable-heap-checker explicitly,
-'make' will not build the heap-checker unittests on a FreeBSD
-system.
-
-I have not tested other *BSD systems, but they are probably similar.
-
-** Mac OS X:
-
-I've tested OS X 10.5 [Leopard], OS X 10.4 [Tiger] and OS X 10.3
-[Panther] on both intel (x86) and PowerPC systems. For Panther
-systems, perftools does not work at all: it depends on a header
-file, OSAtomic.h, which is new in 10.4. (It's possible to get the
-code working for Panther/i386 without too much work; if you're
-interested in exploring this, drop an e-mail.)
-
-For the other seven systems, the binaries and libraries that
-successfully build are exactly the same as for FreeBSD. See that
-section for a list of binaries and instructions on building them.
-
-In addition, it appears OS X regularly fails profiler_unittest.sh
-in the "thread" test (in addition to occassionally failing in the
-"fork" test). It looks like OS X often delivers the profiling
-signal to the main thread, even when it's sleeping, rather than
-spawned threads that are doing actual work. If anyone knows
-details of how OS X handles SIGPROF (via setitimer()) events with
-threads, and has insight into this problem, please send mail to
-google-perftools@googlegroups.com.
-
-** Solaris 10 x86:
+** Solaris 10 x86: (note, this is fairly old)
 
 I've only tested using the GNU C++ compiler, not the Sun C++
 compiler. Using g++ requires setting the PATH appropriately when
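The dlopen note in this diff says tcmalloc uses the "initial exec" model of Thread Local Storage. Below is a minimal sketch of what that model looks like at the source level.

Assumptions:

* It targets GCC or Clang on an ELF platform and is linked with -pthread on older glibc.
* GCC's tls_model attribute forces a thread-local variable into initial-exec.
* The names are hypothetical, and this is not tcmalloc's own declaration.

With initial-exec, each access is a load at an offset from the thread pointer that the dynamic loader fixes at load time, inside the static TLS block every thread gets. That is what makes the model fast. It is also why a shared object using it generally needs to be present at program startup: loading it later with dlopen() can fail with a "cannot allocate memory in static TLS block" error once glibc's reserved static TLS space runs out.

```c
/* Minimal sketch, assuming GCC or Clang on an ELF platform: a thread-local
 * counter forced into the initial-exec TLS model. Names are hypothetical. */
#include <pthread.h>
#include <stddef.h>

static __thread long per_thread_counter __attribute__((tls_model("initial-exec")));

/* Increment the calling thread's private counter and return the new value. */
long bump_counter(void) {
    return ++per_thread_counter;
}

/* Thread body: bump this thread's counter three times, report the last value. */
static void *worker(void *arg) {
    long *out = arg;
    for (int i = 0; i < 3; i++)
        *out = bump_counter();
    return NULL;
}

/* Run worker() on a fresh thread, whose counter starts at 0 independently of
 * the caller's. Returns the worker's final count, or -1 on failure. */
long run_worker(void) {
    pthread_t t;
    long result = 0;
    if (pthread_create(&t, NULL, worker, &result) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return result;
}
```

Because each thread gets its own zero-initialized copy, run_worker() reports 3 regardless of how often the calling thread has already bumped its own counter.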
@@ -306,41 +195,12 @@ above, by linking in libtcmalloc_minimal.
 Work on Windows is rather preliminary: only tcmalloc_minimal is
 supported.
 
-We haven't found a good way to get stack traces in release mode on
-windows (that is, when FPO is enabled), so the heap profiling may
-not be reliable in that case. Also, heap-checking and CPU profiling
-do not yet work at all. But as in other ports, the basic tcmalloc
-library functionality, overriding malloc and new and such (and even
-windows-specific functions like _aligned_malloc!), is working fine,
-at least with VC++ 7.1 (Visual Studio 2003) through VC++ 10.0,
-in both debug and release modes. See README.windows for
-instructions on how to install on Windows using Visual Studio.
-
-Cygwin can compile some but not all of perftools. Furthermore,
-there is a problem with exception-unwinding in cygwin (it can call
-malloc, which can call the exception-unwinding-setup code, which
-can lead to an infinite loop). I've comitted a workaround to the
-exception unwinding problem, but it only works in debug mode and
-when statically linking in tcmalloc. I hope to have a more proper
-fix in a later release. To configure under cygwin, run
-
-./configure --disable-shared CXXFLAGS=-g && make
-
-Most of cygwin will compile (cygwin doesn't allow weak symbols, so
-the heap-checker and a few other pieces of functionality will not
-compile). 'make' will compile those libraries and tests that can
-be compiled. You can run 'make check' to make sure the basic
-functionality is working. I've heard reports that some versions of
-cygwin fail calls to pthread_join() with EINVAL, causing several
-tests to fail. If you have any insight into this, please mail
-google-perftools@googlegroups.com.
-
 This Windows functionality is also available using MinGW and Msys,
 In this case, you can use the regular './configure && make'
 process. 'make install' should also work. The Makefile will limit
 itself to those libraries and binaries that work on windows.
 
-** AIX
+** AIX (as of 2021)
 
 I've tested using the IBM XL and IBM Open XL Compilers. The
 minimum requirement for IBM XL is V16 which includes C++11
@@ -402,6 +262,7 @@ above, by linking in libtcmalloc_minimal.
 the subsystem is replaced it is used for all commands issued from
 the terminal.
 
+
 Basic Installation
 ==================
 