refresh INSTALL
parent
e3de2e3242
commit
bef6592746
293
INSTALL
|
@@ -8,6 +8,9 @@ unlimited permission to copy, distribute and modify it.
|
|||
Perftools-Specific Install Notes
|
||||
================================
|
||||
|
||||
See generic autotool-provided installation notes at the
|
||||
end. Immediately below you can see gperftools-specific details.
|
||||
|
||||
*** Building from source repository
|
||||
|
||||
As of 2.1 gperftools does not have configure and other autotools
|
||||
|
@@ -31,68 +34,63 @@ dist (or, preferably, make distcheck) and it'll produce .tar.gz or
|
|||
build our software without having autotools.
|
||||
|
||||
|
||||
*** NOTE FOR 64-BIT LINUX SYSTEMS
|
||||
*** Stacktrace capturing details
|
||||
|
||||
The glibc built-in stack-unwinder on 64-bit systems has some problems
|
||||
with the perftools libraries. (In particular, the cpu/heap profiler
|
||||
may be in the middle of malloc, holding some malloc-related locks when
|
||||
they invoke the stack unwinder. The built-in stack unwinder may call
|
||||
malloc recursively, which may require the thread to acquire a lock it
|
||||
already holds: deadlock.)
|
||||
A number of gperftools facilities capture stack traces. Occasionally
this happens in 'tricky' locations, such as inside a SIGPROF handler.
So some platforms and library versions occasionally cause trouble
(crashes, hangs, or truncated stack traces).
|
||||
|
||||
For that reason, if you use a 64-bit system, we strongly recommend you
|
||||
install libunwind before trying to configure or install gperftools.
|
||||
libunwind can be found at
|
||||
So we provide several implementations that users can select at
runtime. Pass TCMALLOC_STACKTRACE_METHOD_VERBOSE=t as an environment
variable to ./stacktrace_unittest to see the options.
|
||||
|
||||
http://download.savannah.gnu.org/releases/libunwind/libunwind-0.99-beta.tar.gz
|
||||
* frame-pointer-based stacktracing is fully supported on x86 (all 3
  kinds: i386, x32 and x86-64 are supported), aarch64 and riscv. But
  all modern architectures and ABIs by default build code without
  frame pointers (even on i386). So in order to get anything useful
  out of this option, you need to build your code with frame
  pointers. It adds some performance overhead (people usually quote
  on the order of 2%-3%, but it can really vary based on workload).
  It is also worth mentioning that it is fairly common for asm
  routines not to have frame pointers, so you'll get somewhat
  imperfect profiles out of typical asm bits like memcpy. This stack
  trace capturing method is also the fastest (by something like 2-3
  orders of magnitude), which matters when stacktrace capturing is
  done a lot (e.g. in the heap profiler).
|
||||
|
||||
Even if you already have libunwind installed, you should check the
|
||||
version. Versions older than this will not work properly; too-new
|
||||
versions introduce new code that does not work well with perftools
|
||||
(because libunwind can call malloc, which will lead to deadlock).
|
||||
* libgcc-based stacktracing works particularly well on modern
  GNU/Linux systems with glibc 2.34 or later and libgcc from gcc 12 or
  later. Thanks to its use of the dl_find_object API introduced in
  recent glibc versions, this implementation seems to be truly
  async-signal safe, and it is reasonably fast too. On Linux and other
  ELF platforms it uses the eh_frame facility (which is very similar
  to dwarf unwind info), originally introduced for exception
  handling. On most modern platforms this unwind info is
  automatically added by compilers. On others you might need to add
  -fexceptions and/or -fasynchronous-unwind-tables to your compiler
  flags. To make this option the default, pass
  --enable-libgcc-unwinder-by-default to configure. When used without
  dl_find_object it will occasionally deadlock, especially when used
  in the cpuprofiler.
|
||||
|
||||
There have been reports of crashes with libunwind 0.99 (see
|
||||
http://code.google.com/p/gperftools/issues/detail?id=374).
|
||||
Alternately, you can use a more recent libunwind (e.g. 1.0.1) at the
|
||||
cost of adding a bit of boilerplate to your code. For details, see
|
||||
http://groups.google.com/group/google-perftools/msg/2686d9f24ac4365f
|
||||
* libunwind is another supported mechanism and is the default when
  available. It also depends on eh_frame (or dwarf, or some
  arm-specific unwind format where available). When using it, be sure
  to use the latest available libunwind version. As with libgcc, some
  people have occasionally had trouble with it on code with broken or
  missing unwind info. If you encounter something like that, first
  make sure to file tickets against your compiler vendor. Second,
  libunwind has a configure option to check accesses more thoroughly,
  so consider that.
|
||||
|
||||
CAUTION: if you install libunwind from the url above, be aware that
|
||||
you may have trouble if you try to statically link your binary with
|
||||
perftools: that is, if you link with 'gcc -static -lgcc_eh ...'.
|
||||
This is because both libunwind and libgcc implement the same C++
|
||||
exception handling APIs, but they implement them differently on
|
||||
some platforms. This is not likely to be a problem on ia64, but
|
||||
may be on x86-64.
|
||||
|
||||
Also, if you link binaries statically, make sure that you add
|
||||
-Wl,--eh-frame-hdr to your linker options. This is required so that
|
||||
libunwind can find the information generated by the compiler
|
||||
required for stack unwinding.
|
||||
|
||||
Using -static is rare, though, so unless you know this will affect
|
||||
you it probably won't.
|
||||
|
||||
If you cannot or do not wish to install libunwind, you can still try
|
||||
to use the built-in stack unwinder. The built-in stack unwinder
|
||||
requires that your application, the tcmalloc library, and system
|
||||
libraries like libc, all be compiled with a frame pointer. This is
|
||||
*not* the default for x86-64.
|
||||
|
||||
If you are on x86-64 system, know that you have a set of system
|
||||
libraries with frame-pointers enabled, and compile all your
|
||||
applications with -fno-omit-frame-pointer, then you can enable the
|
||||
built-in perftools stack unwinder by passing the
|
||||
--enable-frame-pointers flag to configure.
|
||||
|
||||
Even with the use of libunwind, there are still known problems with
|
||||
stack unwinding on 64-bit systems, particularly x86-64. See the
|
||||
"64-BIT ISSUES" section in README.
|
||||
|
||||
If you encounter problems, try compiling perftools with './configure
|
||||
--enable-frame-pointers'. Note you will need to compile your
|
||||
application with frame pointers (via 'gcc -fno-omit-frame-pointer
|
||||
...') in this case.
|
||||
* many systems provide a backtrace() function either as part of their
  libc or in -lexecinfo. On most systems, including GNU/Linux,
  gperftools does not build this method by default, so pass
  --enable-stacktrace-via-backtrace to configure to enable it.
  Occasionally this implementation will call malloc when capturing a
  backtrace, but we should automatically handle that via our
  "emergency malloc" facility, which is now built by default on most
  systems (but it currently doesn't handle being used by the
  cpuprofiler).
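
As a rough illustration of the options listed above, the following
sketch configures gperftools with the libgcc unwinder as the default
and then asks stacktrace_unittest to list the available
implementations. The flag names are the ones quoted in these notes;
the variable names and the guards around each step are illustrative
assumptions, not part of the build system.

```shell
# Sketch only: flag names are the ones quoted in the notes above.
# Unwinder made the default at configure time (libgcc variant shown).
CONFIGURE_FLAGS="--enable-libgcc-unwinder-by-default"
# Flags for your own code so frame-pointer and eh_frame unwinding both work.
APP_CFLAGS="-fno-omit-frame-pointer -fasynchronous-unwind-tables"

# Configure and build gperftools only when run from a source tree.
if [ -x ./configure ]; then
  ./configure $CONFIGURE_FLAGS && make
fi

# Print the stacktrace implementations compiled into this build.
if [ -x ./stacktrace_unittest ]; then
  TCMALLOC_STACKTRACE_METHOD_VERBOSE=t ./stacktrace_unittest
fi

echo "compile your application with: $APP_CFLAGS"
```

To try the backtrace()-based method instead, swap in
--enable-stacktrace-via-backtrace; which method works best is likely
to depend on your platform and library versions.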
|
||||
|
||||
|
||||
*** TCMALLOC LARGE PAGES: TRADING TIME FOR SPACE
|
||||
|
@@ -138,20 +136,6 @@ flag yet. To build libtcmalloc with smaller internal caches, run
|
|||
(or add -DTCMALLOC_SMALL_BUT_SLOW to your existing CXXFLAGS argument).
|
||||
|
||||
|
||||
*** NOTE FOR ___tls_get_addr ERROR
|
||||
|
||||
When compiling perftools on some old systems, like RedHat 8, you may
|
||||
get an error like this:
|
||||
___tls_get_addr: symbol not found
|
||||
|
||||
This means that you have a system where some parts are updated enough
|
||||
to support Thread Local Storage, but others are not. The perftools
|
||||
configure script can't always detect this kind of case, leading to
|
||||
that error. To fix it, just comment out the line
|
||||
#define HAVE_TLS 1
|
||||
in your config.h file before building.
|
||||
|
||||
|
||||
*** TCMALLOC AND DLOPEN
|
||||
|
||||
To improve performance, we use the "initial exec" model of Thread
|
||||
|
@@ -159,132 +143,37 @@ Local Storage in tcmalloc. The price for this is the library will not
|
|||
work correctly if it is loaded via dlopen(). This should not be a
|
||||
problem, since loading a malloc-replacement library via dlopen is
|
||||
asking for trouble in any case: some data will be allocated with one
|
||||
malloc, some with another. If, for some reason, you *do* need to use
|
||||
dlopen on tcmalloc, the easiest way is to use a version of tcmalloc
|
||||
with TLS turned off; see the ___tls_get_addr note above.
|
||||
malloc, some with another.
|
||||
|
||||
|
||||
*** COMPILING ON NON-LINUX SYSTEMS
|
||||
|
||||
Perftools has been tested on the following systems:
|
||||
FreeBSD 6.0 (x86)
|
||||
FreeBSD 8.1 (x86_64)
|
||||
Linux CentOS 5.5 (x86_64)
|
||||
Linux Debian 4.0 (PPC)
|
||||
Linux Debian 5.0 (x86)
|
||||
Linux Fedora Core 3 (x86)
|
||||
Linux Fedora Core 4 (x86)
|
||||
Linux Fedora Core 5 (x86)
|
||||
Linux Fedora Core 6 (x86)
|
||||
Linux Fedora Core 13 (x86_64)
|
||||
Linux Fedora Core 14 (x86_64)
|
||||
Linux RedHat 9 (x86)
|
||||
Linux Slackware 13 (x86_64)
|
||||
Linux Ubuntu 6.06.1 (x86)
|
||||
Linux Ubuntu 6.06.1 (x86_64)
|
||||
Linux Ubuntu 10.04 (x86)
|
||||
Linux Ubuntu 10.10 (x86_64)
|
||||
Mac OS X 10.3.9 (Panther) (PowerPC)
|
||||
Mac OS X 10.4.8 (Tiger) (PowerPC)
|
||||
Mac OS X 10.4.8 (Tiger) (x86)
|
||||
Mac OS X 10.5 (Leopard) (x86)
|
||||
Mac OS X 10.6 (Snow Leopard) (x86)
|
||||
Solaris 10 (x86_64)
|
||||
Windows XP, Visual Studio 2003 (VC++ 7.1) (x86)
|
||||
Windows XP, Visual Studio 2005 (VC++ 8) (x86)
|
||||
Windows XP, Visual Studio 2005 (VC++ 9) (x86)
|
||||
Windows XP, Visual Studio 2005 (VC++ 10) (x86)
|
||||
Windows XP, MinGW 5.1.3 (x86)
|
||||
Windows XP, Cygwin 5.1 (x86)
|
||||
We regularly build and test on typical modern GNU/Linux systems. You
|
||||
should expect all tests to pass on modern Linux distros and x86,
|
||||
aarch64 and riscv machines. Other machine types may fail some tests,
|
||||
but you should expect at least malloc to be fully functional.
|
||||
|
||||
It works in its full generality on the Linux systems
|
||||
tested (though see 64-bit notes above). Portions of perftools work on
|
||||
the other systems. The basic memory-allocation library,
|
||||
tcmalloc_minimal, works on all systems. The cpu-profiler also works
|
||||
fairly widely. However, the heap-profiler and heap-checker are not
|
||||
yet as widely supported. In general, the 'configure' script will
|
||||
detect what OS you are building for, and only build the components
|
||||
that work on that OS.
|
||||
Perftools has been tested on the following non-Linux systems:
   Various recent versions of FreeBSD (x86-64 mostly)
   Recent version of NetBSD (x86-64)
   Recent versions of OSX (aarch64; x86 and ppc haven't been tested for some time)
   Solaris 10 (x86_64), but not recently
   Windows using both MSVC (MSVC 2015 and later) and mingw toolchains
   Windows XP and other obsolete versions have not been tested recently
   Windows XP, Cygwin 5.1 (x86), but not recently
|
||||
|
||||
Portions of gperftools work on those other systems. The basic
|
||||
memory-allocation library, tcmalloc_minimal, works on all systems.
|
||||
The cpu-profiler also works fairly widely. However, the heap-profiler
|
||||
and heap-checker are not yet as widely supported. Heap checker is now
|
||||
deprecated. In general, the 'configure' script will detect what OS you
|
||||
are building for, and only build the components that work on that OS.
|
||||
|
||||
Note that tcmalloc_minimal is perfectly usable as a malloc/new
|
||||
replacement, so it is possible to use tcmalloc on all the systems
|
||||
above, by linking in libtcmalloc_minimal.
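
A minimal sketch of linking an application against
libtcmalloc_minimal, as the paragraph above describes. The source
file name, output name and compiler driver are hypothetical
placeholders; only -ltcmalloc_minimal comes from these notes.

```shell
# Hypothetical source file and output name; substitute your own.
APP_SRC="myapp.c"
APP_BIN="myapp"
# -ltcmalloc_minimal pulls in only the allocator, not the profilers.
LINK_FLAGS="-ltcmalloc_minimal"

# Build only if the example source actually exists here.
if [ -f "$APP_SRC" ]; then
  cc -O2 -o "$APP_BIN" "$APP_SRC" $LINK_FLAGS
fi
echo "link flags: $LINK_FLAGS"
```

If the library is installed under a non-standard prefix, you will
probably also need matching -L and runtime library path settings.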
|
||||
|
||||
** FreeBSD:
|
||||
|
||||
The following binaries build and run successfully (creating
|
||||
libtcmalloc_minimal.so and libprofile.so in the process):
|
||||
% ./configure
|
||||
% make tcmalloc_minimal_unittest tcmalloc_minimal_large_unittest \
|
||||
addressmap_unittest atomicops_unittest frag_unittest \
|
||||
low_level_alloc_unittest markidle_unittest memalign_unittest \
|
||||
packed_cache_test stacktrace_unittest system_alloc_unittest \
|
||||
thread_dealloc_unittest profiler_unittest.sh
|
||||
% ./tcmalloc_minimal_unittest # to run this test
|
||||
% [etc] # to run other tests
|
||||
|
||||
Three caveats: first, frag_unittest tries to allocate 400M of memory,
|
||||
and if you have less virtual memory on your system, the test may
|
||||
fail with a bad_alloc exception.
|
||||
|
||||
Second, profiler_unittest.sh sometimes fails in the "fork" test.
|
||||
This is because stray SIGPROF signals from the parent process are
|
||||
making their way into the child process. (This may be a kernel
|
||||
bug that only exists in older kernels.) The profiling code itself
|
||||
is working fine. This only affects programs that call fork(); for
|
||||
most programs, the cpu profiler is entirely safe to use.
|
||||
|
||||
Third, perftools depends on /proc to get shared library
|
||||
information. If you are running a FreeBSD system without proc,
|
||||
perftools will not be able to map addresses to functions. Some
|
||||
unittests will fail as a result.
|
||||
|
||||
Finally, the new test introduced in perftools-1.2,
|
||||
profile_handler_unittest, fails on FreeBSD. It has something to do
|
||||
with how the itimer works. The cpu profiler test passes, so I
|
||||
believe the functionality is correct and the issue is with the test
|
||||
somehow. If anybody is an expert on itimers and SIGPROF in
|
||||
FreeBSD, and would like to debug this, I'd be glad to hear the
|
||||
results!
|
||||
|
||||
libtcmalloc.so successfully builds, and the "advanced" tcmalloc
|
||||
functionality all works except for the leak-checker, which has
|
||||
Linux-specific code:
|
||||
% make heap-profiler_unittest.sh \
|
||||
tcmalloc_unittest tcmalloc_both_unittest \
|
||||
tcmalloc_large_unittest # THESE WORK
|
||||
% make -k heap-checker_unittest.sh \
|
||||
heap-checker-death_unittest.sh # THESE DO NOT
|
||||
|
||||
Note that unless you specify --enable-heap-checker explicitly,
|
||||
'make' will not build the heap-checker unittests on a FreeBSD
|
||||
system.
|
||||
|
||||
I have not tested other *BSD systems, but they are probably similar.
|
||||
|
||||
** Mac OS X:
|
||||
|
||||
I've tested OS X 10.5 [Leopard], OS X 10.4 [Tiger] and OS X 10.3
|
||||
[Panther] on both intel (x86) and PowerPC systems. For Panther
|
||||
systems, perftools does not work at all: it depends on a header
|
||||
file, OSAtomic.h, which is new in 10.4. (It's possible to get the
|
||||
code working for Panther/i386 without too much work; if you're
|
||||
interested in exploring this, drop an e-mail.)
|
||||
|
||||
For the other seven systems, the binaries and libraries that
|
||||
successfully build are exactly the same as for FreeBSD. See that
|
||||
section for a list of binaries and instructions on building them.
|
||||
|
||||
In addition, it appears OS X regularly fails profiler_unittest.sh
|
||||
in the "thread" test (in addition to occasionally failing in the
|
||||
"fork" test). It looks like OS X often delivers the profiling
|
||||
signal to the main thread, even when it's sleeping, rather than
|
||||
spawned threads that are doing actual work. If anyone knows
|
||||
details of how OS X handles SIGPROF (via setitimer()) events with
|
||||
threads, and has insight into this problem, please send mail to
|
||||
google-perftools@googlegroups.com.
|
||||
|
||||
** Solaris 10 x86:
|
||||
** Solaris 10 x86: (note, this is fairly old)
|
||||
|
||||
I've only tested using the GNU C++ compiler, not the Sun C++
|
||||
compiler. Using g++ requires setting the PATH appropriately when
|
||||
|
@@ -306,41 +195,12 @@ above, by linking in libtcmalloc_minimal.
|
|||
Work on Windows is rather preliminary: only tcmalloc_minimal is
|
||||
supported.
|
||||
|
||||
We haven't found a good way to get stack traces in release mode on
|
||||
windows (that is, when FPO is enabled), so the heap profiling may
|
||||
not be reliable in that case. Also, heap-checking and CPU profiling
|
||||
do not yet work at all. But as in other ports, the basic tcmalloc
|
||||
library functionality, overriding malloc and new and such (and even
|
||||
windows-specific functions like _aligned_malloc!), is working fine,
|
||||
at least with VC++ 7.1 (Visual Studio 2003) through VC++ 10.0,
|
||||
in both debug and release modes. See README.windows for
|
||||
instructions on how to install on Windows using Visual Studio.
|
||||
|
||||
Cygwin can compile some but not all of perftools. Furthermore,
|
||||
there is a problem with exception-unwinding in cygwin (it can call
|
||||
malloc, which can call the exception-unwinding-setup code, which
|
||||
can lead to an infinite loop). I've committed a workaround to the
|
||||
exception unwinding problem, but it only works in debug mode and
|
||||
when statically linking in tcmalloc. I hope to have a more proper
|
||||
fix in a later release. To configure under cygwin, run
|
||||
|
||||
./configure --disable-shared CXXFLAGS=-g && make
|
||||
|
||||
Most of cygwin will compile (cygwin doesn't allow weak symbols, so
|
||||
the heap-checker and a few other pieces of functionality will not
|
||||
compile). 'make' will compile those libraries and tests that can
|
||||
be compiled. You can run 'make check' to make sure the basic
|
||||
functionality is working. I've heard reports that some versions of
|
||||
cygwin fail calls to pthread_join() with EINVAL, causing several
|
||||
tests to fail. If you have any insight into this, please mail
|
||||
google-perftools@googlegroups.com.
|
||||
|
||||
This Windows functionality is also available using MinGW and Msys,
|
||||
In this case, you can use the regular './configure && make'
|
||||
process. 'make install' should also work. The Makefile will limit
|
||||
itself to those libraries and binaries that work on windows.
|
||||
|
||||
** AIX
|
||||
** AIX (as of 2021)
|
||||
|
||||
I've tested using the IBM XL and IBM Open XL Compilers. The
|
||||
minimum requirement for IBM XL is V16 which includes C++11
|
||||
|
@@ -402,6 +262,7 @@ above, by linking in libtcmalloc_minimal.
|
|||
the subsystem is replaced it is used for all commands issued from
|
||||
the terminal.
|
||||
|
||||
|
||||
Basic Installation
|
||||
==================
|
||||
|
||||
|
|