refresh INSTALL

Aliaksey Kandratsenka 2023-07-31 15:10:56 -04:00
parent e3de2e3242
commit bef6592746
1 changed files with 77 additions and 216 deletions

INSTALL

@@ -8,6 +8,9 @@ unlimited permission to copy, distribute and modify it.
Perftools-Specific Install Notes Perftools-Specific Install Notes
================================ ================================
See generic autotool-provided installation notes at the
end. Immediately below you can see gperftools-specific details.
*** Building from source repository *** Building from source repository
As of 2.1 gperftools does not have configure and other autotools As of 2.1 gperftools does not have configure and other autotools
@@ -31,68 +34,63 @@ dist (or, preferably, make distcheck) and it'll produce .tar.gz or
build our software without having autotools. build our software without having autotools.
*** NOTE FOR 64-BIT LINUX SYSTEMS *** Stacktrace capturing details
The glibc built-in stack-unwinder on 64-bit systems has some problems A number of gperftools facilities capture stack traces. And
with the perftools libraries. (In particular, the cpu/heap profiler occasionally this happens in 'tricky' locations, like in SIGPROF
may be in the middle of malloc, holding some malloc-related locks when handler. So some platforms and library versions occasionally cause
they invoke the stack unwinder. The built-in stack unwinder may call troubles (crashes or hangs, or truncated stack traces).
malloc recursively, which may require the thread to acquire a lock it
already holds: deadlock.)
For that reason, if you use a 64-bit system, we strongly recommend you So we do provide several implementations that our users are able to
install libunwind before trying to configure or install gperftools. select at runtime. Pass TCMALLOC_STACKTRACE_METHOD_VERBOSE=t as
libunwind can be found at environment variable to ./stacktrace_unittest to see options.
http://download.savannah.gnu.org/releases/libunwind/libunwind-0.99-beta.tar.gz * frame-pointer-based stacktracing is fully supported on x86 (all 3
kinds: i386, x32 and x86-64 are supported), aarch64 and riscv. But
all modern architectures and ABIs by default build code without
frame pointers (even on i386). So in order to get anything useful
out of this option, you need to build your code with frame
pointers. It adds some performance overhead (usually people quote
order of 2%-3%, but it can really vary based on workloads). Also it
is worth mentioning, that it is fairly common for various asm
routines not to have frame pointers, so you'll have somewhat
imperfect profiles out of typical asm bits like memcpy. This stack
trace capturing method is also the fastest (like 2-3 orders of magnitude
faster), which will matter when stacktrace capturing is done a lot
(e.g. heap profiler).
Even if you already have libunwind installed, you should check the * libgcc-based stacktracing works particularly great on modern
version. Versions older than this will not work properly; too-new GNU/Linux systems with glibc 2.34 or later and libgcc from gcc 12 or
versions introduce new code that does not work well with perftools later. Thanks to usage of dl_find_object API introduced in recent
(because libunwind can call malloc, which will lead to deadlock). glibc-s this implementation seems to be truly async-signal safe and
it is reasonably fast too. On Linux and other ELF platforms it uses
eh_frame facility (which is very similar to dwarf unwind info). It
was originally introduced for exception handling. On most modern
platforms this unwind info is automatically added by compilers. On
others you might need to add -fexceptions and/or
-fasynchronous-unwind-tables to your compiler flags. To make this
option default, pass --enable-libgcc-unwinder-by-default to
configure. When used without dl_find_object it will occasionally
deadlock especially when used in cpuprofiler.
There have been reports of crashes with libunwind 0.99 (see * libunwind is another supported mechanism and is default when
http://code.google.com/p/gperftools/issues/detail?id=374). available. It also depends on eh_frame stuff (or dwarf or some
Alternately, you can use a more recent libunwind (e.g. 1.0.1) at the arm-specific thingy when available). When using it, be sure to use
cost of adding a bit of boilerplate to your code. For details, see latest available libunwind version. As with libgcc some people
http://groups.google.com/group/google-perftools/msg/2686d9f24ac4365f occasionally had trouble with it on codes with broken or missing
unwind info. If you encounter something like that, first make sure
to file tickets against your compiler vendor. Second, libunwind has
configure option to check accesses more thoroughly, so consider
that.
CAUTION: if you install libunwind from the url above, be aware that * many systems provide backtrace() function either as part of their
you may have trouble if you try to statically link your binary with libc or in -lexecinfo. On most systems, including GNU/Linux, it is
perftools: that is, if you link with 'gcc -static -lgcc_eh ...'. not built by default, so pass --enable-stacktrace-via-backtrace to
This is because both libunwind and libgcc implement the same C++ configure to enable it. Occasionally this implementation will call
exception handling APIs, but they implement them differently on malloc when capturing backtrace, but we should automagically handle
some platforms. This is not likely to be a problem on ia64, but it via our "emergency malloc" facility which is now built by default
may be on x86-64. on most systems (but it currently doesn't handle being used by
cpuprofiler).
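The choices described in the new text above can be sketched as a small shell helper. This is only an illustration: the function name unwinder_configure_flag is hypothetical, while the configure flags, the TCMALLOC_STACKTRACE_METHOD_VERBOSE variable and -fno-omit-frame-pointer are the ones named in these notes. The commands in the trailing comments assume you are inside a gperftools source tree.

```shell
# Hypothetical helper (not part of gperftools): map an unwinder choice to the
# configure flag that the notes above name for it.
unwinder_configure_flag() {
  case "$1" in
    libgcc)    echo "--enable-libgcc-unwinder-by-default" ;;
    backtrace) echo "--enable-stacktrace-via-backtrace" ;;
    *)         echo "" ;;  # other methods: no configure flag is named in the new notes
  esac
}

# Typical use from a source tree (assumed layout, not executed here):
#   ./configure $(unwinder_configure_flag libgcc) && make
#   make stacktrace_unittest
#   TCMALLOC_STACKTRACE_METHOD_VERBOSE=t ./stacktrace_unittest   # lists the available methods
#
# For the frame-pointer method, your own code must keep frame pointers, e.g.:
#   g++ -O2 -fno-omit-frame-pointer -c your_code.cc
```

Keeping the mapping in one function makes it easy to try a different unwinder when one of them misbehaves on a given platform.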
Also, if you link binaries statically, make sure that you add
-Wl,--eh-frame-hdr to your linker options. This is required so that
libunwind can find the information generated by the compiler
required for stack unwinding.
Using -static is rare, though, so unless you know this will affect
you it probably won't.
If you cannot or do not wish to install libunwind, you can still try
to use the built-in stack unwinder. The built-in stack unwinder
requires that your application, the tcmalloc library, and system
libraries like libc, all be compiled with a frame pointer. This is
*not* the default for x86-64.
If you are on x86-64 system, know that you have a set of system
libraries with frame-pointers enabled, and compile all your
applications with -fno-omit-frame-pointer, then you can enable the
built-in perftools stack unwinder by passing the
--enable-frame-pointers flag to configure.
Even with the use of libunwind, there are still known problems with
stack unwinding on 64-bit systems, particularly x86-64. See the
"64-BIT ISSUES" section in README.
If you encounter problems, try compiling perftools with './configure
--enable-frame-pointers'. Note you will need to compile your
application with frame pointers (via 'gcc -fno-omit-frame-pointer
...') in this case.
*** TCMALLOC LARGE PAGES: TRADING TIME FOR SPACE *** TCMALLOC LARGE PAGES: TRADING TIME FOR SPACE
@@ -138,20 +136,6 @@ flag yet. To build libtcmalloc with smaller internal caches, run
(or add -DTCMALLOC_SMALL_BUT_SLOW to your existing CXXFLAGS argument). (or add -DTCMALLOC_SMALL_BUT_SLOW to your existing CXXFLAGS argument).
*** NOTE FOR ___tls_get_addr ERROR
When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
___tls_get_addr: symbol not found
This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not. The perftools
configure script can't always detect this kind of case, leading to
that error. To fix it, just comment out the line
#define HAVE_TLS 1
in your config.h file before building.
*** TCMALLOC AND DLOPEN *** TCMALLOC AND DLOPEN
To improve performance, we use the "initial exec" model of Thread To improve performance, we use the "initial exec" model of Thread
@@ -159,132 +143,37 @@ Local Storage in tcmalloc. The price for this is the library will not
work correctly if it is loaded via dlopen(). This should not be a work correctly if it is loaded via dlopen(). This should not be a
problem, since loading a malloc-replacement library via dlopen is problem, since loading a malloc-replacement library via dlopen is
asking for trouble in any case: some data will be allocated with one asking for trouble in any case: some data will be allocated with one
malloc, some with another. If, for some reason, you *do* need to use malloc, some with another.
dlopen on tcmalloc, the easiest way is to use a version of tcmalloc
with TLS turned off; see the ___tls_get_addr note above.
*** COMPILING ON NON-LINUX SYSTEMS *** COMPILING ON NON-LINUX SYSTEMS
Perftools has been tested on the following systems: We regularly build and test on typical modern GNU/Linux systems. You
FreeBSD 6.0 (x86) should expect all tests to pass on modern Linux distros and x86,
FreeBSD 8.1 (x86_64) aarch64 and riscv machines. Other machine types may fail some tests,
Linux CentOS 5.5 (x86_64) but you should expect at least malloc to be fully functional.
Linux Debian 4.0 (PPC)
Linux Debian 5.0 (x86)
Linux Fedora Core 3 (x86)
Linux Fedora Core 4 (x86)
Linux Fedora Core 5 (x86)
Linux Fedora Core 6 (x86)
Linux Fedora Core 13 (x86_64)
Linux Fedora Core 14 (x86_64)
Linux RedHat 9 (x86)
Linux Slackware 13 (x86_64)
Linux Ubuntu 6.06.1 (x86)
Linux Ubuntu 6.06.1 (x86_64)
Linux Ubuntu 10.04 (x86)
Linux Ubuntu 10.10 (x86_64)
Mac OS X 10.3.9 (Panther) (PowerPC)
Mac OS X 10.4.8 (Tiger) (PowerPC)
Mac OS X 10.4.8 (Tiger) (x86)
Mac OS X 10.5 (Leopard) (x86)
Mac OS X 10.6 (Snow Leopard) (x86)
Solaris 10 (x86_64)
Windows XP, Visual Studio 2003 (VC++ 7.1) (x86)
Windows XP, Visual Studio 2005 (VC++ 8) (x86)
Windows XP, Visual Studio 2008 (VC++ 9) (x86)
Windows XP, Visual Studio 2010 (VC++ 10) (x86)
Windows XP, MinGW 5.1.3 (x86)
Windows XP, Cygwin 5.1 (x86)
It works in its full generality on the Linux systems Perftools has been tested on the following non-Linux systems:
tested (though see 64-bit notes above). Portions of perftools work on Various recent versions of FreeBSD (x86-64 mostly)
the other systems. The basic memory-allocation library, Recent version of NetBSD (x86-64)
tcmalloc_minimal, works on all systems. The cpu-profiler also works Recent versions of OSX (aarch64; x86 and ppc haven't been tested for some time)
fairly widely. However, the heap-profiler and heap-checker are not Solaris 10 (x86_64), but not recently
yet as widely supported. In general, the 'configure' script will Windows using both MSVC (starting from MSVC 2015 and later) and mingw toolchains
detect what OS you are building for, and only build the components Windows XP and other obsolete versions have not been tested recently
that work on that OS. Windows XP, Cygwin 5.1 (x86), but not recently
Portions of gperftools work on those other systems. The basic
memory-allocation library, tcmalloc_minimal, works on all systems.
The cpu-profiler also works fairly widely. However, the heap-profiler
and heap-checker are not yet as widely supported. Heap checker is now
deprecated. In general, the 'configure' script will detect what OS you
are building for, and only build the components that work on that OS.
Note that tcmalloc_minimal is perfectly usable as a malloc/new Note that tcmalloc_minimal is perfectly usable as a malloc/new
replacement, so it is possible to use tcmalloc on all the systems replacement, so it is possible to use tcmalloc on all the systems
above, by linking in libtcmalloc_minimal. above, by linking in libtcmalloc_minimal.
** FreeBSD: ** Solaris 10 x86: (note, this is fairly old)
The following binaries build and run successfully (creating
libtcmalloc_minimal.so and libprofile.so in the process):
% ./configure
% make tcmalloc_minimal_unittest tcmalloc_minimal_large_unittest \
addressmap_unittest atomicops_unittest frag_unittest \
low_level_alloc_unittest markidle_unittest memalign_unittest \
packed_cache_test stacktrace_unittest system_alloc_unittest \
thread_dealloc_unittest profiler_unittest.sh
% ./tcmalloc_minimal_unittest # to run this test
% [etc] # to run other tests
Three caveats: first, frag_unittest tries to allocate 400M of memory,
and if you have less virtual memory on your system, the test may
fail with a bad_alloc exception.
Second, profiler_unittest.sh sometimes fails in the "fork" test.
This is because stray SIGPROF signals from the parent process are
making their way into the child process. (This may be a kernel
bug that only exists in older kernels.) The profiling code itself
is working fine. This only affects programs that call fork(); for
most programs, the cpu profiler is entirely safe to use.
Third, perftools depends on /proc to get shared library
information. If you are running a FreeBSD system without proc,
perftools will not be able to map addresses to functions. Some
unittests will fail as a result.
Finally, the new test introduced in perftools-1.2,
profile_handler_unittest, fails on FreeBSD. It has something to do
with how the itimer works. The cpu profiler test passes, so I
believe the functionality is correct and the issue is with the test
somehow. If anybody is an expert on itimers and SIGPROF in
FreeBSD, and would like to debug this, I'd be glad to hear the
results!
libtcmalloc.so successfully builds, and the "advanced" tcmalloc
functionality all works except for the leak-checker, which has
Linux-specific code:
% make heap-profiler_unittest.sh \
tcmalloc_unittest tcmalloc_both_unittest \
tcmalloc_large_unittest # THESE WORK
% make -k heap-checker_unittest.sh \
heap-checker-death_unittest.sh # THESE DO NOT
Note that unless you specify --enable-heap-checker explicitly,
'make' will not build the heap-checker unittests on a FreeBSD
system.
I have not tested other *BSD systems, but they are probably similar.
** Mac OS X:
I've tested OS X 10.5 [Leopard], OS X 10.4 [Tiger] and OS X 10.3
[Panther] on both intel (x86) and PowerPC systems. For Panther
systems, perftools does not work at all: it depends on a header
file, OSAtomic.h, which is new in 10.4. (It's possible to get the
code working for Panther/i386 without too much work; if you're
interested in exploring this, drop an e-mail.)
For the other seven systems, the binaries and libraries that
successfully build are exactly the same as for FreeBSD. See that
section for a list of binaries and instructions on building them.
In addition, it appears OS X regularly fails profiler_unittest.sh
in the "thread" test (in addition to occasionally failing in the
"fork" test). It looks like OS X often delivers the profiling
signal to the main thread, even when it's sleeping, rather than
spawned threads that are doing actual work. If anyone knows
details of how OS X handles SIGPROF (via setitimer()) events with
threads, and has insight into this problem, please send mail to
google-perftools@googlegroups.com.
** Solaris 10 x86:
I've only tested using the GNU C++ compiler, not the Sun C++ I've only tested using the GNU C++ compiler, not the Sun C++
compiler. Using g++ requires setting the PATH appropriately when compiler. Using g++ requires setting the PATH appropriately when
@@ -306,41 +195,12 @@ above, by linking in libtcmalloc_minimal.
Work on Windows is rather preliminary: only tcmalloc_minimal is Work on Windows is rather preliminary: only tcmalloc_minimal is
supported. supported.
We haven't found a good way to get stack traces in release mode on
windows (that is, when FPO is enabled), so the heap profiling may
not be reliable in that case. Also, heap-checking and CPU profiling
do not yet work at all. But as in other ports, the basic tcmalloc
library functionality, overriding malloc and new and such (and even
windows-specific functions like _aligned_malloc!), is working fine,
at least with VC++ 7.1 (Visual Studio 2003) through VC++ 10.0,
in both debug and release modes. See README.windows for
instructions on how to install on Windows using Visual Studio.
Cygwin can compile some but not all of perftools. Furthermore,
there is a problem with exception-unwinding in cygwin (it can call
malloc, which can call the exception-unwinding-setup code, which
can lead to an infinite loop). I've committed a workaround to the
exception unwinding problem, but it only works in debug mode and
when statically linking in tcmalloc. I hope to have a more proper
fix in a later release. To configure under cygwin, run
./configure --disable-shared CXXFLAGS=-g && make
Most of cygwin will compile (cygwin doesn't allow weak symbols, so
the heap-checker and a few other pieces of functionality will not
compile). 'make' will compile those libraries and tests that can
be compiled. You can run 'make check' to make sure the basic
functionality is working. I've heard reports that some versions of
cygwin fail calls to pthread_join() with EINVAL, causing several
tests to fail. If you have any insight into this, please mail
google-perftools@googlegroups.com.
This Windows functionality is also available using MinGW and Msys, This Windows functionality is also available using MinGW and Msys,
In this case, you can use the regular './configure && make' In this case, you can use the regular './configure && make'
process. 'make install' should also work. The Makefile will limit process. 'make install' should also work. The Makefile will limit
itself to those libraries and binaries that work on windows. itself to those libraries and binaries that work on windows.
** AIX ** AIX (as of 2021)
I've tested using the IBM XL and IBM Open XL Compilers. The I've tested using the IBM XL and IBM Open XL Compilers. The
minimum requirement for IBM XL is V16 which includes C++11 minimum requirement for IBM XL is V16 which includes C++11
@@ -402,6 +262,7 @@ above, by linking in libtcmalloc_minimal.
the subsystem is replaced it is used for all commands issued from the subsystem is replaced it is used for all commands issued from
the terminal. the terminal.
Basic Installation Basic Installation
================== ==================