diff --git a/INSTALL b/INSTALL index 808c97a..71c3f05 100644 --- a/INSTALL +++ b/INSTALL @@ -8,6 +8,9 @@ unlimited permission to copy, distribute and modify it. Perftools-Specific Install Notes ================================ +See generic autotool-provided installation notes at the +end. Immediately below you can see gperftools-specific details. + *** Building from source repository As of 2.1 gperftools does not have configure and other autotools @@ -31,68 +34,63 @@ dist (or, preferably, make distcheck) and it'll produce .tar.gz or build our software without having autotools. -*** NOTE FOR 64-BIT LINUX SYSTEMS +*** Stacktrace capturing details -The glibc built-in stack-unwinder on 64-bit systems has some problems -with the perftools libraries. (In particular, the cpu/heap profiler -may be in the middle of malloc, holding some malloc-related locks when -they invoke the stack unwinder. The built-in stack unwinder may call -malloc recursively, which may require the thread to acquire a lock it -already holds: deadlock.) +A number of gperftools facilities capture stack traces. And +occasionally this happens in 'tricky' locations, like in SIGPROF +handler. So some platforms and library versions occasionally cause +troubles (crashes or hangs, or truncated stack traces). -For that reason, if you use a 64-bit system, we strongly recommend you -install libunwind before trying to configure or install gperftools. -libunwind can be found at +So we do provide several implementations that our users are able to +select at runtime. Pass TCMALLOC_STACKTRACE_METHOD_VERBOSE=t as +environment variable to ./stacktrace_unittest to see options. - http://download.savannah.gnu.org/releases/libunwind/libunwind-0.99-beta.tar.gz +* frame-pointer-based stacktracing is fully supported on x86 (all 3 + kinds: i386, x32 and x86-64 are supported), aarch64 and riscv. But + all modern architectures and ABIs by default build code without + frame pointers (even on i386). 
So in order to get anything useful + out of this option, you need to build your code with frame + pointers. It adds some performance overhead (usually people quote + order of 2%-3%, but it can really vary based on workloads). Also it + is worth mentioning, that it is fairly common for various asm + routines not to have frame pointers, so you'll have somewhat + imperfect profiles out of typical asm bits like memcpy. This stack + trace capturing method is also fastest (like 2-3 orders of magnitude + faster), which will matter when stacktrace capturing is done a lot + (e.g. heap profiler). -Even if you already have libunwind installed, you should check the -version. Versions older than this will not work properly; too-new -versions introduce new code that does not work well with perftools -(because libunwind can call malloc, which will lead to deadlock). +* libgcc-based stacktracing works particularly great on modern + GNU/Linux systems with glibc 2.34 or later and libgcc from gcc 12 or + later. Thanks to usage of dl_find_object API introduced in recent + glibc-s this implementation seems to be truly async-signal safe and + it is reasonably fast too. On Linux and other ELF platforms it uses + eh_frame facility (which is very similar to dwarf unwind info). It + was originally introduced for exception handling. On most modern + platforms this unwind info is automatically added by compilers. On + others you might need to add -fexceptions and/or + -fasynchronous-unwind-tables to your compiler flags. To make this + option default, pass --enable-libgcc-unwinder-by-default to + configure. When used without dl_find_object it will occasionally + deadlock especially when used in cpuprofiler. -There have been reports of crashes with libunwind 0.99 (see -http://code.google.com/p/gperftools/issues/detail?id=374). -Alternately, you can use a more recent libunwind (e.g. 1.0.1) at the -cost of adding a bit of boilerplate to your code. 
For details, see -http://groups.google.com/group/google-perftools/msg/2686d9f24ac4365f +* libunwind is another supported mechanism and is default when + available. It also depends on eh_frame stuff (or dwarf or some + arm-specific thingy when available). When using it, be sure to use + latest available libunwind version. As with libgcc some people + occasionally had trouble with it on codes with broken or missing + unwind info. If you encounter something like that, first make sure + to file tickets against your compiler vendor. Second, libunwind has + configure option to check accesses more thoroughly, so consider + that. - CAUTION: if you install libunwind from the url above, be aware that - you may have trouble if you try to statically link your binary with - perftools: that is, if you link with 'gcc -static -lgcc_eh ...'. - This is because both libunwind and libgcc implement the same C++ - exception handling APIs, but they implement them differently on - some platforms. This is not likely to be a problem on ia64, but - may be on x86-64. - - Also, if you link binaries statically, make sure that you add - -Wl,--eh-frame-hdr to your linker options. This is required so that - libunwind can find the information generated by the compiler - required for stack unwinding. - - Using -static is rare, though, so unless you know this will affect - you it probably won't. - -If you cannot or do not wish to install libunwind, you can still try -to use the built-in stack unwinder. The built-in stack unwinder -requires that your application, the tcmalloc library, and system -libraries like libc, all be compiled with a frame pointer. This is -*not* the default for x86-64. - -If you are on x86-64 system, know that you have a set of system -libraries with frame-pointers enabled, and compile all your -applications with -fno-omit-frame-pointer, then you can enable the -built-in perftools stack unwinder by passing the ---enable-frame-pointers flag to configure. 
- -Even with the use of libunwind, there are still known problems with -stack unwinding on 64-bit systems, particularly x86-64. See the -"64-BIT ISSUES" section in README. - -If you encounter problems, try compiling perftools with './configure ---enable-frame-pointers'. Note you will need to compile your -application with frame pointers (via 'gcc -fno-omit-frame-pointer -...') in this case. +* many systems provide backtrace() function either as part of their + libc or in -lexecinfo. On most systems, including GNU/Linux, it is + not built by default, so pass --enable-stacktrace-via-backtrace to + configure to enable it. Occasionally this implementation will call + malloc when capturing backtrace, but we should automagically handle + it via our "emergency malloc" facility which is now built by default + on most systems (but it currently doesn't handle being used by + cpuprofiler). *** TCMALLOC LARGE PAGES: TRADING TIME FOR SPACE @@ -138,20 +136,6 @@ flag yet. To build libtcmalloc with smaller internal caches, run (or add -DTCMALLOC_SMALL_BUT_SLOW to your existing CXXFLAGS argument). -*** NOTE FOR ___tls_get_addr ERROR - -When compiling perftools on some old systems, like RedHat 8, you may -get an error like this: - ___tls_get_addr: symbol not found - -This means that you have a system where some parts are updated enough -to support Thread Local Storage, but others are not. The perftools -configure script can't always detect this kind of case, leading to -that error. To fix it, just comment out the line - #define HAVE_TLS 1 -in your config.h file before building. - - *** TCMALLOC AND DLOPEN To improve performance, we use the "initial exec" model of Thread @@ -159,132 +143,37 @@ Local Storage in tcmalloc. The price for this is the library will not work correctly if it is loaded via dlopen(). 
This should not be a problem, since loading a malloc-replacement library via dlopen is asking for trouble in any case: some data will be allocated with one -malloc, some with another. If, for some reason, you *do* need to use -dlopen on tcmalloc, the easiest way is to use a version of tcmalloc -with TLS turned off; see the ___tls_get_addr note above. +malloc, some with another. *** COMPILING ON NON-LINUX SYSTEMS -Perftools has been tested on the following systems: - FreeBSD 6.0 (x86) - FreeBSD 8.1 (x86_64) - Linux CentOS 5.5 (x86_64) - Linux Debian 4.0 (PPC) - Linux Debian 5.0 (x86) - Linux Fedora Core 3 (x86) - Linux Fedora Core 4 (x86) - Linux Fedora Core 5 (x86) - Linux Fedora Core 6 (x86) - Linux Fedora Core 13 (x86_64) - Linux Fedora Core 14 (x86_64) - Linux RedHat 9 (x86) - Linux Slackware 13 (x86_64) - Linux Ubuntu 6.06.1 (x86) - Linux Ubuntu 6.06.1 (x86_64) - Linux Ubuntu 10.04 (x86) - Linux Ubuntu 10.10 (x86_64) - Mac OS X 10.3.9 (Panther) (PowerPC) - Mac OS X 10.4.8 (Tiger) (PowerPC) - Mac OS X 10.4.8 (Tiger) (x86) - Mac OS X 10.5 (Leopard) (x86) - Mac OS X 10.6 (Snow Leopard) (x86) - Solaris 10 (x86_64) - Windows XP, Visual Studio 2003 (VC++ 7.1) (x86) - Windows XP, Visual Studio 2005 (VC++ 8) (x86) - Windows XP, Visual Studio 2005 (VC++ 9) (x86) - Windows XP, Visual Studio 2005 (VC++ 10) (x86) - Windows XP, MinGW 5.1.3 (x86) - Windows XP, Cygwin 5.1 (x86) +We regularly build and test on typical modern GNU/Linux systems. You +should expect all tests to pass on modern Linux distros and x86, +aarch64 and riscv machines. Other machine types may fail some tests, +but you should expect at least malloc to be fully functional. -It works in its full generality on the Linux systems -tested (though see 64-bit notes above). Portions of perftools work on -the other systems. The basic memory-allocation library, -tcmalloc_minimal, works on all systems. The cpu-profiler also works -fairly widely. 
However, the heap-profiler and heap-checker are not -yet as widely supported. In general, the 'configure' script will -detect what OS you are building for, and only build the components -that work on that OS. +Perftools has been tested on the following non-Linux systems: + Various recent versions of FreeBSD (x86-64 mostly) + Recent version of NetBSD (x86-64) + Recent versions of OSX (aarch64, x86 and ppc haven't been tested for some time) + Solaris 10 (x86_64), but not recently + Windows using both MSVC (starting from MSVC 2015 and later) and mingw toolchains + Windows XP and other obsolete versions have not been tested recently + Windows XP, Cygwin 5.1 (x86), but not recently + +Portions of gperftools work on those other systems. The basic +memory-allocation library, tcmalloc_minimal, works on all systems. +The cpu-profiler also works fairly widely. However, the heap-profiler +and heap-checker are not yet as widely supported. Heap checker is now +deprecated. In general, the 'configure' script will detect what OS you +are building for, and only build the components that work on that OS. Note that tcmalloc_minimal is perfectly usable as a malloc/new replacement, so it is possible to use tcmalloc on all the systems above, by linking in libtcmalloc_minimal. -** FreeBSD: - - The following binaries build and run successfully (creating - libtcmalloc_minimal.so and libprofile.so in the process): - % ./configure - % make tcmalloc_minimal_unittest tcmalloc_minimal_large_unittest \ - addressmap_unittest atomicops_unittest frag_unittest \ - low_level_alloc_unittest markidle_unittest memalign_unittest \ - packed_cache_test stacktrace_unittest system_alloc_unittest \ - thread_dealloc_unittest profiler_unittest.sh - % ./tcmalloc_minimal_unittest # to run this test - % [etc] # to run other tests - - Three caveats: first, frag_unittest tries to allocate 400M of memory, - and if you have less virtual memory on your system, the test may - fail with a bad_alloc exception. 
- - Second, profiler_unittest.sh sometimes fails in the "fork" test. - This is because stray SIGPROF signals from the parent process are - making their way into the child process. (This may be a kernel - bug that only exists in older kernels.) The profiling code itself - is working fine. This only affects programs that call fork(); for - most programs, the cpu profiler is entirely safe to use. - - Third, perftools depends on /proc to get shared library - information. If you are running a FreeBSD system without proc, - perftools will not be able to map addresses to functions. Some - unittests will fail as a result. - - Finally, the new test introduced in perftools-1.2, - profile_handler_unittest, fails on FreeBSD. It has something to do - with how the itimer works. The cpu profiler test passes, so I - believe the functionality is correct and the issue is with the test - somehow. If anybody is an expert on itimers and SIGPROF in - FreeBSD, and would like to debug this, I'd be glad to hear the - results! - - libtcmalloc.so successfully builds, and the "advanced" tcmalloc - functionality all works except for the leak-checker, which has - Linux-specific code: - % make heap-profiler_unittest.sh \ - tcmalloc_unittest tcmalloc_both_unittest \ - tcmalloc_large_unittest # THESE WORK - % make -k heap-checker_unittest.sh \ - heap-checker-death_unittest.sh # THESE DO NOT - - Note that unless you specify --enable-heap-checker explicitly, - 'make' will not build the heap-checker unittests on a FreeBSD - system. - - I have not tested other *BSD systems, but they are probably similar. - -** Mac OS X: - - I've tested OS X 10.5 [Leopard], OS X 10.4 [Tiger] and OS X 10.3 - [Panther] on both intel (x86) and PowerPC systems. For Panther - systems, perftools does not work at all: it depends on a header - file, OSAtomic.h, which is new in 10.4. (It's possible to get the - code working for Panther/i386 without too much work; if you're - interested in exploring this, drop an e-mail.) 
- - For the other seven systems, the binaries and libraries that - successfully build are exactly the same as for FreeBSD. See that - section for a list of binaries and instructions on building them. - - In addition, it appears OS X regularly fails profiler_unittest.sh - in the "thread" test (in addition to occassionally failing in the - "fork" test). It looks like OS X often delivers the profiling - signal to the main thread, even when it's sleeping, rather than - spawned threads that are doing actual work. If anyone knows - details of how OS X handles SIGPROF (via setitimer()) events with - threads, and has insight into this problem, please send mail to - google-perftools@googlegroups.com. - -** Solaris 10 x86: +** Solaris 10 x86: (note, this is fairly old) I've only tested using the GNU C++ compiler, not the Sun C++ compiler. Using g++ requires setting the PATH appropriately when @@ -306,41 +195,12 @@ above, by linking in libtcmalloc_minimal. Work on Windows is rather preliminary: only tcmalloc_minimal is supported. - We haven't found a good way to get stack traces in release mode on - windows (that is, when FPO is enabled), so the heap profiling may - not be reliable in that case. Also, heap-checking and CPU profiling - do not yet work at all. But as in other ports, the basic tcmalloc - library functionality, overriding malloc and new and such (and even - windows-specific functions like _aligned_malloc!), is working fine, - at least with VC++ 7.1 (Visual Studio 2003) through VC++ 10.0, - in both debug and release modes. See README.windows for - instructions on how to install on Windows using Visual Studio. - - Cygwin can compile some but not all of perftools. Furthermore, - there is a problem with exception-unwinding in cygwin (it can call - malloc, which can call the exception-unwinding-setup code, which - can lead to an infinite loop). 
I've comitted a workaround to the - exception unwinding problem, but it only works in debug mode and - when statically linking in tcmalloc. I hope to have a more proper - fix in a later release. To configure under cygwin, run - - ./configure --disable-shared CXXFLAGS=-g && make - - Most of cygwin will compile (cygwin doesn't allow weak symbols, so - the heap-checker and a few other pieces of functionality will not - compile). 'make' will compile those libraries and tests that can - be compiled. You can run 'make check' to make sure the basic - functionality is working. I've heard reports that some versions of - cygwin fail calls to pthread_join() with EINVAL, causing several - tests to fail. If you have any insight into this, please mail - google-perftools@googlegroups.com. - This Windows functionality is also available using MinGW and Msys, In this case, you can use the regular './configure && make' process. 'make install' should also work. The Makefile will limit itself to those libraries and binaries that work on windows. -** AIX +** AIX (as of 2021) I've tested using the IBM XL and IBM Open XL Compilers. The minimum requirement for IBM XL is V16 which includes C++11 @@ -402,6 +262,7 @@ above, by linking in libtcmalloc_minimal. the subsystem is replaced it is used for all commands issued from the terminal. + Basic Installation ==================