mirror of
https://github.com/gperftools/gperftools
synced 2025-01-23 15:34:46 +00:00
488 lines
20 KiB
Plaintext
488 lines
20 KiB
Plaintext
Copyright 1994, 1995, 1996, 1999, 2000, 2001, 2002 Free Software
|
|
Foundation, Inc.
|
|
|
|
This file is free documentation; the Free Software Foundation gives
|
|
unlimited permission to copy, distribute and modify it.
|
|
|
|
|
|
Perftools-Specific Install Notes
|
|
================================
|
|
|
|
See generic autotool-provided installation notes at the
|
|
end. Immediately below you can see gperftools-specific details.
|
|
|
|
*** Building from source repository
|
|
|
|
As of 2.1 gperftools does not have configure and other autotools
|
|
products checked into it's source repository. This is common practice
|
|
for projects using autotools.
|
|
|
|
NOTE: Source releases (.tar.gz that you download from
|
|
https://github.com/gperftools/gperftools/releases) still have all
|
|
required files just as before. Nothing has changed w.r.t. building
|
|
from .tar.gz releases.
|
|
|
|
But, in order to build gperftools checked out from subversion
|
|
repository you need to have autoconf, automake and libtool
|
|
installed. And before running ./configure you have to generate it (and
|
|
a bunch of other files) by running ./autogen.sh script. That script
|
|
will take care of calling correct autotools programs in correct order.
|
|
|
|
If you're maintainer then it's business as usual too. Just run make
|
|
dist (or, preferably, make distcheck) and it'll produce .tar.gz or
|
|
.tar.bz2 with all autotools magic already included. So that users can
|
|
build our software without having autotools.
|
|
|
|
|
|
*** Stacktrace capturing details
|
|
|
|
A number of gperftools facilities capture stack traces. And
|
|
occasionally this happens in 'tricky' locations, like in SIGPROF
|
|
handler. So some platforms and library versions occasionally cause
|
|
troubles (crashes or hangs, or truncated stack traces).
|
|
|
|
So we do provide several implementations that our users are able to
|
|
select at runtime. Pass TCMALLOC_STACKTRACE_METHOD_VERBOSE=t as
|
|
environment variable to ./stacktrace_unittest to see options.
|
|
|
|
* frame-pointer-based stacktracing is fully supported on x86 (all 3
|
|
kinds: i386, x32 and x86-64 are suppored), aarch64 and riscv. But
|
|
all modern architectures and ABIs by default build code without
|
|
frame pointers (even on i386). So in order to get anything useful
|
|
out of this option, you need to build your code with frame
|
|
pointers. It adds some performance overhead (usually people quote
|
|
order of 2%-3%, but it can really vary based on workloads). Also it
|
|
is worth mentioning, that it is fairly common for various asm
|
|
routines not to have frame pointers, so you'll have somewhat
|
|
imperfect profiles out of typical asm bits like memcpy. This stack
|
|
trace capuring method is also fastest (like 2-3 orders of magnitude
|
|
faster), which will matter when stacktrace capturing is done a lot
|
|
(e.g. heap profiler).
|
|
|
|
* libgcc-based stacktracing works particularly great on modern
|
|
GNU/Linux systems with glibc 2.34 or later and libgcc from gcc 12 or
|
|
later. Thanks to usage of dl_find_object API introduced in recent
|
|
glibc-s this implementation seems to be truly async-signal safe and
|
|
it is reasonably fast too. On Linux and other ELF platforms it uses
|
|
eh_frame facility (which is very similar to dwarf unwind info). It
|
|
was originally introduced for exception handling. On most modern
|
|
platforms this unwind info is automatically added by compilers. On
|
|
others you might need to add -fexceptions and/or
|
|
-fasynchrnous-unwind-tables to your compiler flags. To make this
|
|
option default, pass --enable-libgcc-unwinder-by-default to
|
|
configure. When used without dl_find_object it will occasionally
|
|
deadlock especially when used in cpuprofiler.
|
|
|
|
* libunwind is another supported mechanism and is default when
|
|
available. It also depends on eh_frame stuff (or dwarf or some
|
|
arm-specific thingy when available). When using it, be sure to use
|
|
latest available libunwind version. As with libgcc some people
|
|
occasionally had trouble with it on codes with broken or missing
|
|
unwind info. If you encounter something like that, first make sure
|
|
to file tickets against your compiler vender. Second, libunwind has
|
|
configure option to check accesses more thoroughly, so consider
|
|
that.
|
|
|
|
* many systems provide backtrace() function either as part of their
|
|
libc or in -lexecinfo. On most systems, including GNU/Linux, it is
|
|
not built by default, so pass --enable-stacktrace-via-backtrace to
|
|
configure to enable it. Occasionally this implementation will call
|
|
malloc when capturing backtrace, but we should automagically handle
|
|
it via our "emergency malloc" facility which is now built by default
|
|
on most systems (but it currently doesn't handle being used by
|
|
cpuprofiler).
|
|
|
|
|
|
*** TCMALLOC LARGE PAGES: TRADING TIME FOR SPACE
|
|
|
|
You can set a compiler directive that makes tcmalloc faster, at the
|
|
cost of using more space (due to internal fragmentation).
|
|
|
|
Internally, tcmalloc divides its memory into "pages." The default
|
|
page size is chosen to minimize memory use by reducing fragmentation.
|
|
The cost is that keeping track of these pages can cost tcmalloc time.
|
|
We've added a new flag to tcmalloc that enables a larger page size.
|
|
In general, this will increase the memory needs of applications using
|
|
tcmalloc. However, in many cases it will speed up the applications
|
|
as well, particularly if they allocate and free a lot of memory. We've
|
|
seen average speedups of 3-5% on Google applications.
|
|
|
|
To build libtcmalloc with large pages you need to use the
|
|
--with-tcmalloc-pagesize=ARG configure flag, e.g.:
|
|
|
|
./configure <other flags> --with-tcmalloc-pagesize=32
|
|
|
|
The ARG argument can be 4, 8, 16, 32, 64, 128 or 256 which sets the
|
|
internal page size to 4K, 8K, 16K, 32K, 64K, 128K and 256K respectively.
|
|
The default is 8K.
|
|
|
|
|
|
*** SMALL TCMALLOC CACHES: TRADING SPACE FOR TIME
|
|
|
|
You can set a compiler directive that makes tcmalloc use less memory
|
|
for overhead, at the cost of some time.
|
|
|
|
Internally, tcmalloc keeps information about some of its internal data
|
|
structures in a cache. This speeds memory operations that need to
|
|
access this internal data. We've added a new, experimental flag to
|
|
tcmalloc that reduces the size of this cache, decresaing the memory
|
|
needs of applications using tcmalloc.
|
|
|
|
This feature is still very experimental; it's not even a configure
|
|
flag yet. To build libtcmalloc with smaller internal caches, run
|
|
|
|
./configure <normal flags> CXXFLAGS=-DTCMALLOC_SMALL_BUT_SLOW
|
|
|
|
(or add -DTCMALLOC_SMALL_BUT_SLOW to your existing CXXFLAGS argument).
|
|
|
|
|
|
*** TCMALLOC AND DLOPEN
|
|
|
|
To improve performance, we use the "initial exec" model of Thread
|
|
Local Storage in tcmalloc. The price for this is the library will not
|
|
work correctly if it is loaded via dlopen(). This should not be a
|
|
problem, since loading a malloc-replacement library via dlopen is
|
|
asking for trouble in any case: some data will be allocated with one
|
|
malloc, some with another.
|
|
|
|
|
|
*** COMPILING ON NON-LINUX SYSTEMS
|
|
|
|
We regularly build and test on typical modern GNU/Linux systems. You
|
|
should expect all tests to pass on modern Linux distros and x86,
|
|
aarch64 and riscv machines. Other machine types may fail some tests,
|
|
but you should expect at least malloc to be fully functional.
|
|
|
|
Perftools has been tested on the following non-Linux systems:
|
|
Various recent versions of FreeBSD (x86-64 mostly)
|
|
Recent version of NetBSD (x86-64)
|
|
Recent versions of OSX (aarch64, x86 and ppc hasn't been tested for some time)
|
|
Solaris 10 (x86_64), but not recently
|
|
Windows using both MSVC (starting from MSVC 2015 and later) and mingw toolchains
|
|
Windows XP and other obsolete versions have not been tested recently
|
|
Windows XP, Cygwin 5.1 (x86), but not recently
|
|
|
|
Portions of gperftools work on those other systems. The basic
|
|
memory-allocation library, tcmalloc_minimal, works on all systems.
|
|
The cpu-profiler also works fairly widely. However, the heap-profiler
|
|
and heap-checker are not yet as widely supported. Heap checker is now
|
|
deprecated. In general, the 'configure' script will detect what OS you
|
|
are building for, and only build the components that work on that OS.
|
|
|
|
Note that tcmalloc_minimal is perfectly usable as a malloc/new
|
|
replacement, so it is possible to use tcmalloc on all the systems
|
|
above, by linking in libtcmalloc_minimal.
|
|
|
|
** Solaris 10 x86: (note, this is fairly old)
|
|
|
|
I've only tested using the GNU C++ compiler, not the Sun C++
|
|
compiler. Using g++ requires setting the PATH appropriately when
|
|
configuring.
|
|
|
|
% PATH=${PATH}:/usr/sfw/bin/:/usr/ccs/bin ./configure
|
|
% PATH=${PATH}:/usr/sfw/bin/:/usr/ccs/bin make [...]
|
|
|
|
Again, the binaries and libraries that successfully build are
|
|
exactly the same as for FreeBSD. (However, while libprofiler.so can
|
|
be used to generate profiles, pprof is not very successful at
|
|
reading them -- necessary helper programs like nm don't seem
|
|
to be installed by default on Solaris, or perhaps are only
|
|
installed as part of the Sun C++ compiler package.) See that
|
|
section for a list of binaries, and instructions on building them.
|
|
|
|
** Windows (MSVC, Cygwin, and MinGW):
|
|
|
|
Work on Windows is rather preliminary: only tcmalloc_minimal is
|
|
supported.
|
|
|
|
This Windows functionality is also available using MinGW and Msys,
|
|
In this case, you can use the regular './configure && make'
|
|
process. 'make install' should also work. The Makefile will limit
|
|
itself to those libraries and binaries that work on windows.
|
|
|
|
** AIX (as of 2021)
|
|
|
|
I've tested using the IBM XL and IBM Open XL Compilers. The
|
|
minimum requirement for IBM XL is V16 which includes C++11
|
|
support. IBM XL and gcc are not ABI compatible. If you would
|
|
like to use the library with a gcc built executable then the
|
|
library must also be built with gcc. To use the library with
|
|
and IBM XL built binary then it follows that the library must
|
|
also be built with IBM XL.
|
|
|
|
Both 32-bit and 64-bit builds have been tested.
|
|
|
|
To do a 32-bit IBM XL build:
|
|
% ./configure CC="xlclang" CXX="xlclang++" AR="ar"
|
|
RANLIB="ranlib" NM="nm"
|
|
To do a 64-bit IBM XL build:
|
|
% ./configure CC="xlclang -q64" CXX="xlclang++ -q64"
|
|
AR="ar -X64" RANLIB="ranlib -X64" NM="nm -X64"
|
|
|
|
Add your favorite optimization levels via CFLAGS and CXXFLAGS.
|
|
|
|
If you link to the shared library but it may not work as you
|
|
expect. Allocations and deallocations that occur from within
|
|
the Standard C and C++ libraries will not be redirected the
|
|
tcmalloc library.
|
|
|
|
The recommended method is to use the AIX User-defined malloc
|
|
replacement as documented by IBM. This replaces the default
|
|
AIX memory subsystem with a user defined memory subsystem.
|
|
|
|
The AIX user defined memory subsystem specifies that the 32-
|
|
and 64- bit objects must be placed in an archive with the
|
|
32-bit shared object named mem32.o and the 64-bit shared
|
|
object named mem64.o.
|
|
|
|
It is recommended to make combined 32_64 bit archive by
|
|
doing a 64-bit build, then copy the shared library to mem64.o
|
|
add mem64.o the archive, then do a 32-bit build
|
|
copy the shared library to mem32.o and add it to the same
|
|
combined archive.
|
|
|
|
For eg) perform a 64-bit build then:
|
|
% cp libtcmalloc_minimal.so.4 mem64.o
|
|
% ar -X32_64 -r libtmalloc_minimal.a mem64.o
|
|
|
|
Followed by a 32-bit build:
|
|
% cp libtcmalloc_minimal.so.4 mem32.o
|
|
% ar -X32_64 -r libtmalloc_minimal.a mem32.o
|
|
|
|
The final archive should contain both mem32.o and mem64.o
|
|
|
|
To use the library you are expected have the library location
|
|
in your LIBPATH or LD_LIBRARY_PATH followed by exporting the
|
|
environment variable MALLOCTYPE=user:libtcmalloc_minimal.a to
|
|
enable the new user defined memory subsystem.
|
|
|
|
I recommend using:
|
|
% MALLOCTYPE=user:libtcmalloc_minimal.a <user-exectuable>
|
|
to minimize the impact of replacing the memory subsystem. Once
|
|
the subsystem is replaced it is used for all commands issued from
|
|
the terminal.
|
|
|
|
|
|
Basic Installation
|
|
==================
|
|
|
|
These are generic installation instructions.
|
|
|
|
The `configure' shell script attempts to guess correct values for
|
|
various system-dependent variables used during compilation. It uses
|
|
those values to create a `Makefile' in each directory of the package.
|
|
It may also create one or more `.h' files containing system-dependent
|
|
definitions. Finally, it creates a shell script `config.status' that
|
|
you can run in the future to recreate the current configuration, and a
|
|
file `config.log' containing compiler output (useful mainly for
|
|
debugging `configure').
|
|
|
|
It can also use an optional file (typically called `config.cache'
|
|
and enabled with `--cache-file=config.cache' or simply `-C') that saves
|
|
the results of its tests to speed up reconfiguring. (Caching is
|
|
disabled by default to prevent problems with accidental use of stale
|
|
cache files.)
|
|
|
|
If you need to do unusual things to compile the package, please try
|
|
to figure out how `configure' could check whether to do them, and mail
|
|
diffs or instructions to the address given in the `README' so they can
|
|
be considered for the next release. If you are using the cache, and at
|
|
some point `config.cache' contains results you don't want to keep, you
|
|
may remove or edit it.
|
|
|
|
The file `configure.ac' (or `configure.in') is used to create
|
|
`configure' by a program called `autoconf'. You only need
|
|
`configure.ac' if you want to change it or regenerate `configure' using
|
|
a newer version of `autoconf'.
|
|
|
|
The simplest way to compile this package is:
|
|
|
|
1. `cd' to the directory containing the package's source code and type
|
|
`./configure' to configure the package for your system. If you're
|
|
using `csh' on an old version of System V, you might need to type
|
|
`sh ./configure' instead to prevent `csh' from trying to execute
|
|
`configure' itself.
|
|
|
|
Running `configure' takes awhile. While running, it prints some
|
|
messages telling which features it is checking for.
|
|
|
|
2. Type `make' to compile the package.
|
|
|
|
3. Optionally, type `make check' to run any self-tests that come with
|
|
the package.
|
|
|
|
4. Type `make install' to install the programs and any data files and
|
|
documentation.
|
|
|
|
5. You can remove the program binaries and object files from the
|
|
source code directory by typing `make clean'. To also remove the
|
|
files that `configure' created (so you can compile the package for
|
|
a different kind of computer), type `make distclean'. There is
|
|
also a `make maintainer-clean' target, but that is intended mainly
|
|
for the package's developers. If you use it, you may have to get
|
|
all sorts of other programs in order to regenerate files that came
|
|
with the distribution.
|
|
|
|
Compilers and Options
|
|
=====================
|
|
|
|
Some systems require unusual options for compilation or linking that
|
|
the `configure' script does not know about. Run `./configure --help'
|
|
for details on some of the pertinent environment variables.
|
|
|
|
You can give `configure' initial values for configuration parameters
|
|
by setting variables in the command line or in the environment. Here
|
|
is an example:
|
|
|
|
./configure CC=c89 CFLAGS=-O2 LIBS=-lposix
|
|
|
|
*Note Defining Variables::, for more details.
|
|
|
|
Compiling For Multiple Architectures
|
|
====================================
|
|
|
|
You can compile the package for more than one kind of computer at the
|
|
same time, by placing the object files for each architecture in their
|
|
own directory. To do this, you must use a version of `make' that
|
|
supports the `VPATH' variable, such as GNU `make'. `cd' to the
|
|
directory where you want the object files and executables to go and run
|
|
the `configure' script. `configure' automatically checks for the
|
|
source code in the directory that `configure' is in and in `..'.
|
|
|
|
If you have to use a `make' that does not support the `VPATH'
|
|
variable, you have to compile the package for one architecture at a
|
|
time in the source code directory. After you have installed the
|
|
package for one architecture, use `make distclean' before reconfiguring
|
|
for another architecture.
|
|
|
|
Installation Names
|
|
==================
|
|
|
|
By default, `make install' will install the package's files in
|
|
`/usr/local/bin', `/usr/local/man', etc. You can specify an
|
|
installation prefix other than `/usr/local' by giving `configure' the
|
|
option `--prefix=PATH'.
|
|
|
|
You can specify separate installation prefixes for
|
|
architecture-specific files and architecture-independent files. If you
|
|
give `configure' the option `--exec-prefix=PATH', the package will use
|
|
PATH as the prefix for installing programs and libraries.
|
|
Documentation and other data files will still use the regular prefix.
|
|
|
|
In addition, if you use an unusual directory layout you can give
|
|
options like `--bindir=PATH' to specify different values for particular
|
|
kinds of files. Run `configure --help' for a list of the directories
|
|
you can set and what kinds of files go in them.
|
|
|
|
If the package supports it, you can cause programs to be installed
|
|
with an extra prefix or suffix on their names by giving `configure' the
|
|
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
|
|
|
|
Optional Features
|
|
=================
|
|
|
|
Some packages pay attention to `--enable-FEATURE' options to
|
|
`configure', where FEATURE indicates an optional part of the package.
|
|
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
|
|
is something like `gnu-as' or `x' (for the X Window System). The
|
|
`README' should mention any `--enable-' and `--with-' options that the
|
|
package recognizes.
|
|
|
|
For packages that use the X Window System, `configure' can usually
|
|
find the X include and library files automatically, but if it doesn't,
|
|
you can use the `configure' options `--x-includes=DIR' and
|
|
`--x-libraries=DIR' to specify their locations.
|
|
|
|
Specifying the System Type
|
|
==========================
|
|
|
|
There may be some features `configure' cannot figure out
|
|
automatically, but needs to determine by the type of machine the package
|
|
will run on. Usually, assuming the package is built to be run on the
|
|
_same_ architectures, `configure' can figure that out, but if it prints
|
|
a message saying it cannot guess the machine type, give it the
|
|
`--build=TYPE' option. TYPE can either be a short name for the system
|
|
type, such as `sun4', or a canonical name which has the form:
|
|
|
|
CPU-COMPANY-SYSTEM
|
|
|
|
where SYSTEM can have one of these forms:
|
|
|
|
OS KERNEL-OS
|
|
|
|
See the file `config.sub' for the possible values of each field. If
|
|
`config.sub' isn't included in this package, then this package doesn't
|
|
need to know the machine type.
|
|
|
|
If you are _building_ compiler tools for cross-compiling, you should
|
|
use the `--target=TYPE' option to select the type of system they will
|
|
produce code for.
|
|
|
|
If you want to _use_ a cross compiler, that generates code for a
|
|
platform different from the build platform, you should specify the
|
|
"host" platform (i.e., that on which the generated programs will
|
|
eventually be run) with `--host=TYPE'.
|
|
|
|
Sharing Defaults
|
|
================
|
|
|
|
If you want to set default values for `configure' scripts to share,
|
|
you can create a site shell script called `config.site' that gives
|
|
default values for variables like `CC', `cache_file', and `prefix'.
|
|
`configure' looks for `PREFIX/share/config.site' if it exists, then
|
|
`PREFIX/etc/config.site' if it exists. Or, you can set the
|
|
`CONFIG_SITE' environment variable to the location of the site script.
|
|
A warning: not all `configure' scripts look for a site script.
|
|
|
|
Defining Variables
|
|
==================
|
|
|
|
Variables not defined in a site shell script can be set in the
|
|
environment passed to `configure'. However, some packages may run
|
|
configure again during the build, and the customized values of these
|
|
variables may be lost. In order to avoid this problem, you should set
|
|
them in the `configure' command line, using `VAR=value'. For example:
|
|
|
|
./configure CC=/usr/local2/bin/gcc
|
|
|
|
will cause the specified gcc to be used as the C compiler (unless it is
|
|
overridden in the site shell script).
|
|
|
|
`configure' Invocation
|
|
======================
|
|
|
|
`configure' recognizes the following options to control how it
|
|
operates.
|
|
|
|
`--help'
|
|
`-h'
|
|
Print a summary of the options to `configure', and exit.
|
|
|
|
`--version'
|
|
`-V'
|
|
Print the version of Autoconf used to generate the `configure'
|
|
script, and exit.
|
|
|
|
`--cache-file=FILE'
|
|
Enable the cache: use and save the results of the tests in FILE,
|
|
traditionally `config.cache'. FILE defaults to `/dev/null' to
|
|
disable caching.
|
|
|
|
`--config-cache'
|
|
`-C'
|
|
Alias for `--cache-file=config.cache'.
|
|
|
|
`--quiet'
|
|
`--silent'
|
|
`-q'
|
|
Do not print messages saying which checks are being made. To
|
|
suppress all normal output, redirect it to `/dev/null' (any error
|
|
messages will still be shown).
|
|
|
|
`--srcdir=DIR'
|
|
Look for the package's source code in directory DIR. Usually
|
|
`configure' can determine that directory automatically.
|
|
|
|
`configure' also accepts some other, not widely useful, options. Run
|
|
`configure --help' for more details.
|