If we don't do it, then reading the variable calls __tls_get_addr, which
uses malloc on first call. initial-exec makes the dynamic linker reserve
the TLS offset for the recursion flag early and thus avoids unsafe calls
to malloc.
This fixes issue #786.
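
A minimal sketch of the idea, assuming a gcc-compatible compiler. The
names in_hook and run_unless_recursing are hypothetical and are not
taken from the tcmalloc sources; only the tls_model attribute is the
technique described above.

```cpp
#include <cassert>

// Hypothetical per-thread recursion flag. The initial-exec TLS model
// lets the compiler address it relative to the thread pointer instead
// of going through __tls_get_addr, so reading it cannot call malloc.
static __thread bool in_hook
    __attribute__((tls_model("initial-exec"))) = false;

// Runs fn unless this thread is already inside a guarded region.
// Returns true if fn was run, false if recursion was detected.
template <typename Fn>
bool run_unless_recursing(Fn fn) {
  if (in_hook) return false;
  in_hook = true;
  fn();
  in_hook = false;
  return true;
}
```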
Building with -DTCMALLOC_USE_MADV_FREE enables use of MADV_FREE on
Linux if glibc's copy of the kernel headers defines MADV_FREE.
This is so that people can test it more easily.
Affects ticket #780.
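
A rough sketch of how such a compile-time switch can look. The function
release_pages is hypothetical, and the fallback to MADV_DONTNEED when
MADV_FREE is unavailable or rejected is an assumption of this sketch,
not something stated above.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: tells the kernel that [addr, addr + len) is no
// longer needed. Returns true on success.
bool release_pages(void* addr, size_t len) {
#if defined(TCMALLOC_USE_MADV_FREE) && defined(MADV_FREE)
  // Only compiled in when the build opts in and the headers know the flag.
  if (madvise(addr, len, MADV_FREE) == 0) return true;
  // Assumed fallback: an older kernel may reject MADV_FREE at run time.
#endif
  return madvise(addr, len, MADV_DONTNEED) == 0;
}
```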
Commit e580d78881 fixed the macros in some
of the code but not in other places.
`make check` still fails in the same places on a Debian Jessie armhf
system.
In this case we alias to regular delete. This is helpful because if we
don't override sized delete, then apps will call the version in
libstdc++, which delegates to regular delete. That is slower than
calling regular delete directly.
IFUNC relocations don't support our advanced use case (calling an
application function or looking up an environment variable).
In particular, this doesn't work on PPC and ARM when tcmalloc is linked
with -Wl,-z,now. See Red Hat's Bugzilla ticket
https://bugzilla.redhat.com/show_bug.cgi?id=1312462 for more details.
This is similar to what gcc 5 does anyway, except that gcc 5 places
jumps, which adds a bit of overhead.
Instead of letting gcc do it, we alias using ELF symbol aliasing. All
free variants (tc_delete{,array}_{,nothrow}) are aliased to
tc_free. There are 3 malloc variants that differ in OOM
handling. tc_newarray is aliased to tc_new, and tc_newarray_nothrow is
aliased to tc_new_nothrow.
This aliasing only happens in non-debug malloc, since debug malloc has
to distinguish between the variants in order to check for
mismatches.
This closes #723.
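
For illustration, ELF symbol aliasing with a gcc-compatible compiler can
be sketched as below. demo_new and demo_newarray are hypothetical
stand-ins for tc_new and tc_newarray; the real functions are not
reproduced here.

```cpp
#include <cassert>
#include <cstdlib>

extern "C" {
// Stand-in for the "real" allocation entry point.
void* demo_new(size_t size) { return std::malloc(size); }

// Second symbol name bound to the same code address. No jump or
// wrapper is emitted: both names resolve to demo_new's body.
void* demo_newarray(size_t size) __attribute__((alias("demo_new")));
}

// Returns true when both symbols resolve to the same address.
bool alias_shares_address() { return &demo_newarray == &demo_new; }
```

Because the alias is only another name in the symbol table, calling
demo_newarray costs exactly the same as calling demo_new.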
Since rounding up prior to sampling introduces the possibility of
arithmetic overflow, we simply don't do it.
This introduces some error (up to 4k), but since we're dealing with
allocations of at least 256k, we're fine.
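
The arithmetic can be checked with a small sketch. The 4096-byte page
size and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kPageSize = 4096;  // assumed page size

// Naive round-up to a page multiple. For sizes within kPageSize - 1 of
// SIZE_MAX the addition wraps around and the result is far too small.
constexpr size_t round_up_to_page(size_t n) {
  return (n + kPageSize - 1) & ~(kPageSize - 1);
}

// True when rounding n up would wrap around.
constexpr bool round_up_overflows(size_t n) {
  return n > SIZE_MAX - (kPageSize - 1);
}

// Worst-case relative error from skipping the round-up, for
// allocations of at least min_alloc bytes.
double max_relative_error(size_t min_alloc) {
  return static_cast<double>(kPageSize - 1) /
         static_cast<double>(min_alloc);
}
```

With a 256k minimum the worst-case error stays below 1.6%.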
Emergency malloc is enabled for cases when backtrace capturing needs to
call malloc. In this case, we enable emergency malloc just prior to
calling such code and disable it after it is done.
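
One way to express "enable just before, disable right after" is a
scoped guard. This is only a sketch: the flag and class names are
hypothetical and the real emergency malloc switch may look different.

```cpp
#include <cassert>

// Hypothetical per-thread switch consulted by the allocator.
static thread_local bool emergency_malloc_enabled = false;

// Enables emergency malloc for the lifetime of the object and restores
// the previous state afterwards, even on early return.
class ScopedEmergencyMalloc {
 public:
  ScopedEmergencyMalloc() : previous_(emergency_malloc_enabled) {
    emergency_malloc_enabled = true;
  }
  ~ScopedEmergencyMalloc() { emergency_malloc_enabled = previous_; }
  ScopedEmergencyMalloc(const ScopedEmergencyMalloc&) = delete;
  ScopedEmergencyMalloc& operator=(const ScopedEmergencyMalloc&) = delete;

 private:
  bool previous_;
};

// Stand-in for backtrace-capturing code that may call malloc. Here it
// only reports whether the flag was set while it ran.
bool capture_sees_emergency_mode() {
  ScopedEmergencyMalloc scope;
  return emergency_malloc_enabled;
}
```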
We're now using it only when overriding glibc functions (such as malloc
or mmap). In other cases (most importantly in the public tcmalloc.h
header) we use our own throw() to avoid possible breakage from future
glibc changes.
We shipped a header that checked HAVE_XXX defines, which we only
define in the project-local config.h, so it could never work correctly.
We now #include <malloc.h> just as for tc_mallinfo, depending on a
constant that we detect at configure time and write into the header
that we install.
Particularly _Unwind_Backtrace, which seems to be a gcc extension.
This is what glibc's backtrace commonly uses.
Using _Unwind_Backtrace directly is better than glibc's backtrace, since
it doesn't call into dlopen. glibc, by contrast, does dlopen when it is
built as a shared library, apparently to avoid a link-time dependency
on libgcc_s.so.
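
A minimal sketch of calling _Unwind_Backtrace directly, assuming
<unwind.h> from gcc or clang. TraceState, collect_frame and
capture_stack are hypothetical names, not the gperftools
implementation.

```cpp
#include <unwind.h>
#include <cassert>

// Book-keeping passed through the unwinder to the callback.
struct TraceState {
  void** frames;
  int max_depth;
  int depth;
};

// Called once per stack frame. Records the frame's instruction pointer
// and stops the walk once the output buffer is full.
static _Unwind_Reason_Code collect_frame(struct _Unwind_Context* ctx,
                                         void* arg) {
  TraceState* state = static_cast<TraceState*>(arg);
  if (state->depth >= state->max_depth) return _URC_END_OF_STACK;
  state->frames[state->depth++] =
      reinterpret_cast<void*>(_Unwind_GetIP(ctx));
  return _URC_NO_REASON;
}

// Fills frames with up to max_depth return addresses and returns the
// number of frames captured.
int capture_stack(void** frames, int max_depth) {
  TraceState state = {frames, max_depth, 0};
  _Unwind_Backtrace(collect_frame, &state);
  return state.depth;
}
```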
The 'XOR loop' in the profiler unittest wasn't 100% effective because it
allowed the compiler to avoid loading from and storing to memory.
After marking the result variable as volatile, we now force the compiler
to read and write memory, slowing these loops down sufficiently.
profiler_unittest now passes more consistently.
Closes #628
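
The effect of volatile on such a loop can be sketched like this. The
function below is illustrative and is not the code from
profiler_unittest.

```cpp
#include <cassert>

// Because result is volatile, every iteration has to load it from
// memory and store it back, so the compiler cannot keep it in a
// register or fold the whole loop into a constant.
int xor_loop(int iters) {
  volatile int result = 0;
  for (int i = 0; i < iters; ++i) {
    result = result ^ i;
  }
  return result;
}
```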
In particular, on arm-linux and x86-64-debian-kfreebsd compilation fails
due to lack of support for ifunc. So it is necessary to test at
configure time whether ifunc is supported.
malloc_hook.h includes malloc_hook_c.h as <gperftools/malloc_hook_c.h>.
This requires the compiler to have designated src/gperftools as a
standard include directory (-I), which may not always be the case.
Instead, include it as "malloc_hook_c.h", which will search in the same
directory first. This will always work, regardless of whether it was
designated a standard include directory.
It causes a noticeable performance hit and can sometimes confuse GDB.
Tested with CPUPROFILE_PER_THREAD_TIMERS=1.
Based on an old version by mnissler@google.com.
that it isn't used by the program, as it might still be needed to override the
corresponding symbol in shared libraries (or inline assembler for that matter).
For example, suppose the program uses malloc and free but not calloc and is
statically linked against tcmalloc (built with -flto) and LTO is done. Then
before this patch the calloc alias would be deleted by LTO due to not being
used, but the malloc/free aliases would be kept because they are used by the
program. Suppose the program is dynamically linked with a shared library that
allocates memory using calloc and later frees it by calling free. Then calloc
will use the libc memory allocator, because the calloc alias was deleted, but
free will call into tcmalloc, resulting in a crash.
It was reported that clang on OSX doesn't support the alias attribute,
most likely because of executable format limitations.
The new code limits use of alias to gcc-compatible compilers on ELF
platforms (various GNU and *BSD systems). The ELF format is known to
support aliases.
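
A guard of this kind can be sketched as follows. The macro
DEMO_HAVE_ALIAS and the demo_release functions are hypothetical, and the
forwarding wrapper used on other platforms is an assumption of this
sketch.

```cpp
#include <cassert>
#include <cstdlib>

// Use the alias attribute only with gcc-compatible compilers that
// target ELF; __ELF__ is predefined by gcc and clang on such targets.
#if defined(__GNUC__) && defined(__ELF__)
#define DEMO_HAVE_ALIAS 1
#else
#define DEMO_HAVE_ALIAS 0
#endif

extern "C" {
void demo_release(void* p) { std::free(p); }

#if DEMO_HAVE_ALIAS
// ELF: a second name for the same function, with no extra call.
void demo_release_array(void* p) __attribute__((alias("demo_release")));
#else
// Elsewhere (e.g. Mach-O): fall back to an ordinary forwarding wrapper.
void demo_release_array(void* p) { demo_release(p); }
#endif
}
```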
Some workloads get much slower with too large a batch size.
This closes bug #678.
The binary_trees benchmark benefits from a larger batch size, and I
found that 512 is not much slower than the huge value that we had.