For compiling things automake never needs to be given a full set of
headers. Usually headers are specified so that make dist includes them
into archive, but we can achieve this goal easier.
This reduces size and complexity of our Makefile.am stuff.
Logic was removed from thread_cache.{h,cc} into
thread_cache_ptr.{h,cc}.
Separation will help possible future evolution, and we already changed
the logic quite a bit:
* early access (when TLS isn't initialized yet) now uses global
ThreadCache instance. We therefore have ThreadCachePtr instances
managing required locking. This eliminates unnecessary complication of
PTHREADS_CRASHES_IF_RUN_TOO_EARLY logic, and any other danger of
touching TLS too early. BTW previous implementation actually leaked
initial early-initialized ThreadCache instance(!)
* old configure-time HAVE_TLS logic is amputated. Config-time part of
it made little sense as C++ 17 guarantees availability of
thread_local, but we have manually curated deny-list of "bad" OSes,
that we tested (via compile checks!) at configure time. Now this
is all compile time. There is now compile-time kHaveGoodTLS variable
and we're using it mostly via if constexpr.
* kHaveGoodTLS case of creating thread cache is simplified and made
more straightforward (no need to have in_setspecific logic).
* !kHaveGoodTLS case if fixed and improved too. We avoid
std:🧵:get_id, as it deadlocks on mingw. We use errno address as
a portable and (usually) async-signal safe 'my thread' identifier. We
also eliminate linear searching of thread's cache and replace it with
straightforward hash table lookup.
This allows us, later, to avoid building this stuff in configurations
that don't use it. I have also reduced API and ABI surface to enable
further refactorings.
* Remove build dependency on HAVE_PTHREAD
* Remove build dependency on HAVE_STD_ALIGNED_VAL_T and ENABLE_ALIGNED_NEW_DELETE
* Remove redundant tcmalloc.h files & ensure there are no cross-build-tool references
* Adopt automake commit 26927d1 in the CMake build
- Fix CMake builds for MinGW and MSVC
- Ensure the Autotools, CMake and VSProj builds do not reference each others' config.h
- Use std:🧵:id instead of our own thread ID wrappers
- Moved explicit TLS wrapper functions into the tcmalloc:: namespace and change their visibility to hidden
Resolves#1486
See github issue #1474 for immediate reason.
Note, this entire idea of number of convenience libraries is likely
simply artifact of Google's codebase past. We don't really need this
complexity. But I am holding big reorganization of this for after API
and ABI work. For now, simply moving dynamic_annotations.cc into
libsysinfo fixes things. Most of the code links both anyways. So lets
just do it.
We do shell wrapper for actual test run, so we can inspect output of
pprof. But when we set up sampling_debug_test.sh we simply copied
regular sampling_test.sh, which ran same non-debug test binary. Now we
sed-replace contents of shell program when copying, so we test right
binary.
Another thing we fix here is our (still hardcoded) test output path is
now different between sampling{,_debug}_test.sh. So this fixes main
cause of flakiness of our unit tests.
We used msync to verify that address is readable. But msync gives
false positives for PROT_NONE mappings. And we recently got bug report
from user hitting this exact condition.
For correct access check, we steal idea from Abseil and do sigprocmask
with address used as new signal mask and with invalid HOW
argument. This works in today's Linux kernels and is among fastest
methods available. But is brittle w.r.t. possible kernel changes. So
we supply fallback method that does 2 syscalls.
For non-Linux systems we implement usual "write to pipe" trick. Which
also has decent performance, but requires occasional pipe draining and
uses fds which could occasionally be damaged by some forking codes.
We also finally cover all new code with unit test.
Fixes github issue #1426
This unbreaks building on older Linux distros. We missed this at
46d3315ad7 when dropped maybe_thread
stuff, since libprofiler indeed uses pthread, and because on newer
libc-s pthread stuff is now part of regular libc.so.
I am also dropping bogus LIBPROFILER stuff referring to some rpath
badness. Unsure what it was, maybe way back we did libstacktrace as a
proper libtool library, so maybe something was needed. But it is just
a convenience archive this days, so we don't really need to add it
everywhere libprofiler.la is linked.
There was this piece of makefile with indention to add stack tracing
functionality (for stuff like growthz, GetCallerStackTrace and
probably heap sampling) to work even in minimal configuration on
mingw.
What is odd is we fail to actually define libstacktrace.la target on
mingw, since libstacktrace.la requires WITH_STACK_TRACE automake
conditional which we don't enable on this platform. And yet somehow it
doesn't fail. It produces empty libstacktrace.la, so build kinda
works. Except at least on my machine it produces racy makefiles. So
lets not pretend and stop breaking our parallel builds.
Previous implementation wasn't entirely safe w.r.t. 32-bit off_t
systems. Specifically around mmap replacement hook. Also, API was a
lot more general and broad than we actually need.
Sadly, old mmap hooks API was shipped with our public headers. But
thankfully it appears to be unused externally (checked via github
search). So we keep this old API and ABI for the sake of formal API
and ABI compatibility. But this old API is now empty and always
fails (some OS/hardware combinations didn't have functional
implementations of those hooks anyways).
New API is 64-bit clean and only provides us with what we need. Namely
being able to react to virtual address space mapping changes for
logging, heap profiling and heap leak checker. I.e. no pre hooks or
mmap-replacement hooks. We also explicitly not ship this API
externally to give us freedom to change it.
New code is also hopefully tidier and slightly more portable. At least
there are fewer arch-specific ifdef-s.
Another somewhat notable change is, since mmap hook isn't needed in
"minimal" configuration, we now don't override system's
mmap/munmap/etc functions in this configuration. No big deal, but it
reduces risk of damage if we somehow mess those up. I.e. musl's mmap
does few things that our mmap replacement doesn't, such as very fancy
vm_lock thingy. Which doesn't look critical, but is good thing for us
not to interfere with when not necessary.
Fixes issue #1406 and issue #1407. Lets also mention issue #1010 which
is somewhat relevant.
This facility allowed us to build tcmalloc without linking in actual
-lpthread. Via weak symbols we checked at runtime if pthread functions
are available and if not, special single-threaded stubs were used
instead. Not always brining in pthread dependency helped performance
of some programs or libraries which depended at runtime on whether
threads are linked or not. Most notable of those are libstdc++ which
uses non-atomic refcounting on single threaded programs.
But such optional dependency on pthreads caused complications for
nearly no benefit. One trouble was reported in github issue #1110.
This days glibc/libstdc++ combo actually depends on
sys/single_threaded.h facility. So bringing pthread at runtime is
fine. Also modern glibc ships pthread symbols inside libc anyways and
libpthread is empty. I also found that for whatever reason on BSDs and
osx we already pulled in proper pthreads too.
So we loose nothing and we get issue #1110 fixed. And we simplify
everything.
- Some small automake changes. Add libc++ for AIX instead of libstdc++
- Add the interface changes for AIX:User-defined malloc replacement
- Add code to avoid use of pthreads library prior to its initialization
- Some small changes to the unittest case
- Update INSTALL for AIX
[alkondratenko@gmail.com]: lower-case/de-trailing-dot for commit subject line
[alkondratenko@gmail.com]: rebase
[alkondratenko@gmail.com]: amputate unused AM_CONDITIONAL for AIX
[alkondratenko@gmail.com]: explicitly mention libc_override_aix.h in Makefile.am
We used to explicitly link to libstdc++, libm and even libpthread, but
this should be handled by libtool since those are dependencies of
libtcmalloc_minimal. What also helps is we now build everything with
C++ compiler, not C. So libstdc++ or (libc++) dependency doesn't need
to be added at all, even if libtool for some reason fails to handle
it.
This fixes issue #1371
From time to time things file inside tcmalloc guts where calling to
malloc is not safe. Regular strerror does locale bits, so will
occasionally open files/malloc/etc. We avoid this by using our own
"safe" variant that hardcodes names of all POSIX errno constants.
We had plenty of old and mostly no more correct i386 cruft. Now that
generic_fp backtracer covers i386 just fine, we can drop explicit x86
backtracer.
With that we refactored and simplified stacktrace.cc mostly around
picking default implementation, but also adding few more minor
cleanups.
The idea is -momit-leaf-frame-pointer gives us performance pretty much
same as fully omitting frame pointers. And having frame pointers
elsewhere allows us to support cases when user's code is built with
frame pointers. We also pass all tests with
TCMALLOC_STACKTRACE_METHOD=generic_fp (not just libunwind or libgcc).
Previously we allowed test programs to be linked at the same time as
weakening is performed, rewriting the .a archives. So lets be more
explicit. We weaken after all-am (which "runs" everything including
libraries and programs), but before all target.
It is kinda minor feature, and apparently we never had it working. But
is a nice to have. Allows our users to override malloc/free/etc while
still being able to link to us (for tc_malloc for example). With
broken weakening we had this use-case broken for static library
case. And it should now work.
In most practical terms, this expands "official" heap leak checker
support to Linux/arm64 and Linux/riscv (mips-en and legacy arm are
likely to work & pass tests too now).
The code is now explicitly Linux-only, without trying to pretend
otherwise. Main goal of this change is to finally amputate
linux_syscall_support.h, which we historically had trouble maintaining
well. Biggest challenge was around thread listing facility which uses
clone (ptrace explicitly fails between threads) and that causes
difficulties around parent and child tasks sharing
errno. linux_syscall_support stuff had special feature to "redirect"
errno accesses. But it caused us for more trouble. We switched to
regular syscalls, and errno stamping avoidance is now simply via
careful programming.
A number of other cleanups is made (such us thread finding codes in
procfs which clearly was built for some ages old and odd kernels).
sem_post/sem_wait synchronization was previously potentially prone to
deadlock (if parent died at bad time). We now use pipe pair for this
synchronization and it is fully robust.
We previously tested wrong assumption that larger than page size size
classes have addresses aligned on page size. New code is making proper
check of size class.
Also added is unit test coverage for this previously failing
condition. And we now also run "assert-ful" unittests for big tcmalloc
too, not only tcmalloc_minimal configuration.
This fixes github issue #1254
This supports frame pointer backtracing on x86-64, aarch64 and
riscv-s (should work for both 32 and 64 bits).
Also added is detection of borked libunwind on aarch64-s. In this case
frame pointer unwinder is preferred.
Previously it only was respected on x86_64, but this days lots of modern
ABIs are without frame pointers by default (e.g. arm64 and riscv, and
even older mips).
Clang mostly ignores those anyways, so our tests needed better way to
disable optimizations (clang is quite aggressive replacing new/delete
pair with stack allocation).
Everyone should be using golang pprof from github.com/google/pprof, but
distros still ship our perl version and not everyone is aware of
better pprof yet.
This is another step in completely dropping perl pprof. We still
default to installing it, but hopefully we'll be able to convince
distros to disable this soon.
We still install pprof under pprof-symbolize name because
stack traces symbolization depends on it, and because golang pprof
won't support this feature.
This is related to issue #1038.
1.Remove superfluous per file settings for include directory and runtime library.
2.Remove unnecessary project tcmalloc_minimal_unittest-static. We can simply build libtcmalloc_minimal as a static library and then link against the single .lib file.
3.Add separate configurations of patching and overriding facility for release mode.
Only add -static to malloc_bench_LDFLAGS and binary_trees_LDFLAGS if
ENABLE_STATC is set otherwise build with some compilers will fail if
user has decided to build only the shared version of gperftools
libraries
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
- Add auto-detection of std::align_val_t presence to configure scripts. This
indicates that the compiler supports C++17 operator new/delete overloads
for overaligned types.
- Add auto-detection of -faligned-new compiler option that appeared in gcc 7.
The option allows the compiler to generate calls to the new operators. It is
needed for tests.
- Added overrides for the new operators. The overrides are enabled if the
support for std::align_val_t has been detected. The implementation is mostly
based on the infrastructure used by memalign, which had to be extended to
support being used by C++ operators in addition to C functions. In particular,
the debug version of the library has to distinguish memory allocated by
memalign from that by operator new. The current implementation of sized
overaligned delete operators do not make use of the supplied size argument
except for the debug allocator because it is difficult to calculate the exact
allocation size that was used to allocate memory with alignment. This can be
done in the future.
- Removed forward declaration of std::nothrow_t. This was not portable as
the standard library is not required to provide nothrow_t directly in
namespace std (it could use e.g. an inline namespace within std). The <new>
header needs to be included for std::align_val_t anyway.
- Fixed operator delete[] implementation in libc_override_redefine.h.
- Moved TC_ALIAS definition to the beginning of the file in tcmalloc.cc so that
the macro is defined before its first use in nallocx.
- Added tests to verify the added operators.
[alkondratenko@gmail.com: fixed couple minor warnings, and some
whitespace change]
[alkondratenko@gmail.com: removed addition of TC_ALIAS in debug allocator]
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
Without this patch, any user program that enables LeakSanitizer will
see a leak from tcmalloc. Add a weak hook to __lsan_ignore_object,
so that if LeakSanitizer is enabled, the allocation can be ignored.
It looks like, in past it could produce better code. But since
unwinding is totally different since almost forever now, there is no
perfomance benefit of it anymore.
I.e. because otherwise, when --enable-minimal is given, we're building
empty libtcmalloc.la and linking it to malloc_bench_shared_full. Which
has no effect at all and actually breaks builds on OSX.
Should fix issue #869.
nallocx is extension introduced by jemalloc. It returns effective size
of allocaiton without allocating anything.
We also support MALLOCX_LG_ALIGN flag. But all other jemalloc
flags (which at the moment do nothing for nallocx anyways) are
silently ignored, since there is no sensible way to return errors in
this API.
This was originally contributed by Dmitry Vyukov with input from
Andrew Hunter. But due to significant divergence of Google-internal
and free-software forks of tcmalloc, significant massaging was done by
me. So all bugs are mine.
Emergency malloc is enabled for cases when backtrace capturing needs to
call malloc. In this case, we enable emergency malloc just prior to
calling such code and disable it after it is done.
Particularly _Unwind_Backtrace which seems to be gcc extension.
This is what glibc's backtrace is commonly is using.
Using _Unwind_Backtrace directly is better than glibc's backtrace, since
it doesn't call into dlopen. While glibc does dlopen when it is built as
shared library apparently to avoid link-time dependency on libgcc_s.so
This also makes them output nicer results. I.e. every benchmark is run 3
times and iteration duration is printed for every run.
While this is still very synthetic and unrepresentave of malloc performance
as a whole, it is exercising more situations in tcmalloc fastpath. So it a
step forward.
Spinlock usage of cycle counter is due do tracking of time it's spent
waiting for lock. But this tracking is only useful we actually have
synchronization profiling working, which dont have. Thus I'm dropping
calls to this facility with eye towards further removal of cycle clock
usage.
So that LD_PRELOAD-ing doesn't force loading libpthread.so which may
slow down some single-threaded apps.
tcmalloc already has maybe_threads facility that can detect if
libpthread.so is loaded (via weak symbols) and provide 'simulations' of
some pthread functions that tcmalloc needs.
While this is not good representation of real-world production malloc
behavior, it is representative of length (instruction-wise and well as
cycle-wise) of fast-path. So this is better than nothing.
Default mode of operation of cpu profiler uses itimer and
SIGPROF. This timer is by definition per-process and no spec defines
which thread is going to receive SIGPROF. And it provides correct
profiles only if we assume that probability of picking threads will be
proportional to cpu time spent by threads.
It is easy to see, that recent Linux (at least on common SMP hardware)
doesn't satisfy that assumption. Quite big skews of SIGPROF ticks
between threads is visible. I.e. I could see as big as 70%/20%
division instead of 50%/50% for pair of cpu-hog threads. (And I do see
it become 50/50 with new mode)
Fortunately POSIX provides mechanism to track per-thread cpu time via
posix timers facility. And even more fortunately, Linux also provides
mechanism to deliver timer ticks to specific threads.
Interestingly, it looks like FreeBSD also has very similar facility
and seems to suffer from same skew. But due to difference in a way
how threads are identified, I haven't bothered to try to support this
mode on FreeBSD.
This commit implements new profiling mode where every thread creates
posix timer which tracks thread's cpu time. Threads also also set up
signal delivery to itself on overflows of that timer.
This new mode requires every thread to be registered in cpu
profiler. Existing ProfilerRegisterThread function is used for that.
Because registering threads requires application support (or suitable
LD_PRELOAD-able wrapper for thread creation API), new mode is off by
default. And it has to be manually activated by setting environment
variable CPUPROFILE_PER_THREAD_TIMERS.
New mode also requires librt symbols to be available. Which we do not
link to due to librt's dependency on libpthread. Which we avoid due
to perf impact of bringing in libpthread to otherwise single-threaded
programs. So it has to be either already loaded by profiling program
or LD_PRELOAD-ed.
Because clang doesn't understand -fno-builtin-malloc and friends. And
otherwise new/delete pairs get optimized away causing our tests that
expect hooks to be called to fail.