This allows us, later, to avoid building this stuff in configurations
that don't use it. I have also reduced API and ABI surface to enable
further refactorings.
* Remove build dependency on HAVE_PTHREAD
* Remove build dependency on HAVE_STD_ALIGNED_VAL_T and ENABLE_ALIGNED_NEW_DELETE
* Remove redundant tcmalloc.h files & ensure there are no cross-build-tool references
* Adopt automake commit 26927d1 in the CMake build
- Fix CMake builds for MinGW and MSVC
- Ensure the Autotools, CMake and VSProj builds do not reference each others' config.h
- Use std:🧵:id instead of our own thread ID wrappers
- Moved explicit TLS wrapper functions into the tcmalloc:: namespace and change their visibility to hidden
Resolves#1486
Pretty much everything has fork, except windows. Which is much simpler
to test at compile time, then duplicating and complicating the test at
configure time.
As part of cpu profiler we're extracting current PC (program counter)
of out signal's ucontext. Different OS and hardware combinations have
different ways for that. We had a list of variants that we tested at
compile time and populated PC_FROM_UCONTEXT macro into config.h. It
caused duplication and occasional mismatches between our autoconf and
cmake bits.
So this commit changes testing to be compile-time. We remove
complexity from build system and add some to C++ source.
We use SFINAE to find which of those variants compile (and we silently
assume that 'compiles' implies 'works'; this is what config-time
testing did too). Occasionally we'll face situations where several
variants compile. And we couldn't handle this case in pure C++. So we
have a small Ruby program that generates chain of inheritance among
SFINAE-specialized class templates. This handles prioritization among
variants.
List of ucontext->pc extraction variants is mostly same. We dropped
super-obsolete (circa Linux kernel 2.0) arm variant. And NetBSD case
is now improved. We now use their nice architecture-independent macro
instead of x86-specific access.
We used msync to verify that address is readable. But msync gives
false positives for PROT_NONE mappings. And we recently got bug report
from user hitting this exact condition.
For correct access check, we steal idea from Abseil and do sigprocmask
with address used as new signal mask and with invalid HOW
argument. This works in today's Linux kernels and is among fastest
methods available. But is brittle w.r.t. possible kernel changes. So
we supply fallback method that does 2 syscalls.
For non-Linux systems we implement usual "write to pipe" trick. Which
also has decent performance, but requires occasional pipe draining and
uses fds which could occasionally be damaged by some forking codes.
We also finally cover all new code with unit test.
Fixes github issue #1426
Previous implementation wasn't entirely safe w.r.t. 32-bit off_t
systems. Specifically around mmap replacement hook. Also, API was a
lot more general and broad than we actually need.
Sadly, old mmap hooks API was shipped with our public headers. But
thankfully it appears to be unused externally (checked via github
search). So we keep this old API and ABI for the sake of formal API
and ABI compatibility. But this old API is now empty and always
fails (some OS/hardware combinations didn't have functional
implementations of those hooks anyways).
New API is 64-bit clean and only provides us with what we need. Namely
being able to react to virtual address space mapping changes for
logging, heap profiling and heap leak checker. I.e. no pre hooks or
mmap-replacement hooks. We also explicitly not ship this API
externally to give us freedom to change it.
New code is also hopefully tidier and slightly more portable. At least
there are fewer arch-specific ifdef-s.
Another somewhat notable change is, since mmap hook isn't needed in
"minimal" configuration, we now don't override system's
mmap/munmap/etc functions in this configuration. No big deal, but it
reduces risk of damage if we somehow mess those up. I.e. musl's mmap
does few things that our mmap replacement doesn't, such as very fancy
vm_lock thingy. Which doesn't look critical, but is good thing for us
not to interfere with when not necessary.
Fixes issue #1406 and issue #1407. Lets also mention issue #1010 which
is somewhat relevant.
This facility allowed us to build tcmalloc without linking in actual
-lpthread. Via weak symbols we checked at runtime if pthread functions
are available and if not, special single-threaded stubs were used
instead. Not always brining in pthread dependency helped performance
of some programs or libraries which depended at runtime on whether
threads are linked or not. Most notable of those are libstdc++ which
uses non-atomic refcounting on single threaded programs.
But such optional dependency on pthreads caused complications for
nearly no benefit. One trouble was reported in github issue #1110.
This days glibc/libstdc++ combo actually depends on
sys/single_threaded.h facility. So bringing pthread at runtime is
fine. Also modern glibc ships pthread symbols inside libc anyways and
libpthread is empty. I also found that for whatever reason on BSDs and
osx we already pulled in proper pthreads too.
So we loose nothing and we get issue #1110 fixed. And we simplify
everything.
This fixes issue #1371
From time to time things file inside tcmalloc guts where calling to
malloc is not safe. Regular strerror does locale bits, so will
occasionally open files/malloc/etc. We avoid this by using our own
"safe" variant that hardcodes names of all POSIX errno constants.
We had plenty of old and mostly no more correct i386 cruft. Now that
generic_fp backtracer covers i386 just fine, we can drop explicit x86
backtracer.
With that we refactored and simplified stacktrace.cc mostly around
picking default implementation, but also adding few more minor
cleanups.
The idea is -momit-leaf-frame-pointer gives us performance pretty much
same as fully omitting frame pointers. And having frame pointers
elsewhere allows us to support cases when user's code is built with
frame pointers. We also pass all tests with
TCMALLOC_STACKTRACE_METHOD=generic_fp (not just libunwind or libgcc).
In most practical terms, this expands "official" heap leak checker
support to Linux/arm64 and Linux/riscv (mips-en and legacy arm are
likely to work & pass tests too now).
The code is now explicitly Linux-only, without trying to pretend
otherwise. Main goal of this change is to finally amputate
linux_syscall_support.h, which we historically had trouble maintaining
well. Biggest challenge was around thread listing facility which uses
clone (ptrace explicitly fails between threads) and that causes
difficulties around parent and child tasks sharing
errno. linux_syscall_support stuff had special feature to "redirect"
errno accesses. But it caused us for more trouble. We switched to
regular syscalls, and errno stamping avoidance is now simply via
careful programming.
A number of other cleanups is made (such us thread finding codes in
procfs which clearly was built for some ages old and odd kernels).
sem_post/sem_wait synchronization was previously potentially prone to
deadlock (if parent died at bad time). We now use pipe pair for this
synchronization and it is fully robust.
Previously we blindly tried to use libunwind whenever header is
detected. Even if actually working libunwind library is missing. This
is now fixed, so we attempt to use libunwind when it actually works.
Somehow recent freebsd ships libunwind.h (which seems to belong to
llvm's implementation), but apparently without matching .so. So then building
and linking failed.