So that malloc hooks can call into
MallocExtension::instance()->GetAllocatedSize() and avoid recursion
from a new MallocExtension being constructed inside that call.
This was proposed by github user poljak181 in issue #1472
We had two nearly identical implementations. Thankfully, the C++
template facility lets us produce two different runtime functions (for
different type widths) without duplicating source.
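A minimal sketch of the idea (names here are hypothetical, not the
actual gperftools ones): one template definition, explicitly
instantiated for two integer widths, yields two distinct runtime
functions from a single source:

    #include <cstdint>

    // Single definition; T selects the width.
    template <typename T>
    T RoundUpToMultiple(T value, T granularity) {
      return (value + granularity - 1) / granularity * granularity;
    }

    // Two explicit instantiations -> two runtime functions.
    template uint32_t RoundUpToMultiple<uint32_t>(uint32_t, uint32_t);
    template uint64_t RoundUpToMultiple<uint64_t>(uint64_t, uint64_t);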
Amend github issue #1414
As part of the cpu profiler we extract the current PC (program
counter) out of the signal's ucontext. Different OS and hardware
combinations have different ways of doing that. We had a list of
variants that we tested at configure time, populating the
PC_FROM_UCONTEXT macro in config.h. This caused duplication and
occasional mismatches between our autoconf and cmake bits.
So this commit moves that testing to compile time. We remove
complexity from the build system and add some to the C++ source.
We use SFINAE to find which of those variants compile (we silently
assume that 'compiles' implies 'works'; this is what the config-time
testing did too). Occasionally we'll face situations where several
variants compile, and we couldn't handle this case in pure C++. So we
have a small Ruby program that generates a chain of inheritance among
the SFINAE-specialized class templates. This handles prioritization
among variants.
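Here is a hand-written sketch of that prioritization pattern (the
names and specific ucontext fields are illustrative, not the generated
code): a chain of tag classes makes overload resolution prefer the
highest-priority variant that compiles:

    #include <ucontext.h>

    // Rank<2> derives from Rank<1> derives from Rank<0>, so overload
    // resolution prefers the highest-ranked viable candidate.
    template <int N> struct Rank : Rank<N - 1> {};
    template <> struct Rank<0> {};

    // Only participates if uc_mcontext.gregs[] compiles (16 is REG_RIP
    // on x86-64 Linux, hardcoded just for this sketch).
    template <typename U>
    auto RawUCToPC(const U* uc, Rank<2>)
        -> decltype((void)uc->uc_mcontext.gregs[16], (void*)nullptr) {
      return (void*)uc->uc_mcontext.gregs[16];
    }

    // Lower-priority candidate for systems exposing mc_rip.
    template <typename U>
    auto RawUCToPC(const U* uc, Rank<1>)
        -> decltype((void)uc->uc_mcontext.mc_rip, (void*)nullptr) {
      return (void*)uc->uc_mcontext.mc_rip;
    }

    // Fallback when no variant compiles.
    template <typename U>
    void* RawUCToPC(const U*, Rank<0>) { return nullptr; }

    void* GetPC(const ucontext_t& uc) { return RawUCToPC(&uc, Rank<2>{}); }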
The list of ucontext->pc extraction variants is mostly the same. We
dropped the super-obsolete (circa Linux kernel 2.0) arm variant. And
the NetBSD case is improved: we now use their nice
architecture-independent macro instead of x86-specific access.
AARCH64 >= armv8.3-a supports pointer authentication. If this feature
is enabled it modifies the previously unused upper address bits in a
pointer. The affected bits need to be cleared in order for stacktraces
to work.
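One way to strip those bits (a sketch of the general technique, not
necessarily what this commit does) is the XPACLRI instruction, which
lives in the hint space and is therefore a NOP on pre-8.3 cores:

    inline void* StripPAC(void* pc) {
    #if defined(__aarch64__)
      register void* x30 __asm__("x30") = pc;
      // XPACLRI (hint #7) strips the PAC from the address in LR/x30;
      // on CPUs without pointer authentication it does nothing.
      __asm__("hint #7" : "+r"(x30));
      return x30;
    #else
      return pc;
    #endif
    }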
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
[alkondratenko@gmail.com: added succinct subject line]
As we recently found out, initializing static struct fields or
variables with lambdas sets up runtime initialization instead of the
static initialization we assumed. So let's avoid this for the null
stacktrace implementation too.
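A minimal illustration of the pitfall (hypothetical names; before
C++17 a lambda's conversion to a function pointer is not constexpr, so
the initializer may run at startup instead of being baked into the
binary):

    struct Impl {
      int (*GetStackFrames)(void** result, int* sizes, int max, int skip);
    };

    // May get dynamic (runtime) initialization:
    static Impl lambda_impl = {
      [](void**, int*, int, int) { return 0; }
    };

    // Plain function: guaranteed constant initialization.
    static int NullGetStackFrames(void**, int*, int, int) { return 0; }
    static Impl null_impl = { NullGetStackFrames };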
Otherwise mmap's call to do_mmap_with_hooks might become a tail call
(instead of being inlined), which would then break the
GetCallerStackTrace facility (since only mmap is placed into the
special malloc_hook section). This unbreaks the heap checker on gcc 5,
but is the right thing to do in general.
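A sketch of the shape of such a fix (the attribute spelling and the
section name are assumptions mirroring the message, not the exact
code): forcing the helper to be inlined keeps the recorded return
address inside mmap, the one symbol in the special section:

    static inline __attribute__((always_inline))
    void* do_mmap_inner(size_t length) {
      // ... hooks and the raw syscall would go here ...
      (void)length;
      return nullptr;
    }

    __attribute__((section("malloc_hook"), noinline))
    void* my_mmap(size_t length) {
      void* result = do_mmap_inner(length);  // inlined: no tail call
      return result;
    }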
Testing every 7th size is a bit slow on slower machines, and we don't
need to be that thorough. We now bump the size by about 1/128th each
step, which still gives us more steps than we have size classes.
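Sketched reconstruction of the stepping (helper names are
hypothetical): geometric growth by ~1/128 per iteration instead of a
fixed increment of 7:

    for (size_t size = 1; size <= kMaxTestedSize; size += size / 128 + 1) {
      CheckAllocatedSizeFor(size);  // hypothetical per-size check
    }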
We use a shell wrapper for the actual test run, so we can inspect the
output of pprof. But when we set up sampling_debug_test.sh we simply
copied the regular sampling_test.sh, which ran the same non-debug test
binary. Now we sed-replace the contents of the shell program when
copying, so we test the right binary.
Another thing we fix here is that our (still hardcoded) test output
path now differs between sampling{,_debug}_test.sh. This fixes the
main cause of flakiness in our unit tests.
We used msync to verify that an address is readable. But msync gives
false positives for PROT_NONE mappings, and we recently got a bug
report from a user hitting this exact condition.
For a correct access check, we steal an idea from Abseil and call
sigprocmask with the address in question as the new signal mask and an
invalid HOW argument. This works on today's Linux kernels and is among
the fastest methods available, but it is brittle w.r.t. possible
kernel changes. So we supply a fallback method that does 2 syscalls.
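A sketch of the trick (the raw-syscall form and constants are my
choices, not the exact gperftools code): the kernel copies the new
mask from user memory before validating HOW, so errno distinguishes
readable from unreadable:

    #include <cerrno>
    #include <sys/syscall.h>
    #include <unistd.h>

    bool IsReadable(const void* addr) {
      // 8 is the kernel sigset size on 64-bit Linux; ~0 is an invalid HOW.
      long rv = syscall(SYS_rt_sigprocmask, ~0, addr,
                        nullptr, /*sigsetsize=*/8);
      // EFAULT: copying the mask faulted -> unreadable.
      // EINVAL: the copy succeeded, then HOW validation rejected ~0.
      return rv == -1 && errno == EINVAL;
    }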
For non-Linux systems we implement the usual "write to pipe" trick,
which also has decent performance, but it requires occasional draining
of the pipe and uses fds, which could occasionally be damaged by some
forking code.
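The pipe probe, sketched (hypothetical helper; the real code also has
to drain the pipe and handle EINTR): write(2) faults cleanly when
asked to read a byte from an unreadable address:

    #include <cerrno>
    #include <unistd.h>

    bool IsReadableViaPipe(const void* addr, int pipe_write_fd) {
      ssize_t rv = write(pipe_write_fd, addr, 1);
      // rv == 1: the kernel read a byte from *addr -> readable.
      // EFAULT: unreadable. A full pipe returns EAGAIN without probing
      // the address, hence the occasional draining mentioned above.
      return rv == 1;
    }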
We also finally cover all the new code with a unit test.
Fixes github issue #1426