This is part of an effort to get rid of the perl pprof dependency. We're
replacing forking to `pprof --symbols` with a carefully crafted
libbacktrace integration, which has enough support for symbolizing
backtraces.
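For context, here is a minimal sketch of symbolizing a single address with
the public libbacktrace API. This is illustrative usage only, not the
actual gperftools integration code.
```
#include <backtrace.h>
#include <cstdint>
#include <cstdio>

static void OnError(void* /*data*/, const char* msg, int errnum) {
  fprintf(stderr, "libbacktrace error: %s (%d)\n", msg, errnum);
}

static void OnSymbol(void* /*data*/, uintptr_t pc, const char* symname,
                     uintptr_t /*symval*/, uintptr_t /*symsize*/) {
  printf("%#lx  %s\n", (unsigned long)pc, symname ? symname : "??");
}

static void SomeFunction() {}

int main() {
  // nullptr filename means "locate the current executable"; 1 = threaded.
  backtrace_state* state = backtrace_create_state(nullptr, 1, OnError, nullptr);
  // Ask libbacktrace for the symbol covering a known code address.
  backtrace_syminfo(state, reinterpret_cast<uintptr_t>(&SomeFunction),
                    OnSymbol, OnError, nullptr);
  return 0;
}
```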
This refactors the API so that meta_data_arena isn't needed or used
anymore; we always passed nullptr for this argument anyway.
We also change the code so that the PagesAllocator API is able to choose
how allocation requests are rounded up. It now returns a pointer to the
allocated memory together with its rounded-up size.
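A minimal sketch of what the reshaped interface might look like (names
and signatures here are illustrative, not the exact gperftools
declarations):
```
#include <cstddef>
#include <utility>

// Illustrative interface: the allocator decides how far to round each
// request up and reports the block together with its actual size.
class PagesAllocator {
 public:
  virtual ~PagesAllocator() = default;
  // Returns {pointer to mapped memory, rounded-up size}.
  virtual std::pair<void*, size_t> MapPages(size_t bytes) = 0;
  virtual void UnMapPages(void* ptr, size_t actual_bytes) = 0;
};
```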
And the most important part of this change is that the Arena instance
itself is now allocated with the provided allocator (when one is
provided). This reduces locking dependencies.
We make sure to cover the new logic with a test.
Apparently, gcc 10 doesn't support trailing return types for lambdas with an attribute.
```
src/tests/sampling_test.cc: In lambda function:
src/tests/sampling_test.cc:70:56: error: expected '{' before '->' token
70 | auto local_noopt = [] (void* ptr) ATTRIBUTE_NOINLINE -> void* {
| ^~
src/tests/sampling_test.cc: In function 'void* AllocateAllocate()':
src/tests/sampling_test.cc:70:56: error: base operand of '->' has non-pointer type 'AllocateAllocate()::<lambda(void*)>'
src/tests/sampling_test.cc:70:59: error: expected unqualified-id before 'void'
70 | auto local_noopt = [] (void* ptr) ATTRIBUTE_NOINLINE -> void* {
| ^~~~
```
Remove the trailing return type as it is deduced from the `noopt` call anyway.
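Roughly, the before/after of that line (a fragment; the test's `noopt`
helper and the ATTRIBUTE_NOINLINE macro are assumed to be in scope):
```
// Before (rejected by gcc 10):
//   auto local_noopt = [] (void* ptr) ATTRIBUTE_NOINLINE -> void* { return noopt(ptr); };
// After: the return type is deduced as void* from the noopt(ptr) call.
auto local_noopt = [] (void* ptr) ATTRIBUTE_NOINLINE { return noopt(ptr); };
```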
And unbreak it on FreeBSD.
Turns out not only do they not mount procfs by default, their
/proc/pid/map also fails to supply mapping offsets. Without offsets, all
hope is lost.
I see rare occasional failures there, which look very odd. Typically
both sampling and sampling_debug tests fail at about the same time. So
hopefully we can diagnose them sometime.
We used the GetCallerStackTrace thingy before, but it is not entirely
reliable in its detection of malloc stack frames (e.g. on OSX). So
let's do the full thing instead. Those stack traces are to be printed to
users anyway.
When linking statically we may end up calling ProfilerGetCurrentState
before the profiler is initialized, and we segfaulted on that early
call. Let's handle this case gracefully.
Instead of MallocHook::GetCallerStackTrace.
The thing is, GetCallerStackTrace isn't reliable beyond ELF systems
(e.g. on OSX). And yet, things just work without it for e.g. heap
sampling. Why? Because pprof already knows how to exclude
tcmalloc-internal stack frames (by looking at e.g. the tcmalloc::
namespace). So we do the same for the heap profiler.
This fixes heap profiling unit tests on OSX.
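To illustrate the mechanism, here is a toy sketch of that kind of
filtering, not pprof's actual implementation: drop frames whose symbols
live in tcmalloc-internal namespaces before presenting the profile, so
the profiler itself is free to record complete stacks.
```
#include <string>
#include <vector>

// Toy illustration: a frame is "internal" if its symbol starts with the
// tcmalloc:: namespace prefix.
static bool IsTCMallocInternal(const std::string& symbol) {
  return symbol.rfind("tcmalloc::", 0) == 0;
}

static std::vector<std::string> StripInternalFrames(
    const std::vector<std::string>& frames) {
  std::vector<std::string> kept;
  for (const std::string& f : frames) {
    if (!IsTCMallocInternal(f)) kept.push_back(f);
  }
  return kept;
}
```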
glibc carefully handles unwind info for the signal trampoline stack
frame, so even the brittle "skip N frames" case works there. But, e.g.
on musl, it doesn't.
So let's skip this test on non-glibc systems for now, until we test
things closer to how the cpu profiler does it.
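For reference, a minimal sketch of what such a guard can look like
(__GLIBC__ is defined by glibc's headers; the actual test's skip logic
may be arranged differently):
```
#include <stdio.h>

int main() {
#if defined(__GLIBC__)
  printf("running the signal-frame unwind test\n");
  // ... the brittle "skip N frames through the signal trampoline"
  // checks would run here ...
#else
  printf("SKIP: non-glibc libc (e.g. musl) lacks reliable unwind info "
         "for the signal trampoline frame\n");
#endif
  return 0;
}
```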
So I noticed that profiler_unittest failed somewhat regularly on
armhf. On inspection I found that it fails because the test compares
"nested-most" tick counts between several profiles, and we had most
ticks inside inlined atomic-ops functions rather than in
test_main_thread. On the other hand, removing the atomic ops from the
nested loop makes the test way too fast for modern quick x86 desktops.
So let's make it try harder to be non-brittle. We do that by grabbing
access to the profiler's tick count, which lets our inner loops run
long enough to gather a sufficient number of ticks.
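Roughly what that looks like, as a sketch with illustrative names
(GetProfilerTicks and the loop bounds are assumptions, not the actual
test code):
```
// Assumed accessor into the profiler's tick counter; the real test
// reaches this value through profiler internals exposed for testing.
extern int GetProfilerTicks();

// Busy loop that keeps running until the profiler has observed enough
// timer ticks, so both slow armhf boards and fast x86 desktops end up
// with a usable number of samples, without atomic ops in the hot path.
static void SpinUntilEnoughTicks(int target_ticks) {
  volatile int sink = 0;
  const int start = GetProfilerTicks();
  while (GetProfilerTicks() - start < target_ticks) {
    for (int i = 0; i < 1000; i++) {
      sink += i * i;  // plain arithmetic only
    }
  }
}
```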
We also do a couple more minor updates:
*) we have the shell script use a random temp directory name, which
allows us to exercise golang-stress and similar utils;
*) we drop PERFTOOLS_UNITTEST and the relevant code, because this
"variable" was always set to true anyway.
The test inspects the "nested-most" frames in malloc sampling samples,
and some compile options caused noopt to be seen as the call site of
malloc, i.e. because the call to noopt is immediately after the call to
malloc.
To make things more robust we create a local "copy" of noopt that is
marked noinline, so the instruction immediately after the malloc call is
a call to this local_noopt thingy. The malloc stack trace then sees, as
we expect, that AllocateAllocate is what calls malloc.
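A sketch of the shape this gives the test (the surrounding
sampling_test.cc details are paraphrased, not quoted; the stand-ins at
the top exist only to make the sketch self-contained):
```
#include <stdlib.h>

// Stand-ins so the sketch compiles on its own; the real test defines these.
#define ATTRIBUTE_NOINLINE __attribute__((noinline))
static ATTRIBUTE_NOINLINE void* noopt(void* ptr) { return ptr; }

static ATTRIBUTE_NOINLINE void* AllocateAllocate() {
  // local_noopt is noinline, so the instruction right after the call to
  // malloc is a call into this lambda rather than into some inlined
  // helper; the sampled stack therefore attributes the allocation to
  // AllocateAllocate, which is what the test asserts.
  auto local_noopt = [] (void* ptr) ATTRIBUTE_NOINLINE {
    return noopt(ptr);
  };
  return local_noopt(malloc(10000));
}

int main() {
  free(AllocateAllocate());
  return 0;
}
```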
Part of this change is better diagnostics for when/if it fails. The
most important part is compensating for the delay between when the
sampling parameter is set for the test and when it is actually taken
into account by the thread cache's Sampler logic.
As a result about 1% of flaking probability has been fixed, as the mean
estimate of the allocated size now comes out the same (or about the
same) as the actually allocated size.
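One plausible shape of that compensation, as a sketch under assumptions
(the helper name and the exact amount of warm-up are illustrative; the
actual test may do this differently): burn through the previously drawn
sampling interval with throwaway allocations after changing the
parameter, so subsequent samples are drawn under the new rate.
```
#include <stdlib.h>

// Hypothetical helper: after the test lowers the sampling parameter,
// the per-thread Sampler may still be partway through an interval drawn
// under the old parameter, so we do throwaway allocations until that
// old interval has certainly been consumed.
static void DrainOldSamplingInterval(size_t old_sample_parameter) {
  size_t allocated = 0;
  // A couple of old intervals' worth of allocations is enough to be
  // sure the next sample is drawn with the new parameter in effect.
  while (allocated < 2 * old_sample_parameter) {
    free(malloc(1024));
    allocated += 1024;
  }
}
```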
Update github ticket #1557.