When linking statically we may end up calling ProfilerGetCurrentState
earlier than profiler is initialized. And we segfaulted on that early
call. Lets make us handle this case gracefully.
Instead of MallocHook::GetCallerStackTrace.
Thing is, GetCallerStackTrace isn't reliable beyond ELF systems, like
OSX. And, yet, things just work without it for e.g. heap
sampling. Why? Because pprof already knows how to exclude
tcmalloc-internal stack frames (by looking at e.g. tcmalloc::
namespace). So we do the same for heap profiler.
This fixes heap profiling unit tests on OSX.
glibc carefully handles unwind info for signal trampoline stack frame,
so even brittle "skip N frames" case works there. But e.g. on musl it
doesn't.
So lets skip this test on non-glibc systemsfor now, until we test
things closer to how it is done by cpu profiler.
So I noticed that profiler_unittest failed somewhat regularly on
armhf. On inspection I found that it fails because the test compares
"nested-most" tick counts between several profiles and we had most
ticks inside inlined atomic ops functions and not test_main_thread.
On the other hand, removing atomic ops from nested loop makes test way
too fast for the modern quick x86 desktops.
So lets make it try harder to be non-brittle. We do that by grabbing
access to profiler's ticks count, which helps our inner loops to run
long enough to get sufficient ticks count.
We also do couple more minor updates.
*) we have shell script use random temp directory name. Which allows
us to exercise golang-stress and similar utils.
*) we drop PERFTOOLS_UNITTEST and relevant code because this
"variable" was always set to be true anyways.
The test inspects "nested-most" frames in malloc sampling samples. And
some compile options caused noopt to seen as call site of
malloc. I.e. because noopt is immediately after malloc.
To make things more robust we create local "copy" of noopt that is
marked as noinline and so immediately next instruction after malloc is
call to this local_noopt thingy, which makes test more robust since
malloc stack trace will see that, as we expect, AllocateAllocate is
what calls malloc.
Part of this change is better diagnostic for when/if it fails. And
most important part is compensating for the delay between sampling
parameter is set for the test and when it is actually taken into
account by thread cache Sampler logic.
As a result about 1% of flaking probability has been fixed as we're
getting mean estimate for the allocated size actually same (or about
same) as allocated size.
Update github ticket #1557.
Clang kinda rightfully noticed that if FLAGS_test_profiler_enabled is
false (and it never is false in practice), we'd be reading unitialized
variables. Since this flag is never updated, we can simply drop those
dead "false" paths.
Even though this specific usage is safe, it is, arguably, not great to
be using sprintf in year 2024. snprintf is portable enough this
days. Thanks goes to OSX headers that warn about sprintf loudly enough
to be noticed.
OSX's malloc zones stuff is essentially incompatible with emergency
malloc. In order to handle it, our unit tests use tc_free directly in
order to free memory allocated by emergency malloc. So lets keep doing
it for our newer code that exercises calloc/free pair as well.
Originally at Google they had to do 2 subsections for hookable
functions because of some linking detais (in bazel infrastructure they
still do different libraries as .so-s for tests). So "generic" hooks
(such as mmap/sbrk) were in malloc_hook section and tcmalloc's
were/are in google_malloc. Since those are different bazel/blaze
libraries. And we kept this distinction simply because no-one bothered
to undo it, despite us never needing it.
We recently refactored mmap/sbrk hooking. And we don't use section
stuff anymore for those hooks. And so there are no malloc_hook
anymore. And so we were getting bogus and useless warnings about empty
section. So lets avoid this.
This benchmark exercizes multi-threaded central free list operations,
and this is where we're losing to a bunch of competing malloc (i.e. which
shard heap).
We now support a set of command line flags similar to "abseil"
benchmark thingy. I.e. to let people specify a subset of benchmarks or
run them longer/shorter as needed.
This commit also includes small, portable and very simple regexp
facility. It isn't good enough for some production use, but it is
plenty good for some testing uses or benchmark selection.
Instead of relying on gperftools-specific tc_XYZ functions for sized
deallocation and memalign we use standard C++ facilities. There are
also other minor improvements like mallocing larger buffers rather
than statically allocating them.
1. This documents the intent that the way to create/destroy a ThreadCache
object is through the static methods NewHeap()/DeleteCache()
2. It makes using the class less error prone. The compiler will complain
if some new code is accidentally creating objects directly
3. This may allow some compilers to optimize code knowing that those
functions are private
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
It's method of verifying that cpu profiling ticks happens is
inherently brittle. It sleeps certain time and then checks if ticks
happened during that time. And sleeping is by wall clock time. But due
to cpu scheduling being unpredictable, it is not impossible to see no
ticks even waiting 200 ms. We improve the issue by having the code
loop a bit until it seeks ticks happen.
Qnx pmap file starts with a header line, skip it to make ForEachLine
continue feeding next valid proc maps line.
vaddr,size,flags,prot,maxprot,dev,ino,offset,rsv,guardsize,refcnt,mapcnt,path
0x0000001c0ae92000,0x0000000000002000,0x00010071,0x05,0x0d,0x00000802,0x00000000c00000ab, \
0x0000000000000000,0x0000000000000000,0x00000000,0x0000017e,0x0000017c,/bin/cat
Apparently modern ucrt does have _msize_base and _recalloc_base
defined in same .obj files as their corresponsing "non-base"
functions. And they're seemingly used somewhere in their C or C++
runtime. So lets provide them too in our override_functions.cc, so
that overriding malloc routines in MSVC's static C runtime option
works.
Otherwise, whatever msize_base usage pulls msize.obj file that wants
to define regular msize, and with msize already provided by
override_function, we'd rightfully get linking error.
Update github ticket
https://github.com/gperftools/gperftools/issues/1430