Testing every 7th size is a bit slow on slower machines. No need to be
as thorough. We now bump by about 1/128th each step which is still
more steps than size classes we have.
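A minimal sketch of such a stepping loop, assuming the step is computed as 1 + size/128 (the exact formula and bounds used by the test are not stated here, and ForEachTestSize is a hypothetical name):

```cpp
#include <cstddef>

// Hypothetical helper: visits sizes in [1, max_size]. Each step grows
// the size by about 1/128th of its current value, and by at least 1.
// Returns the number of sizes visited.
template <typename Fn>
size_t ForEachTestSize(size_t max_size, Fn&& visit) {
  size_t count = 0;
  for (size_t size = 1; size <= max_size; size += 1 + size / 128) {
    visit(size);
    ++count;
  }
  return count;
}
```

Small sizes are still visited one by one, while large sizes are covered in roughly geometric steps, so the total number of iterations grows only logarithmically with max_size.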
We do shell wrapper for actual test run, so we can inspect output of
pprof. But when we set up sampling_debug_test.sh we simply copied
regular sampling_test.sh, which ran same non-debug test binary. Now we
sed-replace contents of shell program when copying, so we test right
binary.
Another thing we fix here is our (still hardcoded) test output path is
now different between sampling{,_debug}_test.sh. So this fixes main
cause of flakiness of our unit tests.
We used msync to verify that address is readable. But msync gives
false positives for PROT_NONE mappings. And we recently got bug report
from user hitting this exact condition.
For correct access check, we steal idea from Abseil and do sigprocmask
with address used as new signal mask and with invalid HOW
argument. This works in today's Linux kernels and is among fastest
methods available. But it is brittle w.r.t. possible kernel changes. So
we supply fallback method that does 2 syscalls.
For non-Linux systems we implement usual "write to pipe" trick. Which
also has decent performance, but requires occasional pipe draining and
uses fds which could occasionally be damaged by some forking code.
We also finally cover all new code with unit test.
Fixes github issue #1426
As we see in github issue #1428, msvc arranges full "init on first
use" initialization for local static usage of TrivialOnce even if that
initialization is completely empty. Fair game, even if stupid.
POD with no initialization should be safely zero-initialized with no
games or tricks from the compilers.
We could have and perhaps at some point should do constexpr for
TrivialOnce and SpinLock (abseil has been liberated from
LinkerInitialized for perhaps well over a decade now, including their
fork of SpinLock, of course). But C++ legalese rules are complex
enough and bugs happened in past, so I don't want to be in the tough
business of interpreting standard. So at least for now we keep
things simple.
Default MPICH builds use the Hydra process manager (mpiexec) which sets
PMI_RANK in the application environment. Update GetUniquePathFromEnv()
test accordingly.
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
This unbreaks building on older Linux distros. We missed this at
46d3315ad7 when we dropped maybe_thread stuff, since libprofiler indeed
uses pthread, and because on newer libc-s pthread stuff is now part of
regular libc.so.
I am also dropping bogus LIBPROFILER stuff referring to some rpath
badness. Unsure what it was, maybe way back we did libstacktrace as a
proper libtool library, so maybe something was needed. But it is just
a convenience archive these days, so we don't really need to add it
everywhere libprofiler.la is linked.
Without this fix we're failing unit tests on ubuntu 18.04 and centos 7
and 6. It looks like clone() in old glibc-s doesn't align stack, so
let's handle it ourselves. How we didn't hit this much earlier (before
massive thread listing refactoring), I am not sure. Most likely pure
luck(?)
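Aligning the stack by hand can be sketched as below. AlignedStackTop is a hypothetical helper, and the 16-byte figure is an assumption based on what the x86-64 and AArch64 ABIs require for a downward-growing stack.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the highest 16-byte-aligned address
// within [stack_base, stack_base + stack_size], suitable as the initial
// stack pointer for a stack that grows down.
inline void* AlignedStackTop(void* stack_base, size_t stack_size) {
  uintptr_t top = reinterpret_cast<uintptr_t>(stack_base) + stack_size;
  return reinterpret_cast<void*>(top & ~uintptr_t{15});
}
```

The returned pointer would then be passed as the child stack argument instead of the raw end of the buffer.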
* Add support for known HPC environments (TODO: needs to be extended
with more environments)
* Added the "CPUPROFILE_USE_PID" environment variable to force appending
PID for the non-covered environments
* Preserve the old way of handling the Child-Parent case
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
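The environment handling listed above might look roughly like this. It is only a sketch: UniqueProfilePathSketch and both suffix formats are assumptions, PMI_RANK stands in for the set of known HPC variables, and the Child-Parent case is left out.

```cpp
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical sketch of picking a per-process profile path.
std::string UniqueProfilePathSketch(const std::string& base) {
  // Known HPC environment: Hydra (MPICH's default process manager)
  // exports PMI_RANK to every rank.
  if (const char* rank = std::getenv("PMI_RANK")) {
    return base + ".rank-" + rank;  // suffix format is an assumption
  }
  // Explicit opt-in for environments that are not recognized above.
  if (std::getenv("CPUPROFILE_USE_PID") != nullptr) {
    return base + "_" + std::to_string(static_cast<long>(getpid()));
  }
  return base;
}
```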
It actually found real (but arguably minor) issue with memory region
map locking.
As part of that we're replacing PageHeap::DeleteAndUnlock that had
somewhat ambitious 'move' of SpinLockHolder, with more straightforward
PageHeap::PrepareAndDelete. Doesn't look like we can support move
thingy with thread annotations.
Some years back we fixed memalign vs realloc bug, but we preserved
'wrong' malloc_size/GetAllocatedSize implementation for debug
allocator.
This commit refactors old code making sure we always use right
data_size and it fixes GetAllocatedSize. We update our unittest
accordingly.
Closes #738
As noted on github issue #880 'temporarily' thing saves us not just on
freeing thread cache, but also returning thread's share of thread
cache (max_size_) into common pool. And the latter has caused trouble
to mongo folk who originally proposed 'temporarily' thing. They claim
they don't use it anymore.
And thus with no users and no clear benefit, it makes no sense for us
to keep this API. For API and ABI compat sake we keep it, but it is
now identical to regular MarkThreadIdle.
Fixes issue #880
This unbreaks some cases where patching complains about too short
functions to patch.
What happens is we first locate one of CRT-s (like ucrt or msvcrt) and
patch __expand there, redirecting to our implementation. Then "static"
__expand replacement is patched, but it is usually imported from that
same C runtime DLL. And through several jmp redirections we end up at
our own __expand from libc<1>. Patching that (and other cases) is
wrong, but I am unsure how to fix it properly. So we do most simple
workaround. I found that patching does not fail in debug builds, where
empty expand is not too short, or when MSVC deduplicates multiple
identical __expand implementations into single function, or when
64-bit patching has to do extra trampoline thingy. And then our
patching code checks if we're trying to replace some function with
itself. So we "just" take advantage of that and get immediate issue
fixed, while punting on more general "duplicate" patching for later.
Update github issue #667
There was this piece of makefile with intention to add stack tracing
functionality (for stuff like growthz, GetCallerStackTrace and
probably heap sampling) to work even in minimal configuration on
mingw.
What is odd is we fail to actually define libstacktrace.la target on
mingw, since libstacktrace.la requires WITH_STACK_TRACE automake
conditional which we don't enable on this platform. And yet somehow it
doesn't fail. It produces empty libstacktrace.la, so build kinda
works. Except at least on my machine it produces racy makefiles. So
let's not pretend and stop breaking our parallel builds.
This is nearly impossible in practice, but still. Somehow we missed
this logic that DoSampledAllocation always returns actual object, but
in that condition where stacktrace_allocator failed to get us
StackTrace object we ended up returning span instead of its object.
Actual growthz list is now lockless since we never delete anything
from it. And we now pass special 'locking context' object down page
heap allocation path, both as a documentation that it is under lock
and for tracking whether we needed to grow heap and by how much. Then
whenever lock is released in RAII fashion, we're able to trigger
growthz recording outside of lock.
Closes #1159
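The locking context idea can be illustrated with a small sketch. All names below (HeapSketch, LockingContextSketch, GrowHeapLocked) are hypothetical stand-ins rather than the actual gperftools types, and std::mutex replaces the real spinlock for brevity.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the page heap plus its growth log.
struct HeapSketch {
  std::mutex lock;
  bool lock_held = false;  // only here so the example can observe ordering
  size_t recorded_growth = 0;
  bool recorded_under_lock = false;

  // Stand-in for growthz recording (stack capture and so on).
  void RecordGrowth(size_t bytes) {
    recorded_under_lock = lock_held;
    recorded_growth += bytes;
  }
};

// Holds the heap lock for its lifetime and remembers how much the heap
// grew, so the recording can be done after the lock is released.
class LockingContextSketch {
 public:
  explicit LockingContextSketch(HeapSketch* heap) : heap_(heap) {
    heap_->lock.lock();
    heap_->lock_held = true;
  }
  ~LockingContextSketch() {
    heap_->lock_held = false;
    heap_->lock.unlock();
    // The lock is released; only now do the recording.
    if (grown_by_ != 0) heap_->RecordGrowth(grown_by_);
  }
  LockingContextSketch(const LockingContextSketch&) = delete;
  LockingContextSketch& operator=(const LockingContextSketch&) = delete;

  void NoteGrowth(size_t bytes) { grown_by_ += bytes; }

 private:
  HeapSketch* heap_;
  size_t grown_by_ = 0;
};

// Taking the context as a parameter documents that the caller holds
// the lock. A real implementation would also obtain memory here.
void GrowHeapLocked(size_t bytes, LockingContextSketch* context) {
  context->NoteGrowth(bytes);
}
```

In this sketch nothing is recorded while the context is alive. The recording happens in the destructor, after the unlock.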
While there is still plenty of code that takes pageheap_lock outside
of page_heap module for all kinds of reasons, at least
bread-and-butter logic of allocating/deallocating larger chunks of
memory is now handling page heap locking inside PageHeap itself. This
gives us flexibility.
Update issue #1159
I.e. this covers case of arms that by default compile tcmalloc for 8k
logical pages (assuming 4k system pages), but can actually run on
systems with 64k pages.
Closes #1135
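One way to cope with such a mismatch is to query the system page size at runtime and align system-level requests to the larger of the two sizes. The sketch below only illustrates that idea: all names are hypothetical, the 8k constant mirrors the default mentioned above, and the actual change may work differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <unistd.h>

// Assumed compile-time logical page size (8k).
constexpr size_t kLogicalPageSize = size_t{8} << 10;

// System page size as reported at runtime (4k, 64k, ...).
inline size_t SystemPageSize() {
  return static_cast<size_t>(sysconf(_SC_PAGESIZE));
}

// Alignment to use for memory obtained from the system.
inline size_t EffectiveAlignment(size_t system_page_size) {
  return std::max(kLogicalPageSize, system_page_size);
}

// Rounds bytes up to a multiple of alignment (must be a power of two).
inline size_t RoundUpToAlignment(size_t bytes, size_t alignment) {
  return (bytes + alignment - 1) & ~(alignment - 1);
}
```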
Previous implementation wasn't entirely safe w.r.t. 32-bit off_t
systems. Specifically around mmap replacement hook. Also, API was a
lot more general and broad than we actually need.
Sadly, old mmap hooks API was shipped with our public headers. But
thankfully it appears to be unused externally (checked via github
search). So we keep this old API and ABI for the sake of formal API
and ABI compatibility. But this old API is now empty and always
fails (some OS/hardware combinations didn't have functional
implementations of those hooks anyways).
New API is 64-bit clean and only provides us with what we need. Namely
being able to react to virtual address space mapping changes for
logging, heap profiling and heap leak checker. I.e. no pre hooks or
mmap-replacement hooks. We also explicitly do not ship this API
externally to give us freedom to change it.
New code is also hopefully tidier and slightly more portable. At least
there are fewer arch-specific ifdef-s.
Another somewhat notable change is, since mmap hook isn't needed in
"minimal" configuration, we now don't override system's
mmap/munmap/etc functions in this configuration. No big deal, but it
reduces risk of damage if we somehow mess those up. I.e. musl's mmap
does few things that our mmap replacement doesn't, such as very fancy
vm_lock thingy. Which doesn't look critical, but is good thing for us
not to interfere with when not necessary.
Fixes issue #1406 and issue #1407. Let's also mention issue #1010 which
is somewhat relevant.