Commit Graph

718 Commits

Author SHA1 Message Date
barracuda156
e9a2d3c46f mmap_hook.cc: use MAP_ANON when MAP_ANONYMOUS is not defined 2023-12-16 10:46:57 +08:00
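The fix is a standard portability shim. A minimal sketch of the pattern (the allocation helper around it is hypothetical, not the project's actual code):

```cpp
#include <sys/mman.h>
#include <cstddef>

// Some BSD-derived systems and older macOS SDKs only define MAP_ANON,
// while Linux headers spell it MAP_ANONYMOUS; alias one to the other.
#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

// Hypothetical helper: an anonymous read/write mapping that now compiles
// against either header flavor.
inline void* AllocAnonPages(size_t len) {
  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}
```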
Aliaksey Kandratsenka
85048430ac consolidate do_mallinfo{,2}
We had 2 nearly identical implementations. Thankfully, the C++ template
facility lets us produce 2 different runtime functions (for different
type widths) without duplicating the source.

Amend github issue #1414
2023-12-07 15:01:27 -05:00
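The consolidation trick can be sketched like this. All struct and field names below are illustrative stand-ins (not the real glibc or gperftools definitions); the point is that one template body instantiates into two runtime functions whose field widths differ:

```cpp
#include <cstddef>

// Hypothetical internal stats source.
struct MallocStats { size_t allocated; size_t freed; };
static MallocStats CollectStats() { return {1024, 256}; }

// Stand-ins for the two public structs: mallinfo uses int fields,
// mallinfo2 uses size_t fields.
struct my_mallinfo  { int    uordblks; int    fordblks; };
struct my_mallinfo2 { size_t uordblks; size_t fordblks; };

// One source body; decltype picks the right field width per instantiation.
template <typename Info>
Info do_mallinfo_generic() {
  MallocStats s = CollectStats();
  Info info{};
  info.uordblks = static_cast<decltype(info.uordblks)>(s.allocated - s.freed);
  info.fordblks = static_cast<decltype(info.fordblks)>(s.freed);
  return info;
}

my_mallinfo  do_mallinfo()  { return do_mallinfo_generic<my_mallinfo>(); }
my_mallinfo2 do_mallinfo2() { return do_mallinfo_generic<my_mallinfo2>(); }
```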
Mateusz Jakub Fila
b8e75ae6fe Add mallinfo2 function 2023-12-07 14:10:51 +01:00
Aliaksey Kandratsenka
a9b734e3fa perform ucontext->pc variants testing in compile-time
As part of the cpu profiler we extract the current PC (program counter)
from the signal's ucontext. Different OS and hardware combinations have
different ways of doing that. We had a list of variants that we tested
at configure time, populating the PC_FROM_UCONTEXT macro in config.h.
That caused duplication and occasional mismatches between our autoconf
and cmake bits.

So this commit moves the testing to compile time. We remove complexity
from the build system and add some to the C++ source.

We use SFINAE to find which of those variants compile (and we silently
assume that 'compiles' implies 'works'; this is what config-time testing
did too). Occasionally we'll face situations where several variants
compile, and we cannot handle that case in pure C++. So we have a small
Ruby program that generates a chain of inheritance among the
SFINAE-specialized class templates, which handles prioritization among
the variants.

The list of ucontext->pc extraction variants is mostly the same. We
dropped the super-obsolete (circa Linux kernel 2.0) arm variant, and the
NetBSD case is now improved: we use their nice architecture-independent
macro instead of x86-specific access.
2023-12-02 18:58:45 -05:00
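The SFINAE-with-prioritization technique can be sketched in miniature. Everything here is illustrative (a fake context type, toy rank tags); the real code generates a longer inheritance chain, but the mechanism is the same: the higher-priority overload wins whenever its `decltype` expression compiles, and overload resolution silently falls through otherwise:

```cpp
#include <cstdint>

// Stand-in for ucontext_t with an x86-style field (illustrative only).
struct FakeContext {
  struct { uintptr_t rip; } uc_mcontext;
};

// Priority tags: Rank1 is preferred because it converts to Rank0,
// not vice versa.
struct Rank0 {};
struct Rank1 : Rank0 {};

// Variant A: compiles only when ctx.uc_mcontext.rip exists.
template <typename Ctx>
auto GetPC(const Ctx& ctx, Rank1) -> decltype(ctx.uc_mcontext.rip) {
  return ctx.uc_mcontext.rip;
}

// Variant B: fallback when no known field compiles.
template <typename Ctx>
uintptr_t GetPC(const Ctx&, Rank0) { return 0; }

uintptr_t ExtractPC(const FakeContext& ctx) { return GetPC(ctx, Rank1{}); }
```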
Sergey Fedorov
68db54545e Minor fix-ups for PowerPC defines 2023-11-25 15:58:24 +08:00
Julian Schroeder
000af9a164 [stacktrace_generic_fp] clear aarch64 pointer auth bits
AArch64 (armv8.3-a and later) supports pointer authentication. If this
feature is enabled, it modifies the previously unused upper address bits
in a pointer. The affected bits need to be cleared in order for
stacktrace to work.

Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
[alkondratenko@gmail.com: added succinct subject line]
2023-11-01 13:10:32 -04:00
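The clearing itself is a mask. A sketch under the assumption of a 48-bit virtual address space (the real code may derive the mask differently, e.g. from the kernel's reported VA size):

```cpp
#include <cstdint>

// Assumption for illustration: 48-bit virtual addresses, so bits 48..63
// are the ones pointer authentication may repurpose for the signature.
constexpr int kVirtualAddressBits = 48;

constexpr uint64_t StripPointerAuthBits(uint64_t pc) {
  const uint64_t mask = (uint64_t{1} << kVirtualAddressBits) - 1;
  return pc & mask;  // keep the low 48 bits, drop the auth signature
}
```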
Aliaksey Kandratsenka
d1a0cbe1bf [qnx] handle broken cfree and mallopt 2023-10-30 19:47:52 -04:00
Xiang.Lin
717bf724a5 Add heap profiler support for QNX 2023-10-30 19:30:37 -04:00
Aliaksey Kandratsenka
adf24f9962 stacktrace_unittest: add simple way to skip ucontext testing 2023-10-27 20:53:21 -04:00
Aliaksey Kandratsenka
4d1a9e9226 stacktrace_unittest: test all stacktrace capturing methods 2023-10-27 19:06:15 -04:00
Aliaksey Kandratsenka
96f4f07525 avoid unused variable warning in stacktrace_libunwind 2023-10-27 19:00:17 -04:00
Aliaksey Kandratsenka
db4eacc5d9 avoid runtime initialization of null stacktrace implementation
As we recently found out, initializing static struct fields or
variables with lambdas sets up runtime initialization instead of the
static initialization we assumed. So let's avoid this for the null
stacktrace implementation too.
2023-10-24 15:16:16 -04:00
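The pitfall and the fix can be sketched as follows (type and function names are illustrative, not the project's exact ones). A capture-less lambda must be converted to a function pointer, and on some compilers that conversion happens at runtime; a named function used as the initializer is a constant expression, so the struct is initialized before any code runs:

```cpp
// Illustrative stand-in for a stacktrace implementation vtable-like struct.
struct StacktraceImpl {
  int (*GetStackTrace)(void** result, int max_depth, int skip_count);
};

// Named function instead of a lambda: taking its address is a constant
// expression, so null_impl gets constant (static) initialization.
static int NullGetStackTrace(void**, int, int) { return 0; }

static const StacktraceImpl null_impl = { NullGetStackTrace };

// By contrast, `= { +[](void**, int, int) { return 0; } }` was observed
// to set up dynamic initialization on some compilers.
```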
Yikai Zhao
5ba86d37a3 update stacktrace_unittest to test overflow issue
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
[alkondratenko@gmail.com: squashed log update commit here]
2023-10-19 21:38:01 -04:00
Yikai Zhao
dad9e8ceb9 Fix result overflow in generic_fp stacktrace
In the 'with ucontext' case, `skip_count` is reset to 0, and
`max_depth` must not be modified; otherwise the result array would
overflow.
2023-10-19 21:37:45 -04:00
Romain Geissler
c48d4f1407 Avoid initializing CheckAddress with a lambda, so that it also works with gcc 6. 2023-10-19 14:35:19 -04:00
Aliaksey Kandratsenka
d48bf6b3ad force inline do_mmap_with_hooks
Otherwise mmap's call to do_mmap_with_hooks might be compiled as a tail
call (instead of being inlined), which breaks the GetCallerStackTrace
facility (since only mmap is placed into the special malloc_hook
section).

This unbreaks the heap checker on gcc 5, but is in general the right
thing to do.
2023-09-27 22:04:31 -04:00
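The idea can be sketched with the usual attribute (function names below are illustrative; `my_mmap` stands in for the real interposed `mmap`):

```cpp
#include <cstddef>

#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

// Forced inline: the helper's body is folded into the caller, so the
// caller keeps its own frame and stays in its own section.
static ALWAYS_INLINE void* do_mmap_with_hooks(size_t size) {
  (void)size;
  return nullptr;  // placeholder body for illustration
}

void* my_mmap(size_t size) {
  // Without forced inlining the compiler may emit "jmp do_mmap_with_hooks"
  // here, leaving no my_mmap frame for a caller-stack walker that only
  // recognizes addresses inside my_mmap's special section.
  return do_mmap_with_hooks(size);
}
```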
Aliaksey Kandratsenka
9a123db7b4 work around tuple construction miscompilation on gcc 5
This fixes github issue #1432
2023-09-27 22:04:31 -04:00
Yikai Zhao
d152d76cd1 generic_fp stacktrace: check frame size threshold for initial frame 2023-09-25 19:23:10 +08:00
Lennox Ho
589d416977 Add a more performant SpinLockDelay implementation for Windows based on WaitOnAddress and friends 2023-09-19 14:53:16 +08:00
Lennox Ho
17f23e8d1e Add the ability to disable TCMalloc replacement on Windows via environment variable TCMALLOC_DISABLE_REPLACEMENT=1 2023-09-18 16:02:07 -04:00
Lennox Ho
df006e880e Also expose SetMemoryReleaseRate and GetMemoryReleaseRate as C shims 2023-09-17 07:59:40 +08:00
Aliaksey Kandratsenka
dffb4a2f28 bump version to 2.13 2023-09-11 16:23:40 -04:00
Aliaksey Kandratsenka
4ec8c9dbb2 reduce set of nallocx size testing points
Testing every 7th size is a bit slow on slower machines, and there is no
need to be that thorough. We now bump by about 1/128th each step, which
is still more steps than we have size classes.
2023-09-10 22:18:51 -04:00
Aliaksey Kandratsenka
e4e7ba93a0 unbreak unnecessary dependency on 64-bit atomics
This unbreaks builds on 32-bit arms and mipsen.
2023-09-10 21:07:28 -04:00
Aliaksey Kandratsenka
2748dd5680 unbreak address access "probing" for generic_fp backtracing
We used msync to verify that an address is readable, but msync gives
false positives for PROT_NONE mappings, and we recently got a bug report
from a user hitting this exact condition.

For a correct access check, we steal an idea from Abseil and call
sigprocmask with the probed address as the new signal mask and an
invalid HOW argument. This works on today's Linux kernels and is among
the fastest methods available, but it is brittle w.r.t. possible kernel
changes. So we supply a fallback method that does 2 syscalls.

For non-Linux systems we implement the usual "write to pipe" trick,
which also has decent performance, but requires occasional pipe draining
and uses fds, which could occasionally be damaged by some forking code.

We also finally cover all the new code with a unit test.

Fixes github issue #1426
2023-09-10 17:24:32 -04:00
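The sigprocmask probe can be sketched as below (Linux-specific, and deliberately brittle, as the commit message notes; the function name is illustrative). The kernel's rt_sigprocmask copies the new mask from user memory before validating HOW, so EFAULT means the address is unreadable while EINVAL means the copy, and hence the read, succeeded. The raw syscall is used because the libc wrapper may touch the mask in user space first:

```cpp
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

// Returns true if 8 bytes at addr are readable. Sketch only: relies on
// the current kernel checking HOW *after* copying the mask from addr.
bool IsAddressReadable(const void* addr) {
  errno = 0;
  long rc = syscall(SYS_rt_sigprocmask, ~0 /* invalid HOW */,
                    addr, nullptr, 8 /* kernel sigset size on 64-bit */);
  // EINVAL => the copy from addr succeeded, only HOW was rejected.
  // EFAULT => addr was not readable.
  return rc != 0 && errno == EINVAL;
}
```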
Ivan Dlugos
7ad1dc7693 fix: cmake config.h defines declaration 2023-09-08 14:38:21 -04:00
Aliaksey Kandratsenka
f7172839a1 turn tcmalloc::TrivialOnce into POD
As we see in github issue #1428, msvc arranges full "init on first use"
initialization for local static usage of TrivialOnce, even if that
initialization is completely empty. Fair game, even if stupid.

A POD with no initialization should be safely zero-initialized, with no
games or tricks from the compilers.

We could, and perhaps at some point should, do constexpr for TrivialOnce
and SpinLock (abseil has been liberated from LinkerInitialized for
perhaps well over a decade now, including their fork of SpinLock, of
course). But the C++ legalese rules are complex enough, and bugs have
happened in the past, so I don't want to be in the tough business of
interpreting the standard. So at least for now we keep things simple.
2023-09-08 14:22:46 -04:00
Aliaksey Kandratsenka
539ed9ca40 bump version 2.12 2023-08-24 15:03:47 -04:00
Aliaksey Kandratsenka
0a3ca5b43d bump version to 2.11 2023-08-14 22:47:56 -04:00
Ken Raffenetti
c41eb9e8b5 Add MPICH HPC environment detection
Default MPICH builds use the Hydra process manager (mpiexec) which sets
PMI_RANK in the application environment. Update GetUniquePathFromEnv()
test accordingly.

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
2023-08-11 15:21:15 -04:00
Aliaksey Kandratsenka
1d2654f3a0 heap-checker: unbreak PTRACE_GETREGS detection on older Linux-es
This unbreaks RHEL6.
2023-08-10 14:30:27 -04:00
Aliaksey Kandratsenka
729383b486 make sure that ListerThread runs on properly aligned stack
Without this fix we were failing unit tests on ubuntu 18.04 and centos 7
and 6. It looks like clone() in old glibcs doesn't align the stack, so
let's handle it ourselves. How we didn't hit this much earlier (before
the massive thread-listing refactoring), I am not sure. Most likely pure
luck(?)
2023-08-09 23:42:56 -04:00
Aliaksey Kandratsenka
51c5e2bec7 massage latest GetUniquePathFromEnv changes
This fixes a number of minor bits (like build details) as well as
making overall code style similar to what we're doing elsewhere.
2023-08-09 16:29:13 -04:00
Artem Polyakov
86450ad99f Add unit test for GetUniquePathFromEnv()
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2023-08-08 16:44:18 -07:00
Artem Y. Polyakov
881b754da0 Advanced UniquePathFromEnv generation
* Add support for known HPC environments (TODO: needs to be extended
with more environments)
* Add the "CPUPROFILE_USE_PID" environment variable to force appending
the PID in the non-covered environments
* Preserve the old way of handling the child-parent case

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2023-08-08 16:44:18 -07:00
Aliaksey Kandratsenka
57512e9c3d unbreak -Wthread-safety
It actually found a real (but arguably minor) issue with memory region
map locking.

As part of that, we're replacing PageHeap::DeleteAndUnlock, which had a
somewhat ambitious 'move' of SpinLockHolder, with the more
straightforward PageHeap::PrepareAndDelete. It doesn't look like we can
support the move thingy with thread annotations.
2023-08-06 19:32:32 -04:00
Aliaksey Kandratsenka
dc25c1fd4c bump version to 2.11rc 2023-07-31 20:11:55 -04:00
Aliaksey Kandratsenka
909fa3e649 unbreak MallocExtension::GetAllocatedSize() for debug allocator
Some years back we fixed a memalign-vs-realloc bug, but we preserved the
'wrong' malloc_size/GetAllocatedSize implementation for the debug
allocator.

This commit refactors the old code, making sure we always use the right
data_size, which fixes GetAllocatedSize. We update our unittest
accordingly.

Closes #738
2023-07-31 16:28:48 -04:00
Aliaksey Kandratsenka
e3de2e3242 remove obsolete references to code.google.com
Somehow we still managed to point to the (very) old gperftools hosting
location, so let's fix it at last.
2023-07-31 14:28:58 -04:00
Aliaksey Kandratsenka
8b3f0d6145 undo MarkThreadTemporarilyIdle and make it same as MarkThreadIdle
As noted on github issue #880, the 'temporarily' thing saves us not just
on freeing the thread cache, but also on returning the thread's share of
the thread cache (max_size_) into the common pool. The latter has caused
trouble for the mongo folks who originally proposed the 'temporarily'
thing. They claim they don't use it anymore.

Thus, with no users and no clear benefit, it makes no sense for us to
keep this API. For API and ABI compatibility's sake we keep it, but it
is now identical to regular MarkThreadIdle.

Fixes issue #880
2023-07-31 14:11:51 -04:00
Aliaksey Kandratsenka
c3059a56be dont workaround unknown problem in thread_dealloc_unittest
We had a sleep added at the end of the thread dealloc unittest, claiming
some race trouble with glibc, which has likely been irrelevant for years
or even decades.
2023-07-31 12:37:15 -04:00
Aliaksey Kandratsenka
9b91ce917a [win32:patching] define single empty __expand replacement
This unbreaks some cases where patching complains about functions being
too short to patch.

What happens is we first locate one of the CRTs (like ucrt or msvcrt)
and patch __expand there, redirecting to our implementation. Then the
"static" __expand replacement is patched, but it is usually imported
from that same C runtime DLL, and through several jmp redirections we
end up at our own __expand from libc<1>. Patching that (and other cases)
is wrong, but I am unsure how to fix it properly, so we do the simplest
workaround. I found that the cases where it does not fail are either
debug builds, where the empty __expand is not too short; builds where
MSVC deduplicates multiple identical __expand implementations into a
single function; or when 64-bit patching has to do the extra trampoline
thingy. In those cases our patching code checks whether we're trying to
replace some function with itself. So we "just" take advantage of that
and get the immediate issue fixed, while punting on more general
"duplicate" patching for later.

Update github issue #667
2023-07-27 19:38:52 -04:00
Aliaksey Kandratsenka
a5cfd38884 [win32] amend and unbreak previous NOMINMAX fix 2023-07-27 19:27:20 -04:00
Aliaksey Kandratsenka
d2c89ba534 don't return raw span when sampling and stacktrace oomed
This is nearly impossible in practice, but still: somehow we missed the
invariant that DoSampledAllocation always returns an actual object, and
in the condition where stacktrace_allocator failed to get us a
StackTrace object, we ended up returning the span instead of its object.
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
59464404d1 capture growthz backtraces outside of pageheap_lock
The actual growthz list is now lockless, since we never delete anything
from it. And we now pass a special 'locking context' object down the
page heap allocation path, both as documentation that it runs under the
lock and to track whether we needed to grow the heap and by how much.
Then, whenever the lock is released in RAII fashion, we're able to
trigger growthz recording outside the lock.

Closes #1159
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
0d42a48699 move page heap locking under PageHeap
While there is still plenty of code that takes pageheap_lock outside the
page_heap module for all kinds of reasons, at least the bread-and-butter
logic of allocating/deallocating larger chunks of memory now handles
page heap locking inside PageHeap itself. This gives us flexibility.

Update issue #1159
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
a3e1080c2e handle large alloc reporting locklessly
This simplifies the code a bit.

Update issue #1159
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
f1eb3c82c6 correctly release memory when system's pagesize is >kPageSize
This covers the case of ARM systems that by default compile tcmalloc for
8k logical pages (assuming 4k system pages), but can actually run on
systems with 64k pages.

Closes #1135
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
d521e3b30e move page heap allocations with alignment into page heap 2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
ad0ca2b83b unbreak large heap fragmentation unittest
Smart compilers again (and the lack of -fno-builtin-malloc, which we
dropped because of clang).
2023-07-24 20:24:52 -04:00