This reverts commit a0880d78f3.
We had to revert it because some recent OSX versions broke exception
unwinding through functions with ATTRIBUTE_SECTION. But now that
ATTRIBUTE_SECTION is dropped on OSX, we can bring it back.
Same as before, operator new/delete integration is much faster on
OSX. Unlike malloc/free and similar APIs, which are ~fundamentally
broken performance-wise on OSX due to the expensive arenas
integration, we don't need to pay this overhead for the C++
primitives. So let's get this win at last.
This support is only strictly necessary for
MallocHook_GetCallerStackTrace, which is only needed for the heap
checker. And since the heap checker is Linux-only, let's just amputate
this code.
Solaris has the somewhat peculiar mmap behavior of returning addresses
with high bits set. It still uses only 48 bits of address space on x86
(and, presumably, 64-bit ARM), just laid out in a non-standard
way. But this behavior causes us some inconvenience. In particular, we
had to disable the mmap sys allocator and had failing emergency malloc
tests (due to an assertion in TryGetSizeClass in
CheckCachedSizeClass). We could consider a more comprehensive fix, but
let's just do "honest" 64-bit addresses, at least for now.
Allow the debugallocation death test to accept RUN_ALL_TESTS in the
backtrace instead of main. For some reason Solaris ends up either
optimizing the tail call to RUN_ALL_TESTS, or something else happens
in the gtest platform integration, but failure backtraces don't have
main in them. The intention of the test is simply to ensure that we
get a failure with a backtrace, so let's accept RUN_ALL_TESTS as well.
Instead of relying on __sbrk (on a subset of Linux systems) or
invoking the sbrk syscall directly (on a subset of FreeBSD systems),
we have malloc invoke a special tcmalloc_hooked_sbrk function, which
handles hooking and then invokes the regular system sbrk. Yes, we lose
the theoretical ability to hook non-tcmalloc uses of sbrk, but we gain
portable simplicity.
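The wrapper idea can be sketched as follows. This is a hedged,
self-contained illustration, not the actual tcmalloc code: the hook
type, the counting hook, and the fake in-memory "heap" standing in for
the system sbrk are all invented for the sketch; only the
tcmalloc_hooked_sbrk name comes from the commit above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hook signature; real tcmalloc dispatches through MallocHook.
static void (*sbrk_hook)(const void* result, std::ptrdiff_t increment) = nullptr;

static int g_hook_calls = 0;
static void counting_hook(const void*, std::ptrdiff_t) { ++g_hook_calls; }

// Simulated system sbrk over a static arena, so the sketch is runnable
// anywhere. Only positive increments are handled.
static char fake_heap[4096];
static std::size_t fake_brk = 0;
static void* system_sbrk(std::ptrdiff_t increment) {
  if (increment < 0 ||
      fake_brk + static_cast<std::size_t>(increment) > sizeof(fake_heap)) {
    return reinterpret_cast<void*>(-1);  // sbrk's failure value
  }
  void* old_break = fake_heap + fake_brk;
  fake_brk += static_cast<std::size_t>(increment);
  return old_break;
}

// The point of the commit: malloc calls this wrapper instead of raw sbrk,
// so hooking happens here, portably, without intercepting libc's sbrk.
extern "C" void* tcmalloc_hooked_sbrk(std::ptrdiff_t increment) {
  void* result = system_sbrk(increment);
  if (sbrk_hook != nullptr) sbrk_hook(result, increment);
  return result;
}
```

The design trade-off is exactly as stated above: hooks fire only for
tcmalloc's own sbrk traffic, but no platform-specific interception is
needed.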
The case of mmap and sbrk hooks is simple enough that we can use the
simpler "skip the right number of frames" approach instead of relying
on the less portable and brittle attribute-section trick.
The comments in this file stated that it has to be linked in a
specific order to get initialized early enough. But our initialization
is de facto done via the initial malloc hook. And to deal with
latest-possible destruction, we use the more convenient destructor
function attribute, which makes things simpler.
When building with -O0 -fno-inline, the +[](){} (lambda) syntax for
function pointers actually creates a "wrapper function", so we see an
extra frame (due to disabled inlining). The fix is to create an
explicit function and pass it instead of the lambda.
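The two forms behave identically at the call site; the difference is
only in what the compiler emits for the lambda-to-function-pointer
conversion at -O0. A minimal illustration (names invented for the
sketch):

```cpp
#include <cassert>

// Explicit function: the pointer refers directly to this symbol, so a
// backtrace through it has no extra frame.
static int explicit_fn() { return 42; }

// Unary + forces conversion of the capture-less lambda to a plain function
// pointer. At -O0 -fno-inline the conversion goes through a compiler-emitted
// "wrapper function", which shows up as an extra backtrace frame.
static int (*via_lambda)() = +[]() { return 42; };
static int (*via_named)() = &explicit_fn;
```

Both pointers are callable and return the same value; only the frame
layout under a debugger differs.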
I broke it with a "simple" off-by-one error in the big emergency
malloc refactoring change.
It is somewhat shocking that no test caught this, but we'll soon be
adding such a test.
References github issue #1503.
This significantly reworks both early thread cache access and the
related emergency malloc mode checking integration. As a result, we're
able to rely on emergency malloc even on systems without "good" TLS
(e.g. QNX, which does emutls).
One big change here is that we're undoing the earlier change to have a
single "global" thread cache early in the process lifetime. It was a
nice and somewhat simpler approach. But because of the inherent
locking early in the process lifetime, we couldn't be sure of certain
lock-ordering aspects w.r.t. backtracing/exception-stack-unwinding. So
I chose to keep it safe. The new idea is that we use the SlowTLS
facility to find threads' caches when normal TLS isn't ready yet. It
avoids holding locks around potentially recursion-ful things (like
ThreadCache::ModuleInit or growing the heap). But we then have to be
careful to "re-attach" those early thread cache instances to regular
TLS. There will nearly always be just one such thread cache, for the
initial thread. But we cannot entirely rule out the more general case
where someone creates threads before process initializers have run and
main() is reached. Another notable thing is that freeing memory in
this early mode will always use slow-path deletion directly into the
central free list.
The SlowTLS facility is "simply" a generalization of the previous
CreateBadTLSCache code. I.e. we have a small fixed-size cache that
holds "exceptional" mappings from thread identity to
thread-cache + emergency-mode-flag.
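The shape of such a facility can be sketched as below. This is an
illustrative toy, not the real SlowTLS: the class layout, method
names, table size, and the use of a plain pointer as thread identity
are all assumptions of the sketch, and the real code additionally has
to worry about locking and lookup from signal-unsafe contexts.

```cpp
#include <cassert>
#include <cstddef>

// One "exceptional" mapping: thread identity -> thread cache + emergency flag.
struct SlowTLSEntry {
  const void* thread_id = nullptr;  // nullptr marks an empty slot
  void* thread_cache = nullptr;
  bool emergency_mode = false;
};

// Small fixed-size table, linearly scanned. Fine because only a handful of
// threads ever need this path (those running before normal TLS is ready).
class SlowTLSTable {
 public:
  bool Insert(const void* id, void* cache, bool emergency) {
    for (SlowTLSEntry& e : entries_) {
      if (e.thread_id == nullptr) {
        e = SlowTLSEntry{id, cache, emergency};
        return true;
      }
    }
    return false;  // table full; a real implementation must handle this
  }
  SlowTLSEntry* Find(const void* id) {
    for (SlowTLSEntry& e : entries_) {
      if (e.thread_id == id) return &e;
    }
    return nullptr;
  }
  void Erase(const void* id) {
    if (SlowTLSEntry* e = Find(id)) *e = SlowTLSEntry{};
  }

 private:
  SlowTLSEntry entries_[16];
};
```

Once regular TLS is ready, entries are erased as caches get
"re-attached" to their normal per-thread storage.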
We also take advantage of the tcmalloc::kInvalidTLSKey we introduced
earlier and remove the potentially racy memory ordering between
reading tls_ready_ and tls_key_.
For emergency malloc detection we previously used a thread_local flag,
which we cannot use on !kHaveGoodTLS systems. So we instead _remove_
the thread's cache from its normal TLS storage and place it "into"
SlowTLS for the duration of the WithStacktraceScope call (which is how
emergency mode is enabled now).
The intention is to initialize the tls-key variable with this invalid
value. This helps us avoid a separate "tls ready" flag and possible
memory ordering issues around distinct tls-key and tls-ready
variables.
On Windows we use the TLS_OUT_OF_INDEXES value, which is a properly
"impossible" tls index value. On POSIX systems we add the
theoretically unportable, but practically portable, assumption that
tls keys are integers, and we make the value -1 the invalid key.
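The single-variable idea looks roughly like this. A hedged sketch: the
real code uses the platform's pthread_key_t / DWORD key types and
atomic accesses, while this toy uses a plain integer type so it runs
anywhere; kInvalidTLSKey is the name from the commit, the rest is
invented.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the platform TLS key type (pthread_key_t on POSIX, a DWORD
// index on Windows). On Windows the sentinel would be TLS_OUT_OF_INDEXES;
// on POSIX we assume the key is an integer and use (key_t)-1.
using tls_key_t = std::uintptr_t;
constexpr tls_key_t kInvalidTLSKey = static_cast<tls_key_t>(-1);

// The key starts out invalid, so no separate tls_ready_ flag is needed and
// there is only one variable to read (no cross-variable ordering to get wrong).
static tls_key_t tls_key = kInvalidTLSKey;

static bool TLSIsReady() { return tls_key != kInvalidTLSKey; }
```

One load of one variable answers "is TLS usable yet?", which is the
whole point of folding the ready flag into the key.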
We use __builtin_trap (which compiles to an explicitly undefined
instruction, or "int 3" on x64), when available, to make those
crashing Log invocations a little nicer to debug.
This will not only make things right w.r.t. the possible order of test
runs; it also unbreaks test failures on Windows (where gtest ends up
doing some mallocs after test completion, hits the limit, and dies).
Sadly, certain/many implementations of std::this_thread::get_id invoke
malloc. So we need something more robust. On Unix systems we use the
address of errno as the thread identifier. Sadly, this doesn't cover
Windows, where MS's C runtime will occasionally malloc when the errno
location is grabbed (with some special trickery for when malloc itself
needs to set errno to ENOMEM!). So on Windows we do
GetCurrentThreadId, which appears to be roughly as efficient as a
"normal" system's __errno_location implementation.
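The Unix half of the trick is tiny: errno is thread-local by
specification, so its address is distinct per thread, cheap to obtain,
and never allocates. A sketch for the Unix case only (the function
name is invented; the Windows GetCurrentThreadId branch is omitted):

```cpp
#include <cassert>
#include <cerrno>

// Returns a value that is stable within a thread and distinct across
// threads, without ever calling malloc. errno expands to a per-thread
// lvalue (e.g. *__errno_location() on glibc), so &errno works as an id.
static const void* SelfThreadId() {
  return static_cast<const void*>(&errno);
}
```

Within one thread the value never changes, which is all the SlowTLS
lookup needs from a thread identity.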
We use mmap when we initialize it, which could, via the heap checker,
recurse back into backtracing and check-address. So before we do mmap
and the rest of initialization, we now set the check-address
implementation to the conservative two-syscalls version.
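A conservative check-address probe can be built from syscalls alone,
with no mmap and no allocation: ask write(2) to read one byte from the
candidate address into a pipe, and treat EFAULT as "unreadable". This
sketch creates a fresh pipe per call for self-containment; the actual
tcmalloc implementation may differ in detail (e.g. keeping fds
around), and the function name here is illustrative.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Probe whether one byte at addr is readable, using only syscalls.
// write() must read the byte it sends, so it faults cleanly (returning -1
// with errno == EFAULT) instead of crashing us on a bad address.
static bool AddressIsReadable(const void* addr) {
  int fds[2];
  if (pipe(fds) != 0) return false;  // be conservative on failure
  ssize_t rc;
  do {
    rc = write(fds[1], addr, 1);
  } while (rc == -1 && errno == EINTR);
  close(fds[0]);
  close(fds[1]);
  return rc == 1;
}
```

Because every step is a plain syscall, this version is safe to use
before the allocator itself is fully initialized.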
We had a duplicate definition of the flags_tmp variable.
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
[alkondratenko@gmail.com] updated commit message
Apparently some recent FreeBSDs occasionally lack brk. So our code,
which previously hard-coded that this OS has brk (which we use to
implement hooked sbrk), failed to compile.
Our configure script already detects sbrk, so we simply need to pay
attention. Fixes github issue #1499
Our implementation of the emergency malloc slow path actually depends
on a good TLS implementation. So on systems without good TLS
(i.e. OSX), we lied to ourselves that emergency malloc is available,
but then failed tests.
We don't expose DefaultArena anymore. Simply passing nullptr implies
the default arena.
We also streamline default arena initialization (we previously relied
on a combination of lazy initialization in ArenaInit and constexpr
construction).
There is also no need for the elaborate ArenaLock thingy. We use a
plain SpinLockHolder instead.
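SpinLockHolder is a scoped RAII holder, which is what makes the
dedicated ArenaLock class unnecessary. The shape of the idiom, sketched
with std::mutex for self-containment (the real class wraps tcmalloc's
SpinLock, and this generic template is an invention of the sketch):

```cpp
#include <cassert>
#include <mutex>

// Scoped lock holder in the spirit of SpinLockHolder: acquire in the
// constructor, release in the destructor, so every exit path unlocks.
template <typename Lock>
class ScopedHolder {
 public:
  explicit ScopedHolder(Lock* l) : lock_(l) { lock_->lock(); }
  ~ScopedHolder() { lock_->unlock(); }
  ScopedHolder(const ScopedHolder&) = delete;
  ScopedHolder& operator=(const ScopedHolder&) = delete;

 private:
  Lock* lock_;
};
```

A one-size-fits-all holder like this replaces any per-subsystem lock
wrapper whose only job was pairing lock/unlock calls.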
This is off by default for now, and autotools-only. But after
significant preparatory work, we're now able to do it cleanly. Stuff
that was previously exported just for tests (like page heap internals
or flags) is now unexported.
Just like on Windows, all symbols explicitly exported via
PERFTOOLS_DLL_DECL are visible and exported, and the rest are
hidden. The former include all the malloc/new APIs, of course, and all
the other symbols we advertise in our headers (e.g. MallocExtension,
MallocHook).
Updates issue #600