The test includes an override of
MallocHook_InitAtFirstAllocation_HeapLeakChecker, which runs early
enough to trigger a FreeBSD bug where almost nothing works that early
in process startup.
It turns out that, at least on FreeBSD, tmpfile fails if TMPDIR points
to a non-existent directory. This also has the nice property of leaving
it up to users to set up TMPDIR the way they want.
There were far too many intermediate libraries, references to obsolete
libtool bugs and whatnot.
We now have libcommon.la as a convenience archive that contains
spinlock, logging and misc stuff. A number of unit tests that need
those facilities are linked against it.
Then we have the libstacktrace.la convenience archive, since it is used
by both libtcmalloc.la and libprofiler.la.
After that we simply create our main library products directly:
libtcmalloc_minimal{,_debug}.la, libtcmalloc{,_debug}.la,
libprofiler.la etc.
We now wrap the StackTraceScope thingy in tcmalloc-specific parts,
instead of automagically inside every stacktrace.cc function. All
TCMalloc bits need to grab stacktraces via the newly introduced
tcmalloc::GrabBacktrace (which handles emergency malloc wrapping).
The new approach eliminates the need for a fake stacktrace scope. The
CPU profiler, being a distinct .so library, couldn't take advantage of
emergency malloc anyway.
This simplifies the build further and eliminates another potential
point of runtime divergence when stacktrace is linked to both
libprofiler and libtcmalloc.
Automake never needs to be given the full set of headers in order to
compile things. Headers are usually listed so that make dist includes
them in the archive, but we can achieve that goal more easily.
This reduces the size and complexity of our Makefile.am.
We had hardcoded 64-byte alignment, seemingly to avoid cache-line
contention when taking per-size-class central free list locks. But it
makes more sense to use platform-specific alignment.
I did consider std::hardware_destructive_interference_size, but its
practical values seem to differ from what we have configured, at least
on some platforms.
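A minimal sketch of platform-specific cache-line alignment follows. The names kCacheLineSize and PaddedFreeList are hypothetical, and the per-architecture values are illustrative assumptions rather than the values the project configures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical per-platform constant. The values below are illustrative
// assumptions only; a real build would pick them per target.
#if defined(__aarch64__) || defined(__powerpc64__)
inline constexpr std::size_t kCacheLineSize = 128;
#else
inline constexpr std::size_t kCacheLineSize = 64;
#endif

// Hypothetical stand-in for a per-size-class free list with its own lock.
// alignas() makes each array element start on its own cache line.
struct alignas(kCacheLineSize) PaddedFreeList {
  std::mutex lock;
  void* head = nullptr;
};

static_assert(alignof(PaddedFreeList) == kCacheLineSize);
static_assert(sizeof(PaddedFreeList) % kCacheLineSize == 0);

PaddedFreeList g_lists[4];

// Returns true when adjacent elements cannot share a cache line.
bool NeighborsOnDistinctLines() {
  auto a = reinterpret_cast<std::uintptr_t>(&g_lists[0]);
  auto b = reinterpret_cast<std::uintptr_t>(&g_lists[1]);
  return a % kCacheLineSize == 0 && (b - a) >= kCacheLineSize;
}
```

With this layout, taking the lock of one size class does not invalidate the line holding a neighbouring size class's lock.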
GCC needs both inline and __attribute__((always_inline)), while MSVC
only needs __forceinline. So the correct combination for each compiler
is now wrapped in a single ALWAYS_INLINE macro.
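Such a macro could look roughly like the sketch below. The exact preprocessor conditions in the tree may differ. AddOne is a hypothetical function used only to show the macro in use.

```cpp
#include <cassert>

// Pick the forced-inlining spelling for the compiler at hand.
#if defined(_MSC_VER)
#define ALWAYS_INLINE __forceinline
#elif defined(__GNUC__)
// GCC-style compilers want the attribute together with plain inline.
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
// Unknown compiler: fall back to an ordinary inline hint.
#define ALWAYS_INLINE inline
#endif

// Hypothetical hot-path helper marked with the macro.
ALWAYS_INLINE int AddOne(int x) { return x + 1; }
```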
It is slightly less great that we have to disable thread-safety
analysis, but its documentation explicitly says that optional locking
is not supported. And ours is very explicitly optional locking. That
is, we only let ourselves resort to slow locking very early during
process startup, when TLS isn't 100% ready and safe to use yet.
Previously, when dealing with an OOM condition, we couldn't use
std::get_new_handler (it didn't exist until C++11). Now we can, and it
simplifies the logic a bit.
We also detect the -fno-exceptions case via the standard
__cpp_exceptions define. I've also rearranged the logic and even found
and fixed one bug in the -fno-exceptions case.
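For illustration, here is a sketch of an OOM retry loop built on std::get_new_handler and __cpp_exceptions. AllocateOrHandleOOM, RawAlloc, FlakyAlloc and ReleaseReserve are hypothetical names, and returning nullptr when exceptions are disabled is an assumption of this sketch, not necessarily what tcmalloc does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical low-level allocation hook; returns nullptr on failure.
using RawAlloc = void* (*)(std::size_t);

void* AllocateOrHandleOOM(std::size_t size, RawAlloc raw_alloc) {
  for (;;) {
    if (void* p = raw_alloc(size)) return p;
    // C++11 lets us read the current handler without replacing it.
    std::new_handler handler = std::get_new_handler();
    if (handler == nullptr) {
#if defined(__cpp_exceptions)
      throw std::bad_alloc();
#else
      return nullptr;  // Assumption: with -fno-exceptions we report failure.
#endif
    }
    handler();  // May free memory, throw, or terminate; then we retry.
  }
}

// Demo helpers: an allocator that fails until the handler has run.
static int g_attempts = 0;
static bool g_memory_freed = false;

void* FlakyAlloc(std::size_t size) {
  ++g_attempts;
  return g_memory_freed ? std::malloc(size) : nullptr;
}

void ReleaseReserve() { g_memory_freed = true; }
```

Installing ReleaseReserve with std::set_new_handler and calling AllocateOrHandleOOM(16, FlakyAlloc) fails once, runs the handler, and succeeds on the second attempt.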
Logic was moved out of thread_cache.{h,cc} into
thread_cache_ptr.{h,cc}.
The separation will help possible future evolution, and we have already
changed the logic quite a bit:
* early access (when TLS isn't initialized yet) now uses a global
ThreadCache instance. We therefore have ThreadCachePtr instances
managing the required locking. This eliminates the unnecessary
complication of the PTHREADS_CRASHES_IF_RUN_TOO_EARLY logic, and any
other danger of touching TLS too early. BTW, the previous
implementation actually leaked the initial early-initialized
ThreadCache instance(!)
* the old configure-time HAVE_TLS logic is amputated. The
configure-time part of it made little sense, as C++17 guarantees
availability of thread_local, but we have a manually curated deny-list
of "bad" OSes, which we used to test (via compile checks!) at configure
time. Now this is all done at compile time. There is now a compile-time
kHaveGoodTLS constant and we use it mostly via if constexpr.
* the kHaveGoodTLS case of creating a thread cache is simplified and
made more straightforward (no need for the in_setspecific logic).
* the !kHaveGoodTLS case is fixed and improved too. We avoid
std::thread::get_id, as it deadlocks on mingw. We use the address of
errno as a portable and (usually) async-signal-safe 'my thread'
identifier. We also eliminate the linear search for a thread's cache
and replace it with a straightforward hash table lookup.
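The two lookup paths described in the list above can be sketched as follows. Cache, SelfThreadId and GetCache are hypothetical names, kHaveGoodTLS is hard-wired to false here so that the fallback path is exercised, and std::unordered_map merely stands in for the hash table (a real allocator could not use a container that itself calls malloc).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a per-thread cache.
struct Cache { int allocations = 0; };

// Assumption for this sketch: pretend the platform's TLS is not trusted.
inline constexpr bool kHaveGoodTLS = false;

// errno is per-thread, so its address identifies the calling thread.
inline std::uintptr_t SelfThreadId() {
  return reinterpret_cast<std::uintptr_t>(&errno);
}

Cache* GetCache() {
  if constexpr (kHaveGoodTLS) {
    thread_local Cache cache;  // Fast path: plain thread_local.
    return &cache;
  } else {
    // Fallback: hash lookup keyed by the thread identifier.
    static std::mutex mu;
    static std::unordered_map<std::uintptr_t, Cache> table;
    std::lock_guard<std::mutex> guard(mu);
    return &table[SelfThreadId()];  // Element addresses survive rehashing.
  }
}

// Demo: a second thread gets a different cache than the caller.
bool OtherThreadGetsDistinctCache() {
  Cache* mine = GetCache();
  Cache* theirs = nullptr;
  std::thread t([&theirs] { theirs = GetCache(); });
  t.join();
  return theirs != nullptr && theirs != mine && mine == GetCache();
}
```

The sketch never removes entries, so an errno address reused by a later thread would find the stale cache. A real implementation has to erase the entry when a thread exits.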
This allows us, later, to avoid building this stuff in configurations
that don't use it. I have also reduced the API and ABI surface to
enable further refactorings.
It isn't needed on modern OpenSolaris, and in fact it breaks the
build. I am not entirely sure how to accommodate both the old and the
new behavior, so let's keep things simple and assume that the old
behavior (where madvise needs to be declared) is ancient enough.
Automake or autoconf adds them automagically, but we don't really need
them. The main effect of this change is that the MSVC version of
config.h doesn't duplicate the package version.