While it still needs more work to be up to highest standards of
cleanness and clarity, we made it better.
One big part of this change is we're now liberated from
tcmalloc_unittest.sh. We now arrange testing of various environment
variable tweaks in C++ by self-execing with updated environment at the
end of tests.
Biggest part of it is removal of SetTestResourceLimit. The reasoning
behind it is, we don't do it on windows. Tests pass just fine. So,
there is no reason to bother.
It is still somewhat broken and somewhat of a mess, but it is much
lesser mess now.
As part of this change I also amputated a number of unnecessary or too
complicated bits. Like attempt to build both static and shared
versions of tcmalloc libraries at the same time. We stick to cmake's
"common" (seemingly) behavior of defaulting to something (in our case
shared library) and letting users override BUILD_SHARED_LIBS.
All the "install" bits are amputated as well, they're not ready.
We now wrap StackTraceScope thingy in tcmalloc-specific parts, instead
of automagically inside every stacktrace.cc function. TCMalloc bits
all need to grab stacktraces via newly introduced
tcmalloc::GrabBacktrace (which handles emergency malloc wrapping).
New approach eliminates the need for doing fake stacktrace scope. CPU
profiler, being distinct .so library couldn't take advantage of
emergency malloc anyways.
This simplifies the build further and eliminates another potential
point of runtime divergence when stacktrace is linked to both
libprofiler and libtcmalloc.
Logic was removed from thread_cache.{h,cc} into
thread_cache_ptr.{h,cc}.
Separation will help possible future evolution, and we already changed
the logic quite a bit:
* early access (when TLS isn't initialized yet) now uses global
ThreadCache instance. We therefore have ThreadCachePtr instances
managing required locking. This eliminates unnecessary complication of
PTHREADS_CRASHES_IF_RUN_TOO_EARLY logic, and any other danger of
touching TLS too early. BTW previous implementation actually leaked
initial early-initialized ThreadCache instance(!)
* old configure-time HAVE_TLS logic is amputated. Config-time part of
it made little sense as C++ 17 guarantees availability of
thread_local, but we have manually curated deny-list of "bad" OSes,
that we tested (via compile checks!) at configure time. Now this
is all compile time. There is now compile-time kHaveGoodTLS variable
and we're using it mostly via if constexpr.
* kHaveGoodTLS case of creating thread cache is simplified and made
more straightforward (no need to have in_setspecific logic).
* !kHaveGoodTLS case if fixed and improved too. We avoid
std:🧵:get_id, as it deadlocks on mingw. We use errno address as
a portable and (usually) async-signal safe 'my thread' identifier. We
also eliminate linear searching of thread's cache and replace it with
straightforward hash table lookup.
This allows us, later, to avoid building this stuff in configurations
that don't use it. I have also reduced API and ABI surface to enable
further refactorings.
* Remove build dependency on HAVE_PTHREAD
* Remove build dependency on HAVE_STD_ALIGNED_VAL_T and ENABLE_ALIGNED_NEW_DELETE
* Remove redundant tcmalloc.h files & ensure there are no cross-build-tool references
* Adopt automake commit 26927d1 in the CMake build
- Fix CMake builds for MinGW and MSVC
- Ensure the Autotools, CMake and VSProj builds do not reference each others' config.h
- Use std:🧵:id instead of our own thread ID wrappers
- Moved explicit TLS wrapper functions into the tcmalloc:: namespace and change their visibility to hidden
Resolves#1486
Pretty much everything has fork, except windows. Which is much simpler
to test at compile time, then duplicating and complicating the test at
configure time.