We now support a set of command line flags similar to "abseil"
benchmark thingy. I.e. to let people specify a subset of benchmarks or
run them longer/shorter as needed.
This commit also includes small, portable and very simple regexp
facility. It isn't good enough for some production use, but it is
plenty good for some testing uses or benchmark selection.
The case of mmap and sbrk hooks is simple enough that we can do
simpler "skip right number of frames" approach. Instead of relying on
less portable and brittle attribute section trick.
The comments in this file stated that it has to be linked in specific
order to get initialized early enough. But our initialization is
de-facto via initial malloc hook. And to deal with latest-possible
destruction, we use more convenient destructor function
attribute, and make things simpler.
References github issue #1503.
This significantly reworks both early thread cache access and related
emergency malloc mode checking integration. As a result, we're able to
to rely on emergency malloc even on systems without "good"
TLS (e.g. QNX which does emutls).
One big change here is we're undoing early change to have single
"global" thread cache early during process lifetime. It was nice and
somewhat simpler approach. But because of inherent locking during
early process lifetime, we couldn't be sure of certain lock ordering
aspects w.r.t. backtracing/exception-stack-unwinding. So I choose to
keep it safe. So the new idea is we use SlowTLS facility to find
threads' caches when normal tls isn't ready yet. It avoids holding
locks around potentially recursion-ful things (like
ThreadCache::ModuleInit or growing heap). But we then have to be
careful to "re-attach" those early thread cache instances to regular
TLS. There will nearly always be just one of such thread caches. For
initial thread. But we cannot entirely rule out more general case
where someone creates threads before process initializers ran and
main() is reached. Another notable thing is free-ing memory in this
early mode will always using slow-path deletion directly into central
free list.
SlowTLS facility is "simply" a generalization of previous
CreateBadTLSCache code. I.e. we have a small fixed-size cache that
holds "exceptional" instances of thread-identitity to
thread-cache+emergency-mode-flag mappings.
We also take advantage of tcmalloc::kInvalidTLSKey we introduced
earlier and remove potentially raceful memory ordering between reading
tls_ready_ and tls_key_.
For emergency malloc detection we previously used thread_local
flag. Which we cannot use on !kHaveGoodTLS systems. So we instead
_remove_ thread's cache from it's normal tls storage and place it
"into" SlowTLS instead for the duration of WithStacktraceScope
call (which is how emergency mode is enabled now).
While it still needs more work to be up to highest standards of
cleanness and clarity, we made it better.
One big part of this change is we're now liberated from
tcmalloc_unittest.sh. We now arrange testing of various environment
variable tweaks in C++ by self-execing with updated environment at the
end of tests.
In 57d6ecc4ae I removed obsolete
src/windows/TODO but failed to remove it from Makefile.am
I also previously failed to arrange distribution of vendored
googletest. And we also failed to distribute headers in src/tests/
This is all fixed now.
Turns out at least on FreeBSD tmpfile will fail if TMPDIR points to
non-existant directory. This also has nice property of leaving it down
to users to set up TMPDIR the way they want.
There were far too many intermediate libraries, references to obsolete
libtool bugs and whatnot.
We now have libcommon.la as a convenience archive that contains
spinlock, logging and misc stuff. A number of unit tests that need
those facilities are being linked to this.
Then we have libstacktrace.la convenience archive, since it is used by
libtcmalloc.la and libprofiler.la.
And after that we're simply directly creating our main library
products: libtcmalloc_minimal{,_debug}.la, libtcmalloc_{,debug}.la,
libprofiler.la etc.
We now wrap StackTraceScope thingy in tcmalloc-specific parts, instead
of automagically inside every stacktrace.cc function. TCMalloc bits
all need to grab stacktraces via newly introduced
tcmalloc::GrabBacktrace (which handles emergency malloc wrapping).
New approach eliminates the need for doing fake stacktrace scope. CPU
profiler, being distinct .so library couldn't take advantage of
emergency malloc anyways.
This simplifies the build further and eliminates another potential
point of runtime divergence when stacktrace is linked to both
libprofiler and libtcmalloc.