This unbreaks some cases where patching complains that functions are
too short to patch.
What happens is we first locate one of the CRTs (like ucrt or msvcrt)
and patch __expand there, redirecting to our implementation. Then the
"static" __expand replacement is patched, but it is usually imported
from that same C runtime DLL. And through several jmp redirections we
end up at our own __expand from libc<1>. Patching that (and other
cases) is wrong, but I am unsure how to fix it properly. So we do the
simplest workaround. I found that the cases where it does not fail
are: debug builds, where the empty __expand is not too short; builds
where MSVC deduplicates multiple identical __expand implementations
into a single function; and 64-bit patching, which has to do an extra
trampoline thingy. And our patching code already checks whether we're
trying to replace some function with itself. So we "just" take
advantage of that and get the immediate issue fixed, while punting on
more general "duplicate" patching for later.
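In spirit the check we rely on is something like this (a hypothetical
sketch; ResolveJmpChain is a made-up stand-in for following jmp
redirections, and the real preamble patcher is far more involved):

    // Hypothetical sketch. ResolveJmpChain stands in for decoding and
    // following x86 jmp redirections; stubbed here as identity.
    static void* ResolveJmpChain(void* target) { return target; }

    // Skip patching when the target already is (or forwards to) our
    // replacement, i.e. when we would be replacing a function with
    // itself.
    static bool ShouldPatch(void* target, void* replacement) {
      return ResolveJmpChain(target) != replacement;
    }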
Update github issue #667
There was this piece of makefile intended to add stack tracing
functionality (for stuff like growthz, GetCallerStackTrace and
probably heap sampling) even in the minimal configuration on mingw.
What is odd is that we fail to actually define the libstacktrace.la
target on mingw, since libstacktrace.la requires the WITH_STACK_TRACE
automake conditional, which we don't enable on this platform. And yet
somehow it doesn't fail. It produces an empty libstacktrace.la, so the
build kinda works. Except at least on my machine it produces racy
makefiles. So let's not pretend, and stop breaking our parallel
builds.
This is nearly impossible in practice, but still. Somehow we missed
the invariant that DoSampledAllocation always returns an actual
object, but in the case where stacktrace_allocator failed to get us a
StackTrace object we ended up returning the span instead of its
object.
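Roughly, the fixed shape is this (a heavily simplified sketch, with
dummy types and helpers standing in for tcmalloc internals; only the
control flow matters):

    #include <cstddef>
    #include <new>

    // Dummy stand-ins for tcmalloc internals.
    struct StackTrace { /* depth, frames, ... */ };
    struct Span { void* object; /* bookkeeping ... */ };

    static StackTrace* NewStackTrace() { return nullptr; }  // pretend failure
    static Span* NewSpan(size_t size) { return new Span{::operator new(size)}; }

    void* DoSampledAllocation(size_t size) {
      StackTrace* stack = NewStackTrace();  // can fail
      Span* span = NewSpan(size);
      if (span == nullptr) return nullptr;
      if (stack != nullptr) {
        // ... fill in and attach the stack trace ...
      }
      // The fix: always hand back the span's object. The buggy path
      // returned 'span' itself when 'stack' was nullptr.
      return span->object;
    }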
The actual growthz list is now lockless, since we never delete
anything from it. And we now pass a special 'locking context' object
down the page heap allocation path, both as documentation that it is
under lock and for tracking whether we needed to grow the heap and by
how much. Then, whenever the lock is released in RAII fashion, we're
able to trigger growthz recording outside of the lock.
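The lockless part boils down to an atomic prepend-only list; a minimal
sketch (names invented, not the actual code):

    #include <atomic>
    #include <cstddef>

    // Grow-only lockless list: entries are only ever prepended and
    // never removed, so a CAS on the head pointer is the only
    // synchronization readers and writers need.
    struct GrowthEntry {
      size_t pages;
      GrowthEntry* next;
    };

    static std::atomic<GrowthEntry*> growthz_head{nullptr};

    static void PushGrowth(GrowthEntry* e) {
      GrowthEntry* head = growthz_head.load(std::memory_order_relaxed);
      do {
        e->next = head;  // 'head' is refreshed by a failed CAS
      } while (!growthz_head.compare_exchange_weak(
          head, e, std::memory_order_release, std::memory_order_relaxed));
    }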
Closes #1159
While there is still plenty of code that takes pageheap_lock outside
of the page_heap module for all kinds of reasons, at least the
bread-and-butter logic of allocating/deallocating larger chunks of
memory now handles page heap locking inside PageHeap itself. This
gives us flexibility.
Update issue #1159
I.e. this covers the case of ARMs that by default compile tcmalloc for
8K logical pages (assuming 4K system pages), but can actually run on
systems with 64K pages.
Closes #1135
The previous implementation wasn't entirely safe w.r.t. systems with
32-bit off_t, specifically around the mmap replacement hook. Also, the
API was a lot more general and broad than we actually need.
Sadly, the old mmap hooks API was shipped with our public headers. But
thankfully it appears to be unused externally (checked via github
search). So we keep this old API and ABI for the sake of formal API
and ABI compatibility. But this old API is now empty and always
fails (some OS/hardware combinations didn't have functional
implementations of those hooks anyway).
The new API is 64-bit clean and only provides us with what we need:
namely, being able to react to virtual address space mapping changes
for logging, heap profiling and the heap leak checker. I.e. no pre
hooks or mmap-replacement hooks. We also explicitly do not ship this
API externally, to give us freedom to change it.
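To give a feel for the shape of such an API (all names below are
invented for illustration, since the real header is internal):

    #include <cstddef>

    // Invented sketch of a post-event-only observer of address space
    // changes: no pre hooks, no ability to replace the syscall.
    struct MappingEvent {
      const void* before;      // region being unmapped, if any
      size_t before_size;
      const void* after;       // region just mapped, if any
      size_t after_size;
    };

    typedef void (*MappingObserver)(const MappingEvent& evt);

    static MappingObserver g_observer = nullptr;

    // Registration never allocates; profilers install their observer
    // once, early, then react to events for logging/profiling/leak
    // checking.
    static void HookMappingEvents(MappingObserver observer) {
      g_observer = observer;
    }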
The new code is also hopefully tidier and slightly more portable. At
least there are fewer arch-specific ifdefs.
Another somewhat notable change: since the mmap hook isn't needed in
the "minimal" configuration, we no longer override the system's
mmap/munmap/etc functions in that configuration. No big deal, but it
reduces the risk of damage if we somehow mess those up. E.g. musl's
mmap does a few things that our mmap replacement doesn't, such as a
very fancy vm_lock thingy. That doesn't look critical, but it is a
good thing for us not to interfere with when not necessary.
Fixes issue #1406 and issue #1407. Let's also mention issue #1010,
which is somewhat relevant.
It was originally separated into its own function due to some (now
obsolete) compiler optimization bug. We stopped working around that
bug a few commits ago. But as github user plopresti rightfully pointed
out (much, much thanks!), we lost `static' on that function. Since it
is a trivial function used exactly once, it is best to simply inline
its 2 lines of code back instead of restoring `static'.
Previous commit was: 68b442714a
This facility allowed us to build tcmalloc without linking in the
actual -lpthread. Via weak symbols we checked at runtime whether
pthread functions are available, and if not, special single-threaded
stubs were used instead. Not always bringing in the pthread dependency
helped performance of some programs or libraries which depend at
runtime on whether threads are linked. Most notable of those is
libstdc++, which uses non-atomic refcounting in single-threaded
programs.
But such an optional dependency on pthreads caused complications for
nearly no benefit. One trouble was reported in github issue #1110.
These days the glibc/libstdc++ combo actually depends on the
sys/single_threaded.h facility, so bringing in pthread at runtime is
fine. Also, modern glibc ships pthread symbols inside libc anyway, and
libpthread is empty. I also found that, for whatever reason, on BSDs
and OSX we already pulled in proper pthreads too.
So we lose nothing, we get issue #1110 fixed, and we simplify
everything.
Our perftools_pthread_once implementation has some ifdefs and whatnot.
I.e. some OSes have complications around early usage, and windows is
windows. But in fact we can write a trivial spinlock-supported
implementation and depend on nothing.
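A minimal sketch of that idea, assuming nothing beyond std::atomic
(simplified; not the actual code):

    #include <atomic>

    // State is 0 (not started), 1 (running), 2 (done). The first
    // caller runs the initializer; everyone else spins until done.
    // No pthreads, no OS dependency.
    static void trivial_once(std::atomic<int>& state, void (*init)()) {
      int expected = 0;
      if (state.compare_exchange_strong(expected, 1,
                                        std::memory_order_acq_rel)) {
        init();
        state.store(2, std::memory_order_release);
        return;
      }
      while (state.load(std::memory_order_acquire) != 2) {
        // spin; a real implementation would pause/yield here
      }
    }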
Update issue #1110
Original CL: https://chromiumcodereview.appspot.com/10391178
1. Enable the large object pointer offset check in release builds.
   The following code will now cause a check error:

     char* p = reinterpret_cast<char*>(malloc(kMaxSize + 1));
     free(p + 1);

2. Remove the duplicated error reporting function
   "DieFromBadFreePointer"; "InvalidGetAllocatedSize" can be used
   instead.
Reviewed-on: https://chromium-review.googlesource.com/1184335
[alkondratenko@gmail.com] removed some unrelated formatting changes
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
It barks because we access the field in a special way through address
computation (see magic2_addr), so it is indeed not accessed
directly. To emphasize that it comes directly after size2_, we simply
converted size2_ and magic2_ into an explicit array size_and_magic2_.
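Roughly, the change is of this shape (layout heavily simplified):

    #include <cstddef>

    // magic2_ alone looked unused to the compiler, because we only
    // reach it via address computation. Fusing the two fields into
    // one array makes both the layout and the access explicit.
    class Block {
      size_t size_and_magic2_[2];  // [0] was size2_, [1] was magic2_
     public:
      size_t* magic2_addr() { return &size_and_magic2_[1]; }
    };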
Our tests explicitly trigger certain "bad" cases, and compilers are
getting increasingly smarter, so they have started giving us warnings.
People dislike warnings, so let's try to disable them specifically for
unit tests.
Refers to issue #1401
In the heap checker test we actually use the test_str address after
deletion, to verify at runtime that the heap checker is indeed
tracking deletions. But gcc barks. So let's hide the allocation from
it by calling tc_newarray/tc_deletearray.
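Sketched, assuming the tc_* entry points from gperftools/tcmalloc.h:

    #include <gperftools/tcmalloc.h>

    void CheckDeallocationIsTracked() {
      char* test_str = static_cast<char*>(tc_newarray(8));
      void* recorded = test_str;  // only the address is kept
      tc_deletearray(test_str);
      // 'recorded' is later compared against the heap checker's
      // bookkeeping; the freed memory itself is never dereferenced.
      // Plain new[]/delete[] here trips gcc's use-after-free warning.
      (void)recorded;
    }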
Refers to #1401
g++ 13 somehow considers reinterpret_cast<uintptr_t>(InitialNewHook)
as non-constexpr. It then generates a runtime constructor for malloc
hooks, which runs too late. That broke the heap checker and the heap
profiler. We fix this by giving the array of hooks inside malloc the
proper function pointer type, which lets us avoid the reinterpret_cast
and makes construction compile-time again.
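The shape of the fix, roughly (a simplified sketch, not the actual
code):

    #include <cstddef>

    static void InitialNewHook(const void* ptr, size_t size) {}

    // Before (simplified): storing hooks as integers required a
    // reinterpret_cast, which g++ 13 refuses to treat as constexpr,
    // so the array got a runtime constructor:
    //   static uintptr_t hooks[] = {
    //       reinterpret_cast<uintptr_t>(InitialNewHook)};

    // After: with the proper function pointer type no cast is needed
    // and initialization happens at compile time.
    using NewHook = void (*)(const void* ptr, size_t size);
    static constexpr NewHook hooks[] = {InitialNewHook};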
If we don't avoid that, there is a possibility of deadlock. Like this:
*) thread A runs malloc and happens to hold some lock(s) there. It
gets the profiling signal.
*) thread B runs ProfileHandler::UnregisterCallback or similar and
holds both control_lock_ and signal_lock_.
*) thread B calls the memory allocator (it used to delete the token
under lock), hits the malloc lock, and so waits.
*) thread A, which holds the malloc lock, wants to get signal_lock_
and waits too.
The correct behavior is: since we grab signal_lock_ from pretty much
anywhere as part of the signal handler, it transitively implies that
anything that grabs that lock cannot call into anything that grabs any
other lock (e.g. including malloc). I.e. signal_lock_ "nests" under
every other lock in the program.
So we refactor the code some. When removing callbacks we simply copy
the entire list with the right token removed. Then we grab
signal_lock_ and only swap the copy with the old callback list. Then
signal_lock_ is released and all the memory freeing operations are
performed.
When adding a callback we utilize splice to likewise avoid anything
"heavy" under signal_lock_.
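A simplified sketch of the removal path (std::mutex standing in for
the real lock, names invented):

    #include <list>
    #include <mutex>

    struct Callback { /* token state */ };

    static std::mutex signal_lock_;  // stands in for the real lock
    static std::list<Callback*> callbacks_;

    // Caller holds control_lock_, so callbacks_ is stable to read.
    void Unregister(Callback* token) {
      // Build the replacement list outside signal_lock_ (may malloc).
      std::list<Callback*> copy = callbacks_;
      copy.remove(token);
      {
        std::lock_guard<std::mutex> g(signal_lock_);
        callbacks_.swap(copy);  // O(1), no allocation under the lock
      }
      // The old list ('copy') and the token are destroyed here, so
      // all freeing happens after signal_lock_ is released.
      delete token;
    }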
And as part of that change, the UpdateTimer method now only needs
control_lock_ instead of signal_lock_.
We add a somewhat elaborate unit test that was able to catch the issue
without the fix and now passes fine.
Closes github issue #412
We had logic ensuring that TesterThread succeeds in grabbing the
passed object's lock at least 50% of the time, apparently to ensure
that multiple threads actually participate. But this check
occasionally flakes. It is totally possible that the lock owner gets
preempted while holding the lock and the other threads accumulate a
gazillion failed trylocks. So while it would be nice to test this
somehow, flakes are bad enough that we drop this check.
- Some small automake changes. Add libc++ for AIX instead of libstdc++
- Add the interface changes for AIX: user-defined malloc replacement
- Add code to avoid use of the pthreads library prior to its initialization
- Some small changes to the unit test cases
- Update INSTALL for AIX
[alkondratenko@gmail.com]: lower-case/de-trailing-dot for commit subject line
[alkondratenko@gmail.com]: rebase
[alkondratenko@gmail.com]: amputate unused AM_CONDITIONAL for AIX
[alkondratenko@gmail.com]: explicitly mention libc_override_aix.h in Makefile.am
We used to explicitly link to libstdc++, libm and even libpthread, but
this should be handled by libtool, since those are dependencies of
libtcmalloc_minimal. What also helps is that we now build everything
with the C++ compiler, not C. So the libstdc++ (or libc++) dependency
doesn't need to be added at all, even if libtool for some reason fails
to handle it.
This fixes github issue #1323.
When populating a span with a linked list of objects, the
ptr + size <= limit condition could overflow before the comparison to
limit. The fixed code tests against the limit carefully. We also
provide a version using __builtin_add_overflow, since this is a
semi-hot loop and we want it to be fast.
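A sketch of the __builtin_add_overflow flavor (simplified from the
real loop):

    #include <cstddef>
    #include <cstdint>

    // Carve [ptr, limit] into 'size'-byte objects, threading a
    // freelist through their first words. The naive
    // 'ptr + size <= limit' test can wrap at the top of the address
    // space; __builtin_add_overflow catches exactly that.
    static void* BuildFreeList(uintptr_t ptr, uintptr_t limit,
                               size_t size) {
      void* head = nullptr;
      void** tail = &head;
      uintptr_t next;
      while (!__builtin_add_overflow(ptr, size, &next) && next <= limit) {
        *tail = reinterpret_cast<void*>(ptr);
        tail = reinterpret_cast<void**>(ptr);
        ptr = next;
      }
      *tail = nullptr;
      return head;
    }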
This fixes issue #1371
From time to time things fail inside tcmalloc guts where calling into
malloc is not safe. Regular strerror does locale bits, so it will
occasionally open files/malloc/etc. We avoid this by using our own
"safe" variant that hardcodes the names of all POSIX errno constants.