gperftools

mirror of https://github.com/gperftools/gperftools synced 2025-02-18 21:26:49 +00:00

Author	SHA1	Message	Date
Aliaksey Kandratsenka	eeb7b84c20	Register tcmalloc atfork handler as early as possible This is what other mallocs do (glibc malloc and jemalloc). The idea is malloc is usually initialized very eary. So if we register atfork handler at that time, we're likely to be first. And that makes our atfork handler a bit safer, since there is much less chance of some other library installing their "take all locks" handler first and having fork take malloc lock before library's lock and deadlocking. This should address issue #904.	2017-07-08 16:08:29 -07:00
Aliaksey Kandratsenka	208c26caef	Add initial syscall support for mips64 32-bit ABI This applies patch by Adhemerval Zanella from https://github.com/gperftools/gperftools/issues/845. Only malloc (i.e. tcmalloc_minimal) was tested to work so far.	2017-07-08 13:34:41 -07:00
Francis Ricci	a3bf61ca81	Ensure that lsan flags are appended on all necessary targets	2017-07-08 13:33:30 -07:00
Aliaksey Kandratsenka	97646a1932	Add missing NEWS entry for recent 2.6 release Somehow I managed to miss this last commit in 2.6 release. So lets add it now even if it is too late.	2017-07-04 21:02:34 -07:00
Aliaksey Kandratsenka	4be05e43a1	bumped version up to 2.6	2017-07-04 20:35:25 -07:00
Francis Ricci	70a35422b5	Ignore current_instance heap allocation when leak sanitizer is enabled Without this patch, any user program that enables LeakSanitizer will see a leak from tcmalloc. Add a weak hook to __lsan_ignore_object, so that if LeakSanitizer is enabled, the allocation can be ignored.	2017-07-04 20:24:47 -07:00
Aliaksey Kandratsenka	6eca6c64fa	Revert "issue-654: [pprof] handle split text segments" This reverts commit `8c3dc52fcf`. People have reported issues with this so lets stay safe and use older even if less powerful code.	2017-07-01 18:48:58 -07:00
KernelMaker	a495969cb6	update the prev_class_size in each loop, or the min_object_size of tcmalloc.thread will always be 1 when calling GetFreeListSizes	2017-05-29 15:05:55 -07:00
Kim Gräsman	163224d8af	Document HEAPPROFILESIGNAL environment variable	2017-05-29 15:04:00 -07:00
Aliaksey Kandratsenka	5ac82ec5b9	added stacktrace capturing benchmark	2017-05-29 14:57:13 -07:00
Aliaksey Kandratsenka	c571ae2fc9	2.6rc4	2017-05-22 19:04:20 -07:00
Aliaksey Kandratsenka	f2bae51e7e	Revert "Revert "disable dynamic sized delete support by default"" This reverts commit `b82d89cb7c`. Dynamic sized delete support relies on ifunc handler being able to look up environment variable. The issue is, when stuff is linked with -z now linker flags, all relocations are performed early. And sadly ifunc relocations are not treated specially. So when ifunc handler runs, it cannot rely on any dynamic relocations at all, otherwise crash is real possibility. So we cannot afford doing it until (and if) ifunc is fixed. This was brought to my attention by Fedora people at https://bugzilla.redhat.com/show_bug.cgi?id=1452813	2017-05-22 18:58:15 -07:00
Aliaksey Kandratsenka	6426c0cc80	2.6rc3	2017-05-22 03:08:30 -07:00
Aliaksey Kandratsenka	0c0e2fe43b	enable 48-bit page map on msvc as well	2017-05-22 03:08:30 -07:00
Aliaksey Kandratsenka	83d6818295	speed up 3-level page map access There is no need to have pointer indirection for root node. This also helps the case of early free of garbage pointer because we didn't check root_ pointer for NULL.	2017-05-22 03:08:15 -07:00
Aliaksey Kandratsenka	f7ff175b92	add configure-time warning on unsupported backtrace capturing Both libgcc and libc's backtrace() are not really options for stack trace capturing from inside profiling signal handler. So lets warn people.	2017-05-22 01:55:50 -07:00
Aliaksey Kandratsenka	cef582350c	align fast-path functions only if compiler supports that Apparently gcc only supports __attribute__((aligned(N))) on functions only since version 4.3. So lets test it in configure script and only use when possible. We now use CACHELINE_ALIGNED_FN macro for aligning functions.	2017-05-22 01:55:50 -07:00
Aliaksey Kandratsenka	bddf862b18	actually support very early freeing of NULL This was caught by unit tests on centos 5. Apparently some early thingy is trying to do vprintf which calls free(0). Which used to crash since before size class cache is initialized it'll report hit (with size class 0) for NULL pointer, so we'd miss the case of checking NULL pointer free and crash. The fix is to check for IsInited in the case when thread cache is null, and if so then we escalte to free_null_or_invalid.	2017-05-22 01:54:56 -07:00
Aliaksey Kandratsenka	07a124d8c1	don't use arg-ful constructor attribute for early nallocx test 101 is not very early anyways and arg-ful constructor attribute is only supported since gcc 4.3 (and e.g. rhel 5's compiler fails to compile it). So there seems to be very little value trying to ask for priority of 101.	2017-05-21 22:49:54 -07:00
Aliaksey Kandratsenka	5346b8a4de	don't depend on SIZE_MAX definition in sampler.cc It was reported that SIZE_MAX isn't getting defined in C++ mode when C++ standard is less than c++11. Because we still want to support non-c++11 systems (for now), lets make it simple and not depend on SIZE_MAX (original google-internal code used std::numeric_limits<ssize_t>::max, but that failed to compile on msvc). Fixes issue #887 and issue #889.	2017-05-21 22:49:20 -07:00
Aliaksey Kandratsenka	50125d8f70	2.6rc2	2017-05-15 00:02:43 -07:00
Aliaksey Kandratsenka	a5e8e42a47	don't link-in libunwind if libunwind.h is missing I got report that some build environments for https://github.com/lyft/envoy are having link-time issue due to linking libunwind. It was happening despite libunwind.h being present, which is clear bug as without header we won't really use libunwind.	2017-05-14 23:45:08 -07:00
Rajalakshmi Srinivasaraghavan	e92acdf98d	Fix compilation error for powerpc32 Fix the following compilation error for powerpc32 platform when using latest glibc. error: ‘siginfo_t’ was not declared in this scope	2017-05-14 23:08:13 -07:00
Aliaksey Kandratsenka	b48403a4b0	2.6rc	2017-05-14 22:00:28 -07:00
Aliaksey Kandratsenka	53f15325d9	fix compilation of tcmalloc_unittest.cc on older llvm-gcc	2017-05-14 20:35:22 -07:00
Aliaksey Kandratsenka	b1d88662cb	change size class to be represented by 32 bit int This moves code closer to Google-internal version and provides for slightly tighter code encoding on amd64.	2017-05-14 19:04:56 -07:00
Aliaksey Kandratsenka	991f47a159	change default transfer batch back to 32 Some tensorflow benchmarks are seeing large regression with elevated values. So lets stick to old safe default until we understand how to make larger values work for all workloads.	2017-05-14 19:04:56 -07:00
Aliaksey Kandratsenka	7bc34ad1f6	support different number of size classes at runtime With TCMALLOC_TRANSFER_NUM_OBJ environment variable we can change transfer batch size. And with that comes slightly different number of size classes depending on value of transfer batch size. We used to have hardcoded number of size classes, so we couldn't really support any batch size setting. This commit adds support for dynamic number of size classes (runtime value returned by Static::num_size_classes()).	2017-05-14 19:04:56 -07:00
Aliaksey Kandratsenka	4585b78c8d	massage allocation and deallocation fast-path for performance This is significant speedup of fast-path of malloc. Large part comes from avoiding expensive function prologue/epilogue. Which is achieved by making sure that tc_{malloc,new,free} etc are small functions that do only tail-calls. We keep only critical path in those functions and tail-call to slower "full" versions when we need to deal with less common case. This helps compiler generate much tidier code. Fast-path readyness check is now different too. We used to have "min size for slow path" variable, which was set to non-zero value when we know that thread cache is present and ready. We now have use thread-cache pointer not equal to NULL as readyness check. There is special ThreadCache::threadlocal_data_.fast_path_heap copy of that pointer that can be temporarily nulled to disable malloc fast path. This is used to enable emergency malloc. There is also slight change to tracking thread cache size. Instead of tracking total size of free list, it now tracks size headroom. This allows for slightly faster deallocation fast-path check where we're checking headroom to stay above zero. This check is a bit faster than comparing with max_size_.	2017-05-14 19:04:56 -07:00
Aliaksey Kandratsenka	5964a1d9c9	always inline a number of hot functions	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	e419b7b9a6	introduce ATTRIBUTE_ALWAYS_INLINE	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	7d588da7ec	synchronized Sampler implementation with Google-internal version This is mostly dropping FastLog2 which was never necessary for performance, and making sampler to be called always, even if sampling is disabled (this benefits more for always-sampling case of Google fork). We're also getting TryRecordAllocationFast which is not used yet, but will be as part of subsequent fast-path speedup commit.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	27da4ade70	reduce size of class_to_size_ array Since 32-bit int is enough and accessing smaller array will use a bit less of cache.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	335f09d4e4	use static location for pageheap Makes it a bit faster to access, since we're dropping pointer indirection.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	6ff332fb51	move size classes map earlier in SizeMap Since we access them more often, having at least one of them at offset 0 makes pi{c,e} code a bit smaller.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	121b1cb32e	slightly faster size class cache Lower bits of page index are still used as index into hash table. Those lower bits are zeroed, or-ed with size class and placed into hash table. So checking is just loading value from hash table, xoring with higher bits of address and checking if resultant value is lower than 128. Notably, size class 0 is not considered "invalid" anymore.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	b57c0bad41	init tcmalloc prior to replacing system alloc Currently on windows, we're depending on uninitialized tcmalloc variables to detect freeing foreign malloc's chunks. This works somewhat by chance due to 0-initialized size classes cache working as cache with no values. But this is about to change, so lets do explicit initialization.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	71fa9f8730	use 2-level page map for 48-bit addresses 48 bits is size of x86-64 and arm64 address spaces. So using 2 levels map for them is slightly faster. We keep 3 levels for small-but-slow configuration, since 2 levels consume a bit more memory. This is partial port of Google-internal commit by Sanjay Ghemawat (same idea, different implementation).	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	bad70249dd	use 48-bit addresses on 64-bit arms too	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	5f12147c6d	use hidden visibility for some key global variables So that our -fPIC code is faster	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	dfd53da578	set ENOMEM in handle_oom	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	14fd551072	avoid O(N²) in thread cache creation code	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	507a105e84	pass original size to DoSampledAllocation It makes heap profiles more accurate. Google's internal malloc is doing it as well.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	bb77979dea	don't declare throw() on malloc funtions since it is faster Apparently throw() on functions actually asks compiler to generate code to detect unexpected exceptions. Which prevents tail calls optimization. So in order to re-enable this optimization, we simply don't tell compiler about throw() at all. C++11 noexcept would be even better, but it is not universally available yet. So we change to no exception specifications. Which at least for gcc & clang on Linux (and likely for all ELF platforms, if not just all) really eliminates all overhead of exceptions.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	89c74cb79c	handle duplicate google_malloc frames in malloc hook stack trace Subsequent optimization may cause multiple malloc functions in google_malloc section to be in call stack. Particularly when fast-path malloc function calls slow-path and compiler chooses to implement such call as regular call instead of tail-call. Because we need stacktrace just until first such function, once we find innermost such frame, we're simply checking if next outer frame is also google_malloc and consider it instead.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	0feb1109ac	fix stack trace capturing in debug malloc Particularly, hardcoded skip count was relying on certain behavior of compiler. Namely, that tail calls inside DebugDeallocate path are not actually implemented as tail calls. New implementation is using google_malloc section as a marker of malloc boundary. But in order for this to work, we have to prevent tail-call in debugallocation's tc_XXX functions. Which is achieved by doing volatile read of static variable at the end of such functions.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	0506e965ee	replace LIKELY/UNLIKELY with PREDICT_{TRUE,FALSE} Google-internal code is using PREDICT_TRUE/FALSE, so we should be doing it too.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	59a4987054	prevent inlining ATTRIBUTE_SECTION functions So that their code is always executing in prescribed section.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	ebb575b8a0	Revert "enabled aggressive decommit by default" This reverts commit `7da5bd014d`. Some tensorflow benchmarks are getting slower with aggressive decommit.	2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka	b82d89cb7c	Revert "disable dynamic sized delete support by default" This reverts commit `06811b3ae4`.	2017-05-14 19:04:55 -07:00

1 2 3 4 5 ...

601 Commits