It looks like, in the past, it could produce better code. But since
unwinding has been totally different for almost forever now, there is
no performance benefit to it anymore.
I.e. because otherwise, when --enable-minimal is given, we're building
empty libtcmalloc.la and linking it to malloc_bench_shared_full. Which
has no effect at all and actually breaks builds on OSX.
Should fix issue #869.
nallocx is an extension introduced by jemalloc. It returns the effective
size of an allocation without allocating anything.
We also support MALLOCX_LG_ALIGN flag. But all other jemalloc
flags (which at the moment do nothing for nallocx anyways) are
silently ignored, since there is no sensible way to return errors in
this API.
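As a rough illustration of what such a call reports, the sketch below
models it with a made-up size-class table. ToyNallocx, ToyLgAlign,
kToyClasses and kToyPageSize are hypothetical names, the class sizes are
not tcmalloc's real ones, and the code assumes (following jemalloc's
convention) that the low 6 bits of flags carry log2 of the alignment.

```cpp
#include <cstddef>

// Hypothetical size classes for illustration only; tcmalloc's real table differs.
static const std::size_t kToyClasses[] = {8, 16, 32, 48, 64, 96, 128, 256, 512, 1024};
static const std::size_t kToyPageSize = 4096;  // assumed page size for large requests

// Stand-in for MALLOCX_LG_ALIGN: log2(alignment) goes into the low bits of flags.
inline int ToyLgAlign(int lg_align) { return lg_align; }

// Returns the size a request of `size` bytes would effectively occupy,
// without allocating anything. Bits of `flags` above the low 6 are
// silently ignored, since there is no way to report an error.
std::size_t ToyNallocx(std::size_t size, int flags) {
  const std::size_t align = std::size_t{1} << (flags & 0x3f);
  for (std::size_t cls : kToyClasses) {
    if (cls >= size && cls % align == 0) return cls;
  }
  // Large request: round up to a multiple of the page size or of the
  // alignment, whichever is bigger (both are powers of two).
  const std::size_t unit = align > kToyPageSize ? align : kToyPageSize;
  return (size + unit - 1) / unit * unit;
}
```

With this toy table, a 10-byte request maps to the 16-byte class, and
asking for 32-byte alignment via ToyLgAlign(5) moves it to the 32-byte
class.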
This was originally contributed by Dmitry Vyukov with input from
Andrew Hunter. But due to significant divergence of Google-internal
and free-software forks of tcmalloc, significant massaging was done by
me. So all bugs are mine.
Emergency malloc is enabled for cases when backtrace capturing needs to
call malloc. In this case, we enable emergency malloc just prior to
calling such code and disable it after it is done.
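The enable/disable bracketing described above can be sketched with a
scoped guard. This is only an illustration: g_emergency_malloc_on,
ScopedEmergencyMalloc, EmergencyMallocActive and RunWithEmergencyMalloc
are hypothetical names, and a real allocator may not be able to rely on
thread_local this freely.

```cpp
namespace {

// Hypothetical per-thread switch that an allocator could consult to
// decide whether to serve requests from an emergency arena.
thread_local bool g_emergency_malloc_on = false;

// Turns the switch on for the lifetime of the guard and restores the
// previous value afterwards, so nested use stays correct.
class ScopedEmergencyMalloc {
 public:
  ScopedEmergencyMalloc() : prev_(g_emergency_malloc_on) {
    g_emergency_malloc_on = true;
  }
  ~ScopedEmergencyMalloc() { g_emergency_malloc_on = prev_; }

  ScopedEmergencyMalloc(const ScopedEmergencyMalloc&) = delete;
  ScopedEmergencyMalloc& operator=(const ScopedEmergencyMalloc&) = delete;

 private:
  bool prev_;
};

}  // namespace

bool EmergencyMallocActive() { return g_emergency_malloc_on; }

// Runs `fn` (e.g. a backtrace capture that may call malloc) with
// emergency malloc enabled just before the call and disabled after it.
template <typename Fn>
auto RunWithEmergencyMalloc(Fn fn) -> decltype(fn()) {
  ScopedEmergencyMalloc guard;
  return fn();
}
```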
Particularly _Unwind_Backtrace, which seems to be a gcc extension.
This is what glibc's backtrace is commonly using.
Using _Unwind_Backtrace directly is better than glibc's backtrace, since
it doesn't call into dlopen. glibc, when it is built as a shared
library, does dlopen, apparently to avoid a link-time dependency on
libgcc_s.so.
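For reference, a minimal sketch of collecting return addresses with
_Unwind_Backtrace directly. It assumes a gcc-compatible toolchain that
ships <unwind.h>; TraceState, CollectFrame and CaptureStack are
hypothetical names, not the ones used in tcmalloc.

```cpp
#include <unwind.h>

namespace {

// Hypothetical bookkeeping passed to the unwinder callback.
struct TraceState {
  void** frames;
  int max_depth;
  int count;
};

// Called by the unwinder once per stack frame.
_Unwind_Reason_Code CollectFrame(_Unwind_Context* ctx, void* arg) {
  TraceState* st = static_cast<TraceState*>(arg);
  // Any value other than _URC_NO_REASON stops the walk.
  if (st->count >= st->max_depth) return _URC_END_OF_STACK;
  st->frames[st->count++] = reinterpret_cast<void*>(_Unwind_GetIP(ctx));
  return _URC_NO_REASON;
}

}  // namespace

// Fills `frames` with up to `max_depth` instruction pointers of the
// current call stack and returns how many were stored.
int CaptureStack(void** frames, int max_depth) {
  TraceState st = {frames, max_depth, 0};
  _Unwind_Backtrace(CollectFrame, &st);
  return st.count;
}
```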
This also makes them output nicer results. I.e. every benchmark is run 3
times and iteration duration is printed for every run.
While this is still very synthetic and unrepresentative of malloc performance
as a whole, it is exercising more situations in tcmalloc fastpath. So it is a
step forward.
Spinlock usage of cycle counter is due to tracking of time it's spent
waiting for lock. But this tracking is only useful if we actually have
synchronization profiling working, which we don't have. Thus I'm
dropping calls to this facility with an eye towards further removal of
cycle clock usage.
So that LD_PRELOAD-ing doesn't force loading libpthread.so which may
slow down some single-threaded apps.
tcmalloc already has maybe_threads facility that can detect if
libpthread.so is loaded (via weak symbols) and provide 'simulations' of
some pthread functions that tcmalloc needs.
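The weak-symbol detection mentioned above can be sketched as follows.
This is an assumption-laden illustration, not the maybe_threads code:
MaybeOnce, PthreadsLoaded and RunOnce are hypothetical names, and the
sketch relies on gcc/clang support for #pragma weak.

```cpp
#include <pthread.h>

// Mark pthread_once as weak: if nothing that provides it is loaded,
// its address evaluates to null instead of causing a link failure.
#pragma weak pthread_once

// Hypothetical once-control usable with or without pthreads.
struct MaybeOnce {
  pthread_once_t real = PTHREAD_ONCE_INIT;
  bool done = false;  // used only by the single-threaded simulation
};

// True when the pthread implementation is present in this process.
bool PthreadsLoaded() { return pthread_once != nullptr; }

// Runs `init` at most once per control block. Without pthreads the
// program is single-threaded, so a plain flag is enough.
int RunOnce(MaybeOnce* ctl, void (*init)()) {
  if (PthreadsLoaded()) return pthread_once(&ctl->real, init);
  if (!ctl->done) {
    init();
    ctl->done = true;
  }
  return 0;
}
```

On C libraries that ship pthread functions inside libc itself, the check
is always true and the simulation branch is never taken.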
While this is not good representation of real-world production malloc
behavior, it is representative of length (instruction-wise as well as
cycle-wise) of fast-path. So this is better than nothing.
Default mode of operation of cpu profiler uses itimer and
SIGPROF. This timer is by definition per-process and no spec defines
which thread is going to receive SIGPROF. And it provides correct
profiles only if we assume that probability of picking threads will be
proportional to cpu time spent by threads.
It is easy to see that recent Linux (at least on common SMP hardware)
doesn't satisfy that assumption. Quite big skews of SIGPROF ticks
between threads are visible. I.e. I could see as big as 70%/20%
division instead of 50%/50% for pair of cpu-hog threads. (And I do see
it become 50/50 with new mode)
Fortunately POSIX provides mechanism to track per-thread cpu time via
posix timers facility. And even more fortunately, Linux also provides
mechanism to deliver timer ticks to specific threads.
Interestingly, it looks like FreeBSD also has very similar facility
and seems to suffer from same skew. But due to differences in
how threads are identified, I haven't bothered to try to support this
mode on FreeBSD.
This commit implements new profiling mode where every thread creates
posix timer which tracks thread's cpu time. Threads also set up
signal delivery to themselves on overflows of that timer.
This new mode requires every thread to be registered in cpu
profiler. Existing ProfilerRegisterThread function is used for that.
Because registering threads requires application support (or suitable
LD_PRELOAD-able wrapper for thread creation API), new mode is off by
default. And it has to be manually activated by setting environment
variable CPUPROFILE_PER_THREAD_TIMERS.
New mode also requires librt symbols to be available. We do not link
to librt due to its dependency on libpthread, which we avoid due to
perf impact of bringing in libpthread to otherwise single-threaded
programs. So librt has to be either already loaded by the profiled
program or LD_PRELOAD-ed.
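The per-thread timer setup described above can be sketched roughly like
this. It is Linux-specific and only an illustration: g_prof_ticks,
OnProfTick, InstallProfHandler and StartThreadCpuTimer are hypothetical
names, and on older glibc the program has to be linked with -lrt.

```cpp
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Some libc headers do not name this field; the fallback matches the
// glibc layout of struct sigevent.
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

volatile sig_atomic_t g_prof_ticks = 0;  // hypothetical tick counter

extern "C" void OnProfTick(int) { g_prof_ticks = g_prof_ticks + 1; }

// Installs a SIGPROF handler that just counts ticks.
bool InstallProfHandler() {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = OnProfTick;
  sa.sa_flags = SA_RESTART;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGPROF, &sa, nullptr) == 0;
}

// Creates a timer driven by the calling thread's cpu time and asks the
// kernel to deliver SIGPROF to this very thread, `hz` times per cpu-second.
bool StartThreadCpuTimer(int hz, timer_t* out) {
  if (hz <= 0) return false;
  struct sigevent sev;
  memset(&sev, 0, sizeof(sev));
  sev.sigev_notify = SIGEV_THREAD_ID;
  sev.sigev_signo = SIGPROF;
  sev.sigev_notify_thread_id = static_cast<pid_t>(syscall(SYS_gettid));
  if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, out) != 0) return false;

  const long period_ns = 1000000000L / hz;
  struct itimerspec its;
  memset(&its, 0, sizeof(its));
  its.it_interval.tv_sec = period_ns / 1000000000L;
  its.it_interval.tv_nsec = period_ns % 1000000000L;
  its.it_value = its.it_interval;
  if (timer_settime(*out, 0, &its, nullptr) != 0) {
    timer_delete(*out);
    return false;
  }
  return true;
}
```

Each profiled thread would call StartThreadCpuTimer itself, which is why
threads have to be registered explicitly.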
Because clang doesn't understand -fno-builtin-malloc and friends. And
otherwise new/delete pairs get optimized away causing our tests that
expect hooks to be called to fail.
Somehow its C++ headers (like string) define pthread symbols without
us even asking for them. That breaks the old assumption that pthread
symbols are not available on windows.
In order to fix that we detect this condition in configure.ac and
avoid defining windows versions of pthread symbols.
* some variables defined with "char *" should be modified to "const char*"
* For uclibc, glibc's "void malloc_stats(void)" should be "void malloc_stats(FILE *)"; it is commented out for now.
* For uclibc, __sbrk is with attribute "hidden", so we use mmap allocator for uclibc.
This merges patch contributed by Jovan Zelincevic.
And with that patch tcmalloc build with --enable-minimal (just malloc
replacement) appears to work (passes unit tests).
Because automake will not automatically add AM_LDFLAGS if there's
per-target LDFLAGS. See their good info manual.
This fixes .dll compilation of tcmalloc
git-svn-id: http://gperftools.googlecode.com/svn/trunk@205 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
I.e. so that I can build tcmalloc.dll using comfortable environment of
my GNU/Linux box and without having to touch actual windows box or VM.
git-svn-id: http://gperftools.googlecode.com/svn/trunk@202 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
This fix is a result of a performance degradation observed in multi-threaded programs where large
amounts of memory (30GB) are consumed and released by a pool of threads in a cyclic manner. This was
mainly due to the amount of time we were spending in the slow path consolidating memory between
the thread cache and central free list. The default has been bumped up to 32768 and is now also
controllable through the TCMALLOC_TRANSFER_NUM_OBJ environment setting.
git-svn-id: http://gperftools.googlecode.com/svn/trunk@193 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
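A minimal sketch of how an environment override like
TCMALLOC_TRANSFER_NUM_OBJ with a default of 32768 could be read.
ReadIntFromEnv and TransferNumObjects are hypothetical helpers, and the
accepted range is an assumption, not tcmalloc's actual validation.

```cpp
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: parses a positive integer from the environment
// and falls back to `def` when the variable is unset or malformed.
int ReadIntFromEnv(const char* name, int def) {
  const char* val = getenv(name);
  if (val == nullptr || *val == '\0') return def;
  char* end = nullptr;
  errno = 0;
  const long parsed = strtol(val, &end, 10);
  // Reject trailing garbage, overflow and out-of-range values.
  if (errno != 0 || *end != '\0' || parsed <= 0 || parsed > (1L << 30)) {
    return def;
  }
  return static_cast<int>(parsed);
}

// Number of objects moved per transfer; 32768 unless overridden.
int TransferNumObjects() {
  return ReadIntFromEnv("TCMALLOC_TRANSFER_NUM_OBJ", 32768);
}
```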
- Used aclocal, autoupdate, autoconf, and automake to correctly apply changes made to Makefile.am. Detailed instructions on this procedure can be found here http://inti.sourceforge.net/tutorial/libinti/autotoolsproject.html.
- Fixed a number of error/warning messages due to use of newer aclocal, autoconf, and automake utilities.
- Directly and indirectly related to issue-385 and issue-480.
git-svn-id: http://gperftools.googlecode.com/svn/trunk@173 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
In revisions 151 and 150 an attempt was made to enable frame pointers by default for i386. However, in the process of doing so a number of files were inadvertently touched as a result of running autogen.sh. As a result, I have needed to roll back these revisions so that I can reattempt the change.
git-svn-id: http://gperftools.googlecode.com/svn/trunk@172 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
* gperftools: version 2.0
* Renamed the project from google-perftools to gperftools (csilvers)
* Renamed the .deb/.rpm packages from google-perftools to gperftools too
* Renamed include directory from google/ to gperftools/ (csilvers)
* Changed the 'official' perftools email in setup.py/etc
* Renamed google-perftools.sln to gperftools.sln
* PORTING: Removed bash-isms & grep -q in heap-checker-death_unittest.sh
* Changed copyright text to reflect Google's relinquished ownership
git-svn-id: http://gperftools.googlecode.com/svn/trunk@142 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
* google-perftools: version 1.10 release
* PORTING: Support for patching assembly on win x86_64! (scott.fr...)
* PORTING: Work around atexit-execution-order bug on freebsd (csilvers)
* PORTING: Patch _calloc_crt for windows (roger orr)
* PORTING: Add C++11 compatibility method for stl allocator (jdennett)
* PORTING: use MADV_FREE, not MADV_DONTNEED, on freebsd (csilvers)
* PORTING: Don't use SYS_open when not supported on solaris (csilvers)
* PORTING: Do not assume uname() returns 0 on success (csilvers)
* LSS: Improved ARM support in linux-syscall-support (dougkwan)
* LSS: Get rid of unused syscalls in linux-syscall-support (csilvers)
* LSS: Fix broken mmap wrapping for ppc (markus)
* LSS: Emit .cfi_adjust_cfa_offset when appropriate (ppluzhnikov)
* LSS: Be more accurate in register use in __asm__ (markus)
* LSS: Fix __asm__ calls to compile under clang (chandlerc)
* LSS: Fix ARM inline assembly bug around r7 and swi (lcwu)
* No longer log when an allocator fails (csilvers)
* void* -> const void* for MallocExtension methods (llib)
* Improve HEAP_PROFILE_MMAP and fix bugs with it (dmikurube)
* Replace int-based abs with more correct fabs in a test (pmurin)
git-svn-id: http://gperftools.googlecode.com/svn/trunk@135 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
gcc on i386, where it's not on by default (it is for
gcc/x86_64, in my tests). This could potentially cause an
error for embedded systems, which can have i386 but no mms,
but the code wouldn't run properly on them anyway without
tweaks.
git-svn-id: http://gperftools.googlecode.com/svn/trunk@127 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
* Replace atexit() calls with global dtors; helps freebsd (csilvers)
* Fix malloc_hook_mmap_linux for ARM (dougkwan)
* Disable heap-checker under AddressSanitizer (kcc)
* Fix bug in powerpc stacktracing (ppluzhnikov)
* Use exponential backoff waiting for spinlocks (m3b)
* Fix 64-bit nm on 32-bit binaries in pprof (csilvers)
* Implement stacktrace for ARM (dougkwan)
* Add ProfileHandlerDisallowForever (rsc)
* Shell escape when forking in pprof (csilvers)
* Fix freebsd to work on x86_64 (chapp...@gmail.com)
* No longer combine overloaded functions in pprof (csilvers)
* Fix address-normalizing bug in pprof (csilvers)
* More consistently call abort() instead of exit() on failure (csilvers)
* Allow NoGlobalLeaks to be safely called more than once (csilvers)
* Beef up the documentation a bit about using libunwind (csilvers)
git-svn-id: http://gperftools.googlecode.com/svn/trunk@121 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
* Make PageHeap dynamically allocated for leak checks (maxim)
* BUGFIX: Fix probing of nm -f behavior in pprof (dpeng)
* PORTING: Add "support" for MIPS cycletimer
* BUGFIX: Fix a race with the CentralFreeList lock (sanjay)
* Allow us to compile on OS X 10.6 and run on 10.5 (raltherr)
* Support /pprof/censusprofile url arguments (rajatjain)
* Die in configure when g++ isn't installed (csilvers)
* Change IgnoreObject to return its argument (nlewycky)
* Update malloc-hook files to support more CPUs
* Move stack trace collecting out of the mutex (taylorc)
* BUGFIX: write our own strstr to avoid libc problems (csilvers)
* use simple callgrind compression facility in pprof
* print an error message when we can't run pprof to symbolize (csilvers)
git-svn-id: http://gperftools.googlecode.com/svn/trunk@120 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
* google-perftools: version 1.8 release
* PORTING: (Disabled) support for patching mmap on freebsd (chapp...)
* PORTING: Support volatile __malloc_hook for glibc 2.14 (csilvers)
* PORTING: Use _asm rdtsc and __rdtsc to get cycleclock in windows (koda)
* PORTING: Fix fd vs. HANDLE compiler error on cygwin (csilvers)
* PORTING: Do not test memalign or double-linking on OS X (csilvers)
* PORTING: Actually enable TLS on windows (jontra)
* PORTING: Some work to compile under Native Client (krasin)
* PORTING: deal with pthread_once w/o -pthread on freebsd (csilvers)
* Rearrange libc-overriding to make it easier to port (csilvers)
* Display source locations in pprof disassembly (sanjay)
* BUGFIX: Actually initialize allocator name (mec)
* BUGFIX: Keep track of 'overhead' bytes in malloc reporting (csilvers)
* Allow ignoring one object twice in the leak checker (glider)
* BUGFIX: top10 in pprof should print 10 lines, not 11 (rsc)
* Refactor vdso source files (tipp)
* Some documentation cleanups
* Document MAX_TOTAL_THREAD_CACHE_SIZE <= 1Gb (nsethi)
* Add MallocExtension::GetOwnership(ptr) (csilvers)
* BUGFIX: We were leaving out a needed $(top_srcdir) in the Makefile
* PORTING: Support getting argv0 on OS X
* Add 'weblist' command to pprof: like 'list' but html (sanjay)
* Improve source listing in pprof (sanjay)
* Cap cache sizes to reduce fragmentation (ruemmler)
* Improve performance by capping or increasing sizes (ruemmler)
* Add M{,un}mapReplacement hooks into MallocHook (ribrdb)
* Refactored system allocator logic (gangren)
* Include cleanups (csilvers)
* Add TCMALLOC_SMALL_BUT_SLOW support (ruemmler)
* Clarify that tcmalloc stats are MiB (robinson)
* Remove support for non-tcmalloc debugallocation (blount)
* Add a new test: malloc_hook_test (csilvers)
* Change the configure script to be more crosstool-friendly (mcgrathr)
* PORTING: leading-underscore changes to support win64 (csilvers)
* Improve debugallocation tc_malloc_size (csilvers)
* Extend atomicops.h and cycleclock to use ARM V6+ optimized code (sanek)
* Change malloc-hook to use a list-like structure (llib)
* Add flag to use MAP_PRIVATE in memfs_malloc (gangren)
* Windows support for pprof: nul and /usr/bin/file (csilvers)
* TESTING: add test on strdup to tcmalloc_test (csilvers)
* Augment heap-checker to deal with no-inode maps (csilvers)
* Count .dll/.dylib as shared libs in heap-checker (csilvers)
* Disable sys_futex for arm; it's not always reliable (sanek)
* PORTING: change lots of windows/port.h macros to functions
* BUGFIX: Generate correct version# in tcmalloc.h on windows (csilvers)
* PORTING: Some casting to make solaris happier about types (csilvers)
* TESTING: Disable debugallocation_test in 'minimal' mode (csilvers)
* Rewrite debugallocation to be more modular (csilvers)
* Don't try to run the heap-checker under valgrind (ppluzhnikov)
* BUGFIX: Make focused stat %'s relative, not absolute (sanjay)
* BUGFIX: Don't use '//' comments in a C file (csilvers)
* Quiet new-gcc compiler warnings via -Wno-unused-result, etc (csilvers)
git-svn-id: http://gperftools.googlecode.com/svn/trunk@110 6b5cf1ce-ec42-a296-1ba9-69fdba395a50
* #include fixes (jyrki)
* Add missing stddef.h for ptrdiff_t (mec)
* Add M{,un}mapReplacement hooks into MallocHook (ribrdb)
* Force big alloc in frag test (ruemmler)
* PERF: Increase the size class cache to 64K entries (ruemmler)
* PERF: Increase the transfer cache by 16x (ruemmler)
* Use windows intrinsic to get the tsc (csilvers)
* Rename atomicops-internals-x86-msvc.h->windows.h (csilvers)
* Remove flaky DEATH test in malloc_hook_test (ppluzhnikov)
* Expose internal ReadStackTraces()/etc (lantran)
* Refactored system allocator logic (gangren)
* Include-what-you-use: cleanup tcmalloc #includes (csilvers)
* Don't set kAddressBits to 48 on 32-bit systems (csilvers)
* Add declaration for __rdtsc() for windows (koda)
* Don't revert to system alloc for expected errors (gangren)
* Add TCMALLOC_SMALL_BUT_SLOW support (ruemmler)
* Clarify that tcmalloc stats are MiB (robinson)
* Avoid setting cpuinfo_cycles_per_second to 0 (koda)
* Fix frag_unittest memory calculations (ruemmler)
* Remove support for non-tcmalloc debugallocation (blount)
* Add malloc_hook_test (llib)
* Change the objcopy -W test to be cross-friendly (mcgrathr)
* Export __tcmalloc in addition to _tcmalloc, for x86_64 (csilvers)
git-svn-id: http://gperftools.googlecode.com/svn/trunk@109 6b5cf1ce-ec42-a296-1ba9-69fdba395a50