== 16 September 2024

gperftools 2.16rc is out!

This release doesn't have major fixes or big headline features, but it has quite a lot of internal modernizations and cleanups. By the number of commits, 2.16 is going to be our biggest release ever. This release's main focus was making our code and build infrastructure simpler, more straightforward, more portable, and more modern.

Please note that gperftools 2.16 will be the last release that includes the heap leak checker. The time has come to drop this feature entirely. All users should migrate to the relevant gcc/clang sanitizers.

Here are the most notable changes:

* We've upgraded our C++ standard to C++17. Some fraction of our code base was modernized.
* We've integrated (a vendored copy of) GoogleTest, and most tests now use it. GoogleTest has helped us eliminate some legacy code and reduce the number of tests that use shell scripts.
* There are no more unnecessary wrappers around mutexes and threads in unit tests. We now use C++ standard mutexes and threads in our tests.
* We've done the bulk of the work necessary to enable hidden visibility. The most significant change is that tests no longer reach into libtcmalloc's guts; we use a special TestingPortal interface instead. We now offer the --enable-hidden-visibility configure option, which does what it says. But please note that hidden visibility is off by default for now.
* The autotools build was significantly refactored, modernized and simplified.
* The cmake build has also been radically simplified. The previous version attempted to duplicate the complexity of our autotools build and did not do it very well. More tests now pass under cmake. But please note that cmake support is still not entirely functional, and we're not yet able to promise anything about it.
* Thread-local storage access and emergency malloc integration have been reworked. We now support emergency malloc even on systems with emutls and similarly "bad" TLS support.
  As a result, backtracing is now more reliable (e.g., on QNX).
* OSX operator new/delete performance has been improved. OSX's malloc performance is badly compromised by its support for malloc zones, so we cannot help much there (the same applies to much of our competition among memory allocators). But the C++ new/delete API doesn't have to integrate with this stuff, so we now replace those functions directly for a sizeable speedup. Note that OSX performance is still not on par with other "prime tier" OSes due to its lack of efficient TLS support.
* Long-deprecated google/ headers have been deleted (use, e.g., "gperftools/tcmalloc.h" instead).
* All clang builds now use -Wthread-safety and actually check thread-safety declarations.
* Our code is no longer incompatible with _TIME_BITS=64 on modern GNU/Linux systems (relevant only for 32-bit systems).
* The OpenSolaris build has been verified and fixed where needed.

Thanks to the following people for code contributions:

* GitHub user oPiZiL (build fix for gcc 7.5)
* GitHub user zhangdexin (QNX fixes)
* Ishant Goyal (support for configuring minimal per-thread cache size)
* Lennox Ho (several build fixes and several fixes around Windows support)
* Olivier Langlois
* Sergey Fedorov (another fix for building gperftools on old PPC OSX computers)
* Xiang.Lin (several OSX fixes)
* Yikai Zhao (aarch64 generic_fp stack frame validation)

You can find the list of all GitHub issues fixed in this release here: https://github.com/gperftools/gperftools/issues?q=label%3Afixed-in-2.16+is%3Aclosed

== 5 Jan 2024

gperftools 2.15 is out!

This release has the following bug fixes:

* Xiaowei Wang pointed out the pthread linking issue in the cmake build on older glibcs (where -pthread is not implicit). See https://github.com/gperftools/gperftools/pull/1473 for more details.
* Mikael Simberg and Tom "spot" Callaway pointed out the missing symbols issue when linking PPC or i386 builds.
  https://github.com/gperftools/gperftools/issues/1474 has all the details.

Huge thanks to all contributors!

== 31 Dec 2023

gperftools 2.14 is out!

This release has the following set of notable changes:

* Roman Geissler has contributed a fix for a nasty initialization bug introduced in 2.13 (see GitHub issue #1452 for one example where it fails).
* Spinlock delay now has proper Windows support. Instead of simply sleeping, it uses WaitOnAddress (which is basically the Windows equivalent of futexes). This improvement was contributed by Lennox Ho.
* We now have basic QNX support (basic malloc + heap profiler), championed by Xiang.Lin. Thanks! Do note, however, that QNX doesn't provide SIGPROF ticks, so there will be no cpu profiler support on this OS.
* Yikai Zhao has contributed several fixes to important corner cases of the generic_fp stacktrace method.
* Several people have contributed various improvements to our cmake build: Lennox Ho, Sergey Fedorov, Mateusz Jakub Fila. But do note that the cmake build is still incomplete and best-effort.
* Julian Schroeder has fixed generic_fp's incompatibility with ARM pointer authentication.
* Mateusz Jakub Fila has contributed an implementation of the mallinfo2 function (the 64-bit version of mallinfo).
* Lennox Ho has updated the C malloc extension shims to include {Set,Get}MemoryReleaseRate.
* Lennox Ho has contributed the ability to disable patching of malloc functions on Windows when the TCMALLOC_DISABLE_REPLACEMENT=1 environment variable is set.
* User poljak181 has contributed a fix for infinite recursion in some cases of malloc hooks (or user-replaced operator new) and MallocExtension::instance().
* Sergey Fedorov has contributed a fix to use MAP_ANON on some older OSes without MAP_ANONYMOUS.
* The way we detect a working ucontext->pc extraction method was reworked and is now fully compile-time as opposed to config-time. This means no more duplication and mismatches between autoconf and cmake bits in this area.
The list of relevant tickets can be seen online at: https://github.com/gperftools/gperftools/issues?q=label%3Afixed-in-2.14+

== 11 Sep 2023

gperftools 2.13 is out!

This release includes a few minor fixes:

* Ivan Dlugos has fixed some issues with cmake and config.h defines.
* 32-bit builds no longer require 64-bit atomics (which we wrongly introduced in 2.11 and which broke builds on some 32-bit architectures).
* The generic_fp backtracing method now uses a robust address probing method. The previous approach had occasional false positives, which caused rare crashes.
* In some cases, MSVC generated TrivialOnce machine code that deadlocked programs on startup. The issue is now fixed.

== 24 Aug 2023

gperftools 2.12 is out!

Brett T. Warden contributed one significant fix. After a change in the previous release, we installed broken pkg-config files. Brett noticed and fixed that. Huge thanks!

== 14 Aug 2023

gperftools 2.11 is out!

A few minor fixes have landed since the rc a couple of weeks ago, plus a couple of notable contributions:

* Artem Polyakov has contributed auto-detection of several MPI systems with respect to the filenames used by the HEAPPROFILE and CPUPROFILE environment variables. We also now support the HEAPPROFILE_USE_PID and CPUPROFILE_USE_PID environment variables, which force profile filenames to have the pid appended. This is useful for some programs that fork for parallelism. See https://github.com/gperftools/gperftools/pull/1263 for details.
* Ken Raffenetti has extended the MPI detection mentioned above with detection of the MPICH system. Thanks a lot!

== 31 July 2023

gperftools 2.11rc is out!

The most notable change is that Linux/aarch64 and Linux/riscv are now fully supported. That is, all unit tests pass on those architectures (previously the heap leak checker was broken).

Also notable is that heap leak checker support is officially deprecated as of this release. All bug fixes from now on are on a best-effort basis.
For clarity, we also declare that it is only expected to work (for some definition of work) on Linux/x86 (all kinds), Linux/aarch64, Linux/arm, Linux/ppc (untested as of this writing) and Linux/mips (untested as well). While some functionality worked on BSDs in the past, it was never fully functional and never will be. We strongly recommend that everyone switch to asan and friends.

Among major internal changes, it is also worth mentioning that we have now fully switched to C++11 std::atomic. All custom OS- and arch-specific atomic bits have been removed at last.

Another notable change is that the mmap and sbrk hooks facility is now a no-op. We keep the API and ABI for formal compatibility, but the calls to add mmap/sbrk hooks do nothing and return an error (whenever the API makes that possible). There seem to be no users of it anyway, and the mmap replacement API that is part of that facility really screwed up 64-bit offsets on (some/most) 32-bit systems. Internally, the heap profiler and heap checker use a new, non-public API (see mmap_hook.h).

Most tests now pass on NetBSD x86-64 (I tested on version 9.2). The only one that fails is the new test for stacktraces captured from signal handlers (so there could be some imperfections in cpu profiles).

We no longer warn people away from the libgcc stacktrace capturing method. In fact, users on the most recent glibcs are advised to use it (pass --enable-libgcc-unwinder-by-default). This is thanks to the dl_find_object API offered by glibc, which allows this implementation to be fully async-signal-safe. Modern Linux distros should from now on build their gperftools package with this enabled (other than those built on top of musl).

The generic_fp and generic_fp_unsafe stacktrace capturing methods have been expanded to more architectures and have even gained some basic non-Linux support. We have completely removed the old x86-specific frame pointer stacktrace implementation in favor of those two. The _unsafe one should be roughly equivalent to the old x86 method.
The 'safe' one is recommended as the new default for those who want FP-based stacktracing. The safe implementation robustly checks memory before accessing it, preventing unlikely but not impossible crashes when frame pointers are bogus.

On platforms that support it, we now build gperftools with "-fno-omit-frame-pointer -momit-leaf-frame-pointer". This makes gperftools mostly frame-pointer-ful, but without a performance hit in places that matter (this is how Google builds its binaries, BTW). That should cover gcc (at least) on x86, aarch64 and riscv. The intention of this change is to make distro-shipped libtcmalloc.so compatible with frame-pointer stacktrace capturing (for those who still do heap profiling, for example). Of course, passing --enable-frame-pointers still gives you full frame pointers (i.e. even for leaf functions).

There is now support for detecting the actual page size at runtime, and tcmalloc will now allocate memory in units of this page size. This particularly helps return memory back to the kernel on arm systems with 64k pages. But it is somewhat controversial, because it effectively bumps tcmalloc's logical page size on those machines, potentially increasing fragmentation. In any case, a new environment variable, TCMALLOC_OVERRIDE_PAGESIZE, allows people to override this check, i.e. to either reduce the effective page size down to tcmalloc's logical page size or to increase it.

MallocExtension::MarkThreadTemporarilyIdle has been changed to be identical to MarkThreadIdle. MarkThreadTemporarilyIdle is believed to be unused anyway. See issue #880 for details.

There are a whole bunch of smaller fixes. Many of those smaller fixes had no associated ticket, but some did.
See here for the list of notable tickets closed in this release: https://github.com/gperftools/gperftools/issues?q=label%3Afixed-in-2.11+

Some of those tickets are quite notable (fixes for rare deadlocks in the cpu profiler's ProfilerStop or while capturing heap growth stacktraces, aka growthz).

Here is the list of notable contributions:

* Chris Cambly has contributed initial support for AIX.
* Ali Saidi has contributed a SpinlockPause implementation for aarch64.
* Henrik Reinstädtler has contributed a fix for the cpu profiler on aarch64 OSX.
* Gabriel Marin has backported Chromium's commit for always sanity checking large frees.
* User zhangyiru has contributed a fix to report the number of leaked bytes as size_t instead of (usually 32-bit) int.
* Sergey Fedorov has contributed a fix for building on older ppc-based OSX systems.
* User tigeran has removed an unused using-declaration.

Huge thanks to all contributors.

== 30 May 2022 ==

gperftools 2.10 is out!

Here are notable changes:

* Matt T. Proud contributed a documentation fix to call the Go programming language by its true name instead of golang.
* Robert Scott contributed a debugallocator feature to use readable (PROT_READ) fence pages. This is activated by the TCMALLOC_PAGE_FENCE_READABLE environment variable.
* User stdpain contributed a fix for cmake detection of libunwind.
* Natale Patriciello contributed a fix for OSX Monterey support.
* Volodymyr Nikolaichuk contributed support for returning memory back to the OS by using mmap with MAP_FIXED and PROT_NONE. It is off by default and enabled by the preprocessor define FREE_MMAP_PROT_NONE. This should help OSes that don't support Linux-style madvise MADV_DONTNEED or BSD-style MADV_FREE.
* Jingyun Hua has contributed basic support for LoongArch.
* GitHub issue #1338 (failure to build on some recent musl versions) has been fixed.
* GitHub issue #1321 (failure to ship cmake bits in the .tar.gz archive) has been fixed.

== 2 March 2021 ==

gperftools 2.9.1 is out!
Minor fixes have landed since the previous release:

* OSX builds now prefer backtrace() and have somewhat working heap sampling.
* An incorrect assertion that crashed tcmalloc when assertions were enabled and sized delete was used has been fixed. More details in GitHub issue #1254.

== 21 February 2021 ==

gperftools 2.9 is out!

A few more changes have landed compared to the rc:

* Venkatesh Srinivas has contributed thread-safety annotations support.
* A couple more unit test bugs that caused tcmalloc_unittest to fail on recent clang have been fixed.
* Usage of the unsupportable linux_syscall_support.h has been removed from a few places. Building with --disable-heap-checker now completely avoids it. Expect complete death of this header in the next major release.

== 14 February 2021 ==

gperftools 2.9rc is out!

Here are notable changes:

* Jarno Rajahalme has contributed a fix for a crashing bug in syscalls support for aarch64.
* User SSE4 has contributed basic support for the Elbrus 2000 architecture (!)
* Venkatesh Srinivas has contributed a cleanup of atomic ops.
* Đoàn Trần Công Danh has fixed cpu profiler compilation on musl.
* There is now better backtracing support for aarch64 and riscv. x86-64 with frame pointers now also defaults to this new "generic" frame pointer backtracer.
* Emergency malloc is now enabled by default. This fixes a hang on musl when the libgcc backtracer is enabled.
* A bunch of legacy config tests have been removed.

== 20 December 2020 ==

gperftools 2.8.1 is out!

Here are notable changes:

* The previous release contained a change to release memory without holding the page heap lock, but this change had at least one bug that caused crashes and corruption when running in aggressive decommit mode (which is not the default). While we check for other bugs, this feature was reverted. See GitHub issue #1204 and issue #1227.
* The depth of stack traces captured by gperftools is now up to 254 levels. Thanks to Kerrick Staley for this small but useful tweak.
* Levon Ter-Grigoryan has contributed a small fix for a compiler warning.
* Grant Henke has contributed updated detection of the program counter register for OS X on arm64.
* Tim Gates has contributed a small typo fix.
* Steve Langasek has contributed basic build fixes for riscv64 (!).
* Isaac Hier and okhowang have contributed a preliminary port of the build infrastructure to cmake. This works, but it is very preliminary. The autotools-based build is the only officially supported build for now.

== 6 July 2020 ==

gperftools 2.8 is out!

Here are notable changes:

* ProfilerGetStackTrace is now an officially supported API of libprofiler. Contributed by Kirill Müller.
* Build failures on mingw were fixed. This fixed issue #1108.
* The build failure of page_heap_test on MSVC was fixed.
* Ryan Macnak contributed a fix for compiling linux syscall support on i386 with recent GCCs. This fixed issue #1076.
* Test failures caused by new gcc 10 optimizations were fixed. The same change also fixed tests on clang.

== 8 Mar 2020 ==

gperftools 2.8rc is out!

Here are notable changes:

* Building the code now requires C++11 or later. The bundled MSVC project was converted to Visual Studio 2015.
* User obones contributed a fix for Windows x64 TLS callbacks. This fixed a leak of thread caches on thread exit in 64-bit Windows.
* Releasing memory back to the kernel is now done with the page heap lock dropped.
* HolyWu contributed a fix for correct malloc patching in debug builds on Windows. This configuration previously crashed.
* Romain Geissler contributed a fix for TLS access during early TLS initialization on dlopen.
* Large allocation reports are now silenced by default, since not all programs want their stderr polluted by those messages. Contributed by Junhao Li.
* HolyWu contributed improvements to MSVC project files. Notably, there is now a project for the "overriding" version of tcmalloc.
* The MS-specific _recalloc now correctly zeroes only the malloced part. This fix was contributed by HolyWu.
* Brian Silverman contributed a correctness fix to sampler_test.
* Gabriel Marin ported a few fixes from Chromium's fork. As part of those fixes, we reduced the number of static initializers (forbidden in Chromium). We also now make syscalls via the syscall function instead of reimplementing a direct way to make syscalls on each platform.
* Brian Silverman fixed flakiness in the page heap test.
* There is now a configure flag to skip installing perl pprof, since the external golang pprof is much superior. --disable-deprecated-pprof is the flag.
* Fabrice Fontaine contributed fixes to drop use of the nonstandard __off64_t type.
* Fabrice Fontaine contributed a build fix to check for the presence of the nonstandard __sbrk function. It is only used by the mmap hooks code and is (rightfully) not available on musl.
* Fabrice Fontaine contributed a build fix for the mmap64 macro and function conflict in some cases.
* There is now a configure-time option to enable aggressive decommit by default. Contributed by Laurent Stacul. --enable-aggressive-decommit-by-default is the flag.
* Tulio Magno Quites Machado Filho contributed build fixes for ppc around ucontext access.
* User pkubaj contributed a couple of build fixes for FreeBSD/ppc.
* configure now always assumes we have mmap. This fixes configure failures on some Linux guests inside VirtualBox. This fixed issue #1008.
* User shipujin contributed syscall support fixes for mips64 (big and little endian).
* Henrik Edin contributed configurable support for a wide range of malloc page sizes. 4K, 8K, 16K, 32K, 64K, 128K and 256K are now supported via the existing --with-tcmalloc-pagesize flag to configure.
* Jon Kohler added overhead fields to the per-size-class textual stats, which are available via MallocExtension::instance()->GetStats().
* tcmalloc can now avoid falling back from memfs to the default system allocator. TCMALLOC_MEMFS_DISABLE_FALLBACK switches this on. This was contributed by Jon Kohler.
* Ilya Leoshkevich fixed mmap syscall support on s390.
* Todd Lipcon contributed a small build warning fix.
* User prehistoricpenguin contributed miscellaneous source file mode fixes (we still had a few C++ files marked executable).
* User invalid_ms_user contributed a typo fix.
* Jakub Wilk contributed typo fixes.

== 29 Apr 2018 ==

gperftools 2.7 is out!

A few people contributed minor but important fixes since the rc.

Changes:

* A bug in span stats printing introduced by the new scalable page heap change was fixed.
* Christoph Müllner has contributed a couple of warning fixes and initial support for the aarch64_ilp32 architecture.
* Ben Dang contributed a documentation fix for the heap checker.
* Fabrice Fontaine contributed a fix for linking benchmarks with --disable-static.
* Holy Wu has added sized deallocation unit tests.
* Holy Wu has enabled support for sized deallocation (C++14) on recent MSVC.
* Holy Wu has fixed the MSVC build in WIN32_OVERRIDE_ALLOCATORS mode. This closed issue #716.
* Holy Wu has contributed a cleanup of the config.h used on Windows.
* Mao Huang has contributed a couple of simple tcmalloc changes from the Chromium code base, making our tcmalloc forks a tiny bit closer.
* Issue #946, which caused compilation failures on some Linux clang installations, has been fixed. Much thanks to GitHub user htuch for helping to diagnose the issue and proposing a fix.
* Tulio Magno Quites Machado Filho has contributed a build-time fix for PPC (for a problem introduced in one of the commits since the rc).

== 18 Mar 2018 ==

gperftools 2.7rc is out!

Changes:

* The most notable change in this release is that very large allocations (>1MiB) are now handled by an O(log n) implementation. This was contributed by Todd Lipcon based on earlier work by Aliaksei Kandratsenka and James Golick. Special thanks to Alexey Serbin for contributing an OSX fix for that commit.
* Detection of sized deallocation support has been improved, which should fix another set of issues building on OSX. Much thanks to Alexey Serbin for reporting the issue, suggesting a fix and verifying it.
* Todd Lipcon made a change to extend the page heap's freelists to 1 MiB (up from 1 MiB - 8 KiB). This may help a little for some workloads.
* Ishan Arora contributed a typo fix to the docs.

== 9 Dec 2017 ==

gperftools 2.6.3 is out!

Just two fixes were made in this release:

* Stephan Zuercher has contributed a build fix for some recent XCode versions. See issue #942 for more details.
* An assertion failure on some Windows builds introduced by 2.6.2 was fixed. Thanks to GitHub user nkeemik for reporting it and testing the fix. See issue #944 for more details.

== 30 Nov 2017 ==

gperftools 2.6.2 is out!

The most notable change is the recently added support for C++17 over-aligned allocation operators, contributed by Andrey Semashev. I've extended his implementation to have roughly the same performance as malloc/new. This release also has native support for C11 aligned_alloc.

The rest is mostly bug fixes:

* Jianbo Yang has contributed a fix for a potentially severe data race introduced by the malloc fast-path work in gperftools 2.6. This race could cause occasional violation of the total thread cache size constraint. See issue #929 for more details.
* Correct behavior in out-of-memory conditions in fast-path cases was restored. This was another bug introduced by the fast-path optimization in gperftools 2.6, which caused operator new to silently return NULL instead of doing correct C++ OOM handling (calling new_handler and throwing bad_alloc).
* Khem Raj has contributed a couple of build fixes for newer glibcs (ucontext_t vs struct ucontext and the loff_t definition).
* Piotr Sikora has contributed a build fix for OSX (not building the unwind benchmark). This was issue #910 (thanks to Yuriy Solovyov for reporting it).
* Dorin Lazăr has contributed a fix for a compiler warning.
* Issue #912 (occasional deadlocks from calling getenv too early on Windows) was fixed. Thanks to GitHub user shangcangriluo for reporting it.
* A couple of earlier lsan-related commits still causing occasional linking issues on OSX have been reverted. See issue #901.
* Volodimir Krylov has contributed GetProgramInvocationName for FreeBSD.
* changsu lee has contributed a couple of minor correctness fixes (a missing va_end() and a missing free() call in the rarely executed Symbolize path).
* Andrew C. Morrow has contributed some more page heap stats. See issue #935.
* Some cases of build-time warnings from various gcc/clang versions about throw() declarations have been fixed.

== 9 July 2017 ==

gperftools 2.6.1 is out!

This is mostly a bug-fix release.

* Issue #901: a build issue on OSX introduced by a last-minute commit in 2.6 was fixed (contributed by Francis Ricci).
* tcmalloc_minimal now works on the 32-bit ABI of mips64. This is issue #845. Much thanks to Adhemerval Zanella and GitHub user mtone.
* Romain Geissler contributed a build fix for -std=c++17. This is pull request #897.
* As part of fixing issue #904, the tcmalloc atfork handler is now installed early. This should fix a slight chance of hitting deadlocks at fork in some cases.

== 4 July 2017 ==

gperftools 2.6 is out!

* Kim Gräsman contributed a documentation update for the HEAPPROFILESIGNAL environment variable.
* KernelMaker contributed a fix for population of the min_object_size field returned by MallocExtension::GetFreeListSizes.
* Commit 8c3dc52fcfe0 "issue-654: [pprof] handle split text segments" was reverted. Some OSX users reported issues with this commit. Given that our pprof implementation is strongly deprecated, it is best to drop recently introduced features rather than break it badly.
* Francis Ricci contributed an improvement for interaction with the leak sanitizer.

== 22 May 2017 ==

gperftools 2.6rc4 is out!

Dynamic sized delete is disabled by default again. There is no hope of it working with eager dynamic symbol resolution (the -z now linker flag). More details in https://bugzilla.redhat.com/show_bug.cgi?id=1452813

== 21 May 2017 ==

gperftools 2.6rc3 is out!

gperftools compilation on older systems (e.g. RHEL 5) was fixed. This was originally reported in GitHub issue #888.
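The 2.6.2 entry above mentions support for C++17 over-aligned allocation operators and C11 aligned_alloc. As a sketch of the call sites such support serves, the standard C++17 code below exercises both paths; it does not depend on gperftools, and the CacheLine type and the IsAligned and AllocAligned helpers are hypothetical names used only for illustration. It assumes a C++17 compiler and a C library that provides aligned_alloc. When tcmalloc is linked in, these calls should be routed to its implementations; with any other allocator the caller sees the same alignment guarantees.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical over-aligned type. Its alignment exceeds
// __STDCPP_DEFAULT_NEW_ALIGNMENT__ on common targets (typically 16), so in
// C++17 `new CacheLine` calls operator new(std::size_t, std::align_val_t).
struct alignas(64) CacheLine {
  unsigned char bytes[64];
};

static_assert(alignof(CacheLine) == 64, "CacheLine must be 64-byte aligned");

// Returns true if p is a multiple of `alignment` (a power of two).
bool IsAligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Thin wrapper over aligned_alloc. The C11 rules expect `size` to be a
// multiple of `alignment`; release the result with std::free.
void* AllocAligned(std::size_t alignment, std::size_t size) {
  return std::aligned_alloc(alignment, size);
}
```

Memory obtained with `new CacheLine` should be released with `delete`, which selects the matching aligned operator delete; memory from aligned_alloc goes back through std::free.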
== 14 May 2017 ==

gperftools 2.6rc2 is out!

Just 2 small fixes on top of 2.6rc. In particular, Rajalakshmi Srinivasaraghavan contributed a build fix for ppc32.

== 14 May 2017 ==

gperftools 2.6rc is out!

Highlights of this release are performance work on the malloc fast-path, support for more modern Visual Studio runtimes, and deprecation of the bundled pprof. Other significant performance-affecting changes are reverting the central free list transfer batch size back to 32 and disabling aggressive decommit mode by default.

Note that while we still ship the perl implementation of pprof, everyone is strongly advised to use the golang reimplementation of pprof from https://github.com/google/pprof.

Here are notable changes in more detail (and see ChangeLog for full details):

* A bunch of performance tweaks to the tcmalloc fast-path were merged. This speeds up the critical path of tcmalloc by a few tens of percent. Well-tuned and allocation-heavy programs should see a substantial performance boost (this should apply to all modern ELF platforms). This is based on Google-internal tcmalloc changes for the fast-path (with the obvious exception of lacking per-cpu mode, of course). The original changes were made by Aliaksei Kandratsenka, and Andrew Hunter, Dmitry Vyukov and Sanjay Ghemawat contributed reviews and discussions.
* Architectures with a 48-bit address space (x86-64 and aarch64) now use a faster 2-level page map. This was ported from a Google-internal change by Sanjay Ghemawat.
* The default value of TCMALLOC_TRANSFER_NUM_OBJ was returned back to 32. Larger values have been found to hurt certain programs (but help some other benchmarks). The value can still be tweaked at run time via the environment variable.
* tcmalloc aggressive decommit mode is now disabled by default again. It was found to degrade performance of certain tensorflow benchmarks. Users who prefer a smaller heap over a small performance win can still set the environment variable TCMALLOC_AGGRESSIVE_DECOMMIT=t.
* Runtime-switchable sized delete support has been fixed and re-enabled (on GNU/Linux). Programs built with C++14 or later that use sized delete can again be sped up by setting the environment variable TCMALLOC_ENABLE_SIZED_DELETE=t. Support for enabling sized deallocation at compile time is still present, of course.
* tcmalloc now explicitly avoids use of MADV_FREE on Linux, unless TCMALLOC_USE_MADV_FREE is defined at compile time. This is because the performance impact of MADV_FREE is not well known. The original issue #780 was raised by Mathias Stearn.
* Issue #786, with occasional deadlocks in stack trace capturing via libunwind, was fixed. It was originally reported as a Ceph issue: http://tracker.ceph.com/issues/13522
* ChangeLog is now automatically generated from the git log. The old ChangeLog is now ChangeLog.old.
* tcmalloc now provides an implementation of nallocx. This function was originally introduced by jemalloc and can be used to return the real allocation size for a given allocation request size. This is ported from a Google-internal tcmalloc change contributed by Dmitry Vyukov.
* Issue #843, which made tcmalloc crash when used with the erlang runtime, was fixed.
* Issue #839, which caused tcmalloc's aggressive decommit mode to degrade performance in some corner cases, was fixed.
* Bryan Chan contributed support for 31-bit s390.
* Brian Silverman contributed a compilation fix for 32-bit ARMs.
* Issue #817, which was causing tcmalloc to fail on Windows 10 and later as well as on recent MSVC, was fixed. We now patch _free_base as well.
* A bunch of minor documentation/typo fixes by: Mike Gaffney, iivlev, savefromgoogle, John McDole, zmertens, Kirill Müller, Eugene, Ola Olsson, Mostyn Bramley-Moore.
* Tulio Magno Quites Machado Filho has contributed removal of deprecated glibc malloc hooks.
* Issue #827, which caused intercepting malloc on OSX 10.12 to fail, was fixed by copying the fix made by Mike Hommey to jemalloc.
  Much thanks to Koichi Shiraishi and David Ribeiro Alves for reporting it and testing the fix.
* Aman Gupta and Kenton Varda contributed minor fixes to pprof (but note again that pprof is deprecated).
* Ryan Macnak contributed a compilation fix for aarch64.
* Francis Ricci has fixed an unaligned memory access in the debug allocator.
* TCMALLOC_PAGE_FENCE_NEVER_RECLAIM now actually works, thanks to a contribution by Andrew Morrow.

== 12 Mar 2016 ==

gperftools 2.5 is out!

Just a single bugfix was merged after rc2: the fix for issue #777.

== 5 Mar 2016 ==

gperftools 2.5rc2 is out!

The new release contains just a few commits on top of the first release candidate. One of them is a build fix for Visual Studio. Another significant change is that dynamic sized delete is now disabled by default. It turned out that IFUNC relocations do not support our advanced use case on all platforms and in all cases.

== 21 Feb 2016 ==

gperftools 2.5rc is out!

Here are major changes since 2.4:

* We've moved to GitHub!
* Bryan Chan has contributed s390x support.
* Stacktrace capturing via libgcc's _Unwind_Backtrace was implemented (for architectures with missing or broken libunwind).
* "Emergency malloc" was implemented, which unbreaks recursive calls to malloc/free from stacktrace capturing functions (such as glibc's backtrace() or libunwind on arm). It is enabled by the --enable-emergency-malloc configure flag, or by default on arm when --enable-stacktrace-via-backtrace is given. It is another fix for a number of common issues people had on platforms with missing or broken libunwind.
* C++14 sized deallocation is now supported (on gcc 5 and recent clangs). It is off by default and can be enabled at configure time via --enable-sized-delete. On GNU/Linux it can also be enabled at run time either by the TCMALLOC_ENABLE_SIZED_DELETE environment variable or by defining a tcmalloc_sized_delete_enabled function that returns 1 to enable it.
* We've lowered the default value of the transfer batch size to 512.
  The previous value (bumped up in 2.1) was too high and caused performance regressions for some users. 512 should still give a performance boost for workloads that need a higher transfer batch size while not penalizing other workloads too much.
* Brian Silverman's patch finally stopped arming the profiling timer unless profiling is started.
* Andrew Morrow has contributed support for obtaining the cache size of the current thread and softer idling (for use in MongoDB).
* We've implemented a few minor performance improvements, particularly on the malloc fast-path.

A number of smaller fixes were made, many of them contributed:

* An issue that caused spurious profiler_unittest.sh failures was fixed.
* Jonathan Lambrechts contributed improved callgrind format support to pprof.
* Matt Cross contributed better support for debug symbols in separate files to pprof.
* Matt Cross contributed support for printing collapsed stack frames from pprof, aimed at producing flame graphs.
* Angus Gratton has contributed a documentation fix mentioning that on Windows only tcmalloc_minimal is supported.
* Anton Samokhvalov has made tcmalloc use mi_force_{un,}lock on OSX instead of pthread_atfork, which apparently fixes forking issues tcmalloc had on OSX.
* Milton Chiang has contributed support for building 32-bit gperftools on arm8.
* Patrick LoPresti has contributed support for specifying an alternative profiling signal via the CPUPROFILE_TIMER_SIGNAL environment variable.
* Paolo Bonzini has contributed support for configuring the filename for malloc tracing output via the TCMALLOC_TRACE_FILE environment variable.
* User spotrh has enabled use of futex on arm.
* User mitchblank has contributed a better declaration for arg-less profiler functions.
* Tom Conerly contributed proper freeing of memory allocated in HeapProfileTable::FillOrderedProfile on error paths.
* user fdeweerdt has contributed a curl arguments handling fix in pprof
* Fredrik Mellbin fixed tcmalloc's idea of mangled new and delete symbols on windows x64
* Dair Grant has contributed cacheline alignment for ThreadCache objects
* Fredrik Mellbin has contributed an updated windows/config.h for Visual Studio 2015 and other windows fixes.
* we're not linking libpthread to libtcmalloc_minimal anymore. Instead, libtcmalloc_minimal links to pthread symbols weakly. As a result, single-threaded programs remain single-threaded when linking to or preloading libtcmalloc_minimal.so.
* Boris Sazonov has contributed a mips compilation fix and a fix for printf misuse in pprof.
* Adhemerval Zanella has contributed alignment fixes for statically allocated variables.
* Jens Rosenboom has contributed fixes for heap-profiler_unittest.sh
* gshirishfree has contributed a better description for the GetStats method.
* cyshi has contributed a spinlock pause fix.
* Chris Mayo has contributed --docdir argument support for configure.
* Duncan Sands has contributed a fix for function aliases.
* Simon Que contributed a better include for malloc_hook_c.h
* user wmamrak contributed a struct timespec fix for Visual Studio 2015.
* user ssubotin contributed a typo fix in PrintAvailability code.

== 10 Jan 2015 ==

gperftools 2.4 is out! The code is exactly the same as 2.4rc.

== 28 Dec 2014 ==

gperftools 2.4rc is out!

Here are changes since 2.3:

* enabled the aggressive decommit option by default. It was found to significantly improve memory fragmentation with negligible impact on performance. (Thanks to investigation work performed by Adhemerval Zanella)
* added ./configure flags for tcmalloc page size and tcmalloc allocation alignment. Larger page sizes have been reported to improve performance occasionally. (Patch by Raphael Moreira Zinsly)
* sped up the hot path of malloc/free, by about 5% with the static library and about 10% with the shared library, mainly due to more efficient checking of malloc hooks.
* improved stacktrace capturing in the cpu profiler (due to an issue found by Arun Sharma). As part of that issue, pprof's handling of cpu profiles was also improved.

== 7 Dec 2014 ==

gperftools 2.3 is out!

Here are changes since 2.3rc:

* (issue 658) correctly close socketpair fds on failure (patch by glider)
* libunwind integration can be disabled at configure time (patch by Raphael Moreira Zinsly)
* libunwind integration is disabled by default for ppc64 (patch by Raphael Moreira Zinsly)
* libunwind integration is force-disabled for OSX. It was not used by default anyway. This fixes a compilation issue I saw.

== 2 Nov 2014 ==

gperftools 2.3rc is out!

Most small improvements in this release were made to the pprof tool.

A new experimental Linux-only (for now) cpu profiling mode is a notable big improvement.

Here are notable changes since 2.2.1:

* (issue-631) fixed debugallocation miscompilation on mmap-less platforms (courtesy of user iamxujian)
* (issue-630) a reference to the wrong PROFILE (vs. correct CPUPROFILE) environment variable was fixed (courtesy of WenSheng He)
* pprof now has an option to display stack traces in output for the heap checker (courtesy of Michael Pasieka)
* (issue-636) the pprof web command now works on mingw
* (issue-635) pprof now handles library paths that contain spaces (courtesy of user mich...@sebesbefut.com)
* (issue-637) pprof now has an option to not strip template arguments (patch by jiakai)
* (issue-644) a possible out-of-bounds access in GetenvBeforeMain was fixed (thanks to user abyss.7)
* (issue-641) pprof now has an option --show_addresses (thanks to user yurivict).
  The new option prints the instruction address in addition to the function name in stack traces.
* (issue-646) pprof now works around some addr2line issues reportedly seen when the DWARF v4 format is used (patch by Adam McNeeney)
* (issue-645) the heap profiler exit message now includes info on remaining allocated memory (patch by user yurivict)
* pprof code that finds the location of /proc//maps in cpu profile files is now fixed (patch by Ricardo M. Correia)
* (issue-654) pprof now handles the "split text segments" feature of Chromium for Android. (patch by simonb)
* (issue-655) a potential deadlock on windows caused by an early call to getenv in malloc initialization code was fixed (bug reported and fix proposed by user zndmitry)
* incorrect detection of arm 6zk instruction set support (-mcpu=arm1176jzf-s) was fixed. (Reported by pedronavf on old issue-493)
* a new cpu profiling mode on Linux is now implemented. It sets up separate profiling timers for separate threads, which improves the accuracy of profiling on Linux a lot. It is off by default, and is enabled only if librt is loaded and the CPUPROFILE_PER_THREAD_TIMERS environment variable is set. But note that all threads need to be registered via ProfilerRegisterThread.

== 21 Jun 2014 ==

gperftools 2.2.1 is out!

Here's the list of fixes:

* issue-626 was closed, which fixes initialization of statically linked tcmalloc.
* issue 628 was closed. It adds a missing header file to the source tarball, which fixes compilation on PPC Linux.

== 3 May 2014 ==

gperftools 2.2 is out!

Here are notable changes since 2.2rc:

* issue 620 (crash on windows when c runtime dll is reloaded) was fixed

== 19 Apr 2014 ==

gperftools 2.2rc is out!

Here are notable changes since 2.1:

* a number of fixes for a number of compilers and platforms. Notably Visual Studio 2013, recent mingw with c++ threads and some OSX fixes.
* we now have mips and mips64 support! (courtesy of Jovan Zelincevic, Jean Lee, user xiaoyur347 and others)
* we now have aarch64 (aka arm64) support!
  (contributed by Riku Voipio)
* there's now support for ppc64-le (by Raphael Moreira Zinsly and Adhemerval Zanella)
* there's now some support for uclibc (contributed by user xiaoyur347)
* google/ headers will now give you a deprecation warning. They have been deprecated since 2.0
* there's now a new api: tc_malloc_skip_new_handler (ported from the chromium fork)
* issue-557: added support for dumping a heap profile via signal (by Jean Lee)
* issue-567: Petr Hosek contributed SysAllocator support for windows
* Joonsoo Kim contributed several speedups for central freelist code
* the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable now works
* configure scripts are now using AM_MAINTAINER_MODE. It'll only affect folks who modify the source from .tar.gz and want automake to automatically rebuild Makefile-s. See the automake documentation for that.
* issue-586: detect the main executable even if PIE is active (based on a patch by user themastermind1). Notably, it fixes profiler use with ruby.
* there is now support for switching the backtrace capturing method at runtime (via the TCMALLOC_STACKTRACE_METHOD and TCMALLOC_STACKTRACE_METHOD_VERBOSE environment variables)
* there is a new backtrace capturing method using -finstrument-functions prologues, contributed by user xiaoyur347
* a few cases of crashes/deadlocks in the profiler were addressed. See (famous) issue-66, issue-547 and issue-579.
* issue-464 (memory corruption in debugalloc's realloc after memalign) is now fixed
* tcmalloc is now able to release memory back to the OS on windows (issue-489). The code was ported from the chromium fork (by a number of authors).
* Together with issue-489 we ported chromium's "aggressive decommit" mode. In this mode (settable via malloc extension and via the TCMALLOC_AGGRESSIVE_DECOMMIT environment variable), free pages are returned back to the OS immediately.
* MallocExtension::instance() is now faster (based on a patch by Adhemerval Zanella)
* issue-610 (hangs on windows in multibyte locales) is now fixed

The following people helped with ideas or patches (based on git log, some contributions purely in bugtracker might be missing): Andrew C. Morrow, yurivict, Wang YanQing, Thomas Klausner, davide.italiano@10gen.com, Dai MIKURUBE, Joon-Sung Um, Jovan Zelincevic, Jean Lee, Petr Hosek, Ben Avison, drussel, Joonsoo Kim, Hannes Weisbach, xiaoyur347, Riku Voipio, Adhemerval Zanella, Raphael Moreira Zinsly

== 30 July 2013 ==

gperftools 2.1 is out!

Just a few fixes were merged after rc. Most notably:

* Some fixes for debug allocation on POWER/Linux

== 20 July 2013 ==

gperftools 2.1rc is out!

As a result of more than a year of contributions we're ready for the 2.1 release.

But before making that step I'd like to create an RC and make sure people have a chance to test it.

Here are notable changes since 2.0:

* fixes for building on newer platforms. Notably, there's now initial support for the x32 ABI (--enable-minimal only at this time)
* new getNumericProperty stats for cache sizes
* added HEAP_PROFILER_TIME_INTERVAL variable (see documentation)
* added environment variable to control heap size (TCMALLOC_HEAP_LIMIT_MB)
* added environment variable to disable release of memory back to OS (TCMALLOC_DISABLE_MEMORY_RELEASE)
* cpu profiler can now be switched on and off by sending it a signal (specified in CPUPROFILESIGNAL)
* (issue 491) fixed race-ful spinlock wake-ups
* (issue 496) added some support for forking of a process that is using tcmalloc
* (issue 368) improved memory fragmentation when large chunks of memory are allocated/freed

== 03 February 2012 ==

I've just released gperftools 2.0

The `google-perftools` project has been renamed to `gperftools`. I (csilvers) am stepping down as maintainer, to be replaced by David Chappelle. Welcome to the team, David!
David has been an active contributor to perftools in the past -- in fact, he's the only person other than me that already has commit status. I am pleased to have him take over as maintainer.

I have both renamed the project (the Google Code site was renamed a few weeks ago), and bumped the major version number up to 2, to reflect the new community ownership of the project. Almost all the [http://gperftools.googlecode.com/svn/tags/gperftools-2.0/ChangeLog changes] are related to the renaming.

The main functional change from google-perftools 1.10 is that I've renamed the `google/` include-directory to be `gperftools/` instead. New code should `#include` headers from `gperftools/`. (Most users of perftools don't need any perftools-specific includes at all, so this is mostly directed to "power users.") I've kept the old names around as forwarding headers to the new, so existing `#include`s of `google/` headers will continue to work.

(The other functional change which I snuck in is getting rid of some bash-isms in one of the unittest driver scripts, so it could run on Solaris.)

Note that some internal names still contain the text `google`, such as the `google_malloc` internal linker section. I think that's a trickier transition, and can happen in a future release (if at all).

=== 31 January 2012 ===

I've just released perftools 1.10

There is an API-incompatible change: several of the methods in the `MallocExtension` class have changed from taking a `void*` to taking a `const void*`. You should not be affected by this API change unless you've written your own custom malloc extension that derives from `MallocExtension`, but since it is a user-visible change, I have upped the `.so` version number for this release.

This release focuses on improvements to linux-syscall-support.h, including ARM and PPC fixups and general cleanups. I hope this will magically fix an array of bugs people have been seeing.

There is also exciting news on the porting front, with support for patching win64 assembly contributed by IBM Canada!
This is an important step -- perhaps the most difficult -- to getting perftools to work on 64-bit windows using the patching technique (it doesn't affect the libc-modification technique). `preamble_patcher_test` has been added to help test these changes; it is meant to compile under x86_64, and won't work under win32.

For the full list of changes, including improved `HEAP_PROFILE_MMAP` support, see the [http://gperftools.googlecode.com/svn/tags/google-perftools-1.10/ChangeLog ChangeLog].

=== 24 January 2012 ===

The `google-perftools` Google Code page has been renamed to `gperftools`, in preparation for the project being renamed to `gperftools`. In the coming weeks, I'll be stepping down as maintainer for the perftools project, and as part of that Google is relinquishing ownership of the project; it will now be entirely community run. The name change reflects that shift. The 'g' in 'gperftools' stands for 'great'. :-)

=== 23 December 2011 ===

I've just released perftools 1.9.1

I missed including a file in the tarball that is needed to compile on ARM. If you are not compiling on ARM, or have successfully compiled perftools 1.9, there is no need to upgrade.

=== 22 December 2011 ===

I've just released perftools 1.9

This release has a slew of improvements, from better ARM and freebsd support, to improved performance by moving some code outside of locks, to better pprof reporting of code with overloaded functions.

The full list of changes is in the [http://google-perftools.googlecode.com/svn/tags/google-perftools-1.9/ChangeLog ChangeLog].

=== 26 August 2011 ===

I've just released perftools 1.8.3

The star-crossed 1.8 series continues; in 1.8.1, I had accidentally removed some code that was needed for FreeBSD. (Without this code many apps would crash at startup.) This release re-adds that code. If you are not on FreeBSD, or are using FreeBSD with perftools 1.8 or earlier, there is no need to upgrade.
=== 11 August 2011 ===

I've just released perftools 1.8.2

I was incorrectly calculating the patch-level in the configuration step, meaning the TC_VERSION_PATCH #define in tcmalloc.h was wrong. Since the testing framework checks for this, it was failing. Now it should work again. This time, I was careful to re-run my tests after upping the version number. :-)

If you don't care about the TC_VERSION_PATCH #define, there's no reason to upgrade.

=== 26 July 2011 ===

I've just released perftools 1.8.1

I was missing an #include that caused the build to break under some compilers, especially newer gcc's, that wanted it. This only affects people who build from source, so only the .tar.gz file is updated from perftools 1.8. If you didn't have any problems compiling perftools 1.8, there's no reason to upgrade.

=== 15 July 2011 ===

I've just released perftools 1.8

Of the many changes in this release, a good number pertain to porting. I've revamped OS X support to use the malloc-zone framework; it should now Just Work to link in tcmalloc, without needing `DYLD_FORCE_FLAT_NAMESPACE` or the like. (This is a pretty major change, so please feel free to report feedback at google-perftools@googlegroups.com.) 64-bit Windows support is also improved, as is ARM support, and the hooks are in place to improve FreeBSD support as well.

On the other hand, I'm seeing hanging tests on Cygwin. I see the same hanging even with (the old) perftools 1.7, so I'm guessing this is either a problem specific to my Cygwin installation, or nobody is trying to use perftools under Cygwin. If you can reproduce the problem, and even better have a solution, you can report it at google-perftools@googlegroups.com.

Internal changes include several performance and space-saving tweaks. One is user-visible (but in "stealth mode", and otherwise undocumented): you can compile with `-DTCMALLOC_SMALL_BUT_SLOW`. In this mode, tcmalloc will use less memory overhead, at the cost of running (likely not noticeably) slower.
There are many other changes as well, too numerous to recount here, but present in the [http://google-perftools.googlecode.com/svn/tags/google-perftools-1.8/ChangeLog ChangeLog].

=== 7 February 2011 ===

Thanks to endlessr..., who [http://code.google.com/p/google-perftools/issues/detail?id=307 identified] why some tests were failing under MSVC 10 in release mode. It does not look like these failures point toward any problem with tcmalloc itself; rather, the problem is with the test, which made some assumptions that broke under some aggressive optimizations used in MSVC 10. I'll fix the test, but in the meantime, feel free to use perftools even when compiled under MSVC 10.

=== 4 February 2011 ===

I've just released perftools 1.7

I apologize for the delay since the last release; so many great new patches and bugfixes kept coming in (and are still coming in; I also apologize to those folks whose patches have to slip until the next release). I picked this arbitrary time to make a cut.

Among the many new features in this release are a multi-megabyte reduction in the amount of tcmalloc overhead under x86_64, improved performance in the case of contention, and many many bugfixes, especially architecture-specific bugfixes. See the [http://google-perftools.googlecode.com/svn/tags/google-perftools-1.7/ChangeLog ChangeLog] for full details.

One architecture-specific change of note is the addition of comments in the [http://google-perftools.googlecode.com/svn/tags/perftools-1.7/README README] for using tcmalloc under OS X. I'm trying to get my head around the exact behavior of the OS X linker, and hope to have more improvements for the next release, but I hope these notes help folks who have been having trouble with tcmalloc on OS X.

*Windows users*: I've heard reports that some unittests fail on Windows when compiled with MSVC 10 in Release mode. All tests pass in Debug mode. I've not heard of any problems with earlier versions of MSVC.
I don't know if this is a problem with the runtime patching (so the static patching discussed in README_windows.txt will still work), a problem with perftools more generally, or a bug in MSVC 10. If anyone with windows expertise can debug this, I'd be glad to hear from you!

=== 5 August 2010 ===

I've just released perftools 1.6

This version also has a large number of minor changes, including support for `malloc_usable_size()` as a glibc-compatible alias to `malloc_size()`, the addition of SVG-based output to `pprof`, and experimental support for tcmalloc large pages, which may speed up tcmalloc at the cost of greater memory use. To use tcmalloc large pages, see the [http://google-perftools.googlecode.com/svn/tags/perftools-1.6/INSTALL INSTALL file]; for all changes, see the [http://google-perftools.googlecode.com/svn/tags/perftools-1.6/ChangeLog ChangeLog].

OS X NOTE: improvements in the profiler unittest have turned up an OS X issue: in multithreaded programs, it seems that OS X often delivers the profiling signal (from setitimer()) to the main thread, even when it's sleeping, rather than to spawned threads that are doing actual work. If anyone knows details of how OS X handles SIGPROF events (from setitimer) in threaded programs, and has insight into this problem, please send mail to google-perftools@googlegroups.com.

To see if you're affected by this, look for profiling time that pprof attributes to `___semwait_signal`. This is work being done in other threads that is being attributed to sleeping-time in the main thread.

=== 20 January 2010 ===

I've just released perftools 1.5

This version has a slew of changes, leading to somewhat faster performance and improvements in portability. It adds features like `ITIMER_REAL` support to the cpu profiler, and `tc_set_new_mode` to mimic the windows function of the same name. Full details are in the [http://google-perftools.googlecode.com/svn/tags/perftools-1.5/ChangeLog ChangeLog].
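The `malloc_usable_size()` interface mentioned in the 1.6 notes reports how many bytes of a block are actually usable, which can be more than was requested. The C sketch below assumes a glibc-based system, where `<malloc.h>` declares `size_t malloc_usable_size(void*)`; the helper name `usable_bytes_for` is hypothetical and not part of perftools. When tcmalloc is linked in, its glibc-compatible alias should answer the same call instead of glibc's implementation.

```c
/* Sketch only: assumes glibc, where <malloc.h> declares malloc_usable_size().
 * With tcmalloc linked in, tcmalloc's alias is expected to answer this call. */
#include <malloc.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates `request` bytes, asks the allocator how many
 * bytes of that block are really usable, frees it, and returns the figure.
 * Returns 0 if the allocation failed. The result is never smaller than
 * `request`, because allocators may round requests up (e.g. to a size class). */
size_t usable_bytes_for(size_t request) {
  void* p = malloc(request);
  if (p == NULL) return 0;
  size_t usable = malloc_usable_size(p);
  free(p);
  return usable;
}
```

Calling `usable_bytes_for(100)` should return some value of at least 100; the exact figure depends on how the active allocator rounds requests, so no particular number is promised here.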
=== 11 September 2009 ===

I've just released perftools 1.4

The major change this release is the addition of a debugging malloc library! If you link with `libtcmalloc_debug.so` instead of `libtcmalloc.so` (and likewise for the `minimal` variants) you'll get a debugging malloc, which will catch double-frees, writes to freed data, `free`/`delete` and `delete`/`delete[]` mismatches, and even (optionally) writes past the end of an allocated block.

We plan to do more with this library in the future, including supporting it on Windows, and adding the ability to use the debugging library with your default malloc in addition to using it with tcmalloc.

There is also the usual complement of bug fixes, documented in the ChangeLog, and a few minor user-tunable knobs added to components like the system allocator.

=== 9 June 2009 ===

I've just released perftools 1.3

Like 1.2, this has a variety of bug fixes, especially related to the Windows build. One of my bugfixes is to undo the weird `ld -r` fix to `.a` files that I introduced in perftools 1.2: it caused problems on too many platforms. I've reverted back to normal `.a` files. To work around the original problem that prompted the `ld -r` fix, I now provide `libtcmalloc_and_profiler.a`, for folks who want to link in both.

The most interesting API change is that I now not only override `malloc`/`free`/etc, I also expose them via a unique set of symbols: `tc_malloc`/`tc_free`/etc. This enables clients to write their own memory wrappers that use tcmalloc:

{{{
#include <stddef.h>

void Log(void* ptr);  /* hypothetical: the client's own logging hook */

/* tc_malloc() is declared in perftools' tcmalloc.h header. */
void* malloc(size_t size) {
  void* r = tc_malloc(size);
  Log(r);
  return r;
}
}}}

=== 17 April 2009 ===

I've just released perftools 1.2.

This is mostly a bugfix release. The major change is internal: I have a new system for creating packages, which allows me to create 64-bit packages. (I still don't do that for perftools, because there is still no great 64-bit solution, with libunwind still giving problems and --disable-frame-pointers not practical in every environment.)
Another interesting change involves Windows: a [http://code.google.com/p/google-perftools/issues/detail?id=126 new patch] allows users to choose to override malloc/free/etc on Windows rather than patching, as is done now. This can be used to create custom CRTs.

My fix for this [http://groups.google.com/group/google-perftools/browse_thread/thread/1ff9b50043090d9d/a59210c4206f2060?lnk=gst&q=dynamic#a59210c4206f2060 bug involving static linking] ended up being to make libtcmalloc.a and libperftools.a a big .o file, rather than a true `ar` archive. This should not yield any problems in practice -- in fact, it should be better, since the heap profiler, leak checker, and cpu profiler will now all work even with the static libraries -- but if you find it does, please file a bug report.

Finally, the profile_handler_unittest provided in the perftools testsuite (new in this release) is failing on FreeBSD. The end-to-end test that uses the profile-handler is passing, so I suspect the problem may be with the test, not the perftools code itself. However, I do not know enough about how itimers work on FreeBSD to be able to debug it. If you can figure it out, please let me know!

=== 11 March 2009 ===

I've just released perftools 1.1!

It has many changes since perftools 1.0 including

* Faster performance due to dynamically sized thread caches
* Better heap-sampling for more realistic profiles
* Improved support on Windows (MSVC 7.1 and cygwin)
* Better stacktraces in linux (using VDSO)
* Many bug fixes and feature requests

Note: if you use the CPU-profiler with applications that fork without doing an exec right afterwards, please see the README. Recent testing has shown that profiles are unreliable in that case. The problem has existed since the first release of perftools. We expect to have a fix for perftools 1.2. For more details, see [http://code.google.com/p/google-perftools/issues/detail?id=105 issue 105].
Everyone who uses perftools 1.0 is encouraged to upgrade to perftools 1.1. If you see any problems with the new release, please file a bug report at http://code.google.com/p/google-perftools/issues/list. Enjoy!