diff --git a/NEWS b/NEWS
index e69de29..064bd4b 100644
--- a/NEWS
+++ b/NEWS
@@ -0,0 +1,109 @@
+=== 20 January 2010 ===
+
+I've just released perftools 1.5.
+
+This version has a slew of changes, leading to somewhat faster
+performance and improvements in portability.  It adds features like
+`ITIMER_REAL` support to the cpu profiler, and `tc_set_new_mode` to
+mimic the Windows function of the same name.  Full details are in the
+[http://google-perftools.googlecode.com/svn/tags/perftools-1.5/ChangeLog
+ChangeLog].
+
+=== 11 September 2009 ===
+
+I've just released perftools 1.4.
+
+The major change in this release is the addition of a debugging malloc
+library!  If you link with `libtcmalloc_debug.so` instead of
+`libtcmalloc.so` (and likewise for the `minimal` variants) you'll get
+a debugging malloc, which will catch double-frees, writes to freed
+data, `free`/`delete` and `delete`/`delete[]` mismatches, and even
+(optionally) writes past the end of an allocated block.
+
+We plan to do more with this library in the future, including
+supporting it on Windows, and adding the ability to use the debugging
+library with your default malloc in addition to using it with
+tcmalloc.
+
+There is also the usual complement of bug fixes, documented in the
+ChangeLog, and a few minor user-tunable knobs added to components like
+the system allocator.
+
+
+=== 9 June 2009 ===
+
+I've just released perftools 1.3.
+
+Like 1.2, this has a variety of bug fixes, especially related to the
+Windows build.  One of my bugfixes is to undo the weird `ld -r` fix to
+`.a` files that I introduced in perftools 1.2: it caused problems on
+too many platforms.  I've reverted to normal `.a` files.  To work
+around the original problem that prompted the `ld -r` fix, I now
+provide `libtcmalloc_and_profiler.a`, for folks who want to link in
+both.
+
+The most interesting API change is that I now not only override
+`malloc`/`free`/etc, but also expose them via a unique set of symbols:
+`tc_malloc`/`tc_free`/etc.  This enables clients to write their own
+memory wrappers that use tcmalloc:
+{{{
+  void* malloc(size_t size) { void* r = tc_malloc(size); Log(r); return r; }
+}}}
+
+
+=== 17 April 2009 ===
+
+I've just released perftools 1.2.
+
+This is mostly a bugfix release.  The major change is internal: I have
+a new system for creating packages, which allows me to create 64-bit
+packages.  (I still don't do that for perftools, because there is
+still no great 64-bit solution, with libunwind still giving problems
+and --disable-frame-pointers not practical in every environment.)
+
+Another interesting change involves Windows: a
+[http://code.google.com/p/google-perftools/issues/detail?id=126 new
+patch] allows users to choose to override malloc/free/etc on Windows
+rather than patching, as is done now.  This can be used to create
+custom CRTs.
+
+My fix for this
+[http://groups.google.com/group/google-perftools/browse_thread/thread/1ff9b50043090d9d/a59210c4206f2060?lnk=gst&q=dynamic#a59210c4206f2060
+bug involving static linking] ended up being to make libtcmalloc.a and
+libperftools.a each a single big .o file, rather than a true `ar`
+archive.  This should not yield any problems in practice -- in fact,
+it should be better, since the heap profiler, leak checker, and cpu
+profiler will now all work even with the static libraries -- but if
+you find it does, please file a bug report.
+
+Finally, the profile_handler_unittest provided in the perftools
+testsuite (new in this release) is failing on FreeBSD.
+The end-to-end
+test that uses the profile-handler is passing, so I suspect the
+problem may be with the test, not the perftools code itself.  However,
+I do not know enough about how itimers work on FreeBSD to be able to
+debug it.  If you can figure it out, please let me know!
+
+=== 11 March 2009 ===
+
+I've just released perftools 1.1!
+
+It has many changes since perftools 1.0, including:
+
+ * Faster performance due to dynamically sized thread caches
+ * Better heap-sampling for more realistic profiles
+ * Improved support on Windows (MSVC 7.1 and cygwin)
+ * Better stacktraces in linux (using VDSO)
+ * Many bug fixes and feature requests
+
+Note: if you use the CPU-profiler with applications that fork without
+doing an exec right afterwards, please see the README.  Recent testing
+has shown that profiles are unreliable in that case.  The problem has
+existed since the first release of perftools.  We expect to have a fix
+in perftools 1.2.  For more details, see
+[http://code.google.com/p/google-perftools/issues/detail?id=105 issue 105].
+
+Everyone who uses perftools 1.0 is encouraged to upgrade to perftools
+1.1.  If you see any problems with the new release, please file a bug
+report at http://code.google.com/p/google-perftools/issues/list.
+
+Enjoy!
diff --git a/doc/heapprofile.html b/doc/heapprofile.html
index c857df1..709559d 100644
--- a/doc/heapprofile.html
+++ b/doc/heapprofile.html
@@ -67,11 +67,12 @@ for a given run of an executable:

  • In your code, bracket the code you want profiled in calls to
    HeapProfilerStart() and HeapProfilerStop().  (These functions are
    declared in <google/heap-profiler.h>.)
-   HeapProfilerStart() will take
-   the profile-filename-prefix as an argument.  You can then use
-   HeapProfilerDump() or
-   GetHeapProfile() to examine the profile.
-   In case it's useful, IsHeapProfilerRunning() will tell you
+   HeapProfilerStart() will take the
+   profile-filename-prefix as an argument.  Then, as often as
+   you'd like before calling HeapProfilerStop(), you
+   can use HeapProfilerDump() or
+   GetHeapProfile() to examine the profile.  In case
+   it's useful, IsHeapProfilerRunning() will tell you
    whether you've already called HeapProfilerStart() or not.

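The list item above describes the bracketing API.  As a minimal sketch of
that pattern (the `/tmp/myprog` prefix, workload, and dump-reason string are
illustrative; the declarations come from `<google/heap-profiler.h>`):

{{{
#include <vector>
#include <google/heap-profiler.h>

int main() {
  HeapProfilerStart("/tmp/myprog");        // profiles: /tmp/myprog.0001.heap, ...
  std::vector<int> big(10 * 1024 * 1024);  // stand-in for allocation-heavy work
  HeapProfilerDump("after big vector");    // optional dump before Stop
  HeapProfilerStop();
  return big.empty() ? 1 : 0;
}
}}}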
diff --git a/packages/deb/control b/packages/deb/control
index 379a5b1..c6f0a83 100644
--- a/packages/deb/control
+++ b/packages/deb/control
@@ -1,6 +1,6 @@
 Source: google-perftools
 Priority: optional
-Maintainer: Google Inc. <opensource@google.com>
+Maintainer: Google Inc. <google-perftools@googlegroups.com>
 Build-Depends: debhelper (>= 4.0.0), binutils
 Standards-Version: 3.6.1
diff --git a/packages/rpm/rpm.spec b/packages/rpm/rpm.spec
index b18e89b..bbf448f 100644
--- a/packages/rpm/rpm.spec
+++ b/packages/rpm/rpm.spec
@@ -10,7 +10,7 @@
 Group: Development/Libraries
 URL: http://code.google.com/p/google-perftools/
 License: BSD
 Vendor: Google
-Packager: Google <opensource@google.com>
+Packager: Google <google-perftools@googlegroups.com>
 Source: http://%{NAME}.googlecode.com/files/%{NAME}-%{VERSION}.tar.gz
 Distribution: Redhat 7 and above.
 Buildroot: %{_tmppath}/%{name}-root
diff --git a/src/base/dynamic_annotations.c b/src/base/dynamic_annotations.c
index 65c4158..cdefaa7 100644
--- a/src/base/dynamic_annotations.c
+++ b/src/base/dynamic_annotations.c
@@ -105,12 +105,7 @@ void AnnotateBenignRace(const char *file, int line,
 void AnnotateBenignRaceSized(const char *file, int line,
                              const volatile void *mem, long size,
-                             const char *description) {
-  long i;
-  for (i = 0; i < size; i++) {
-    AnnotateBenignRace(file, line, (char*)(mem) + i, description);
-  }
-}
+                             const char *description) {}
 void AnnotateMutexIsUsedAsCondVar(const char *file, int line,
                                   const volatile void *mu){}
 void AnnotateTraceMemory(const char *file, int line,
@@ -121,6 +116,7 @@ void AnnotateIgnoreReadsBegin(const char *file, int line){}
 void AnnotateIgnoreReadsEnd(const char *file, int line){}
 void AnnotateIgnoreWritesBegin(const char *file, int line){}
 void AnnotateIgnoreWritesEnd(const char *file, int line){}
+void AnnotateEnableRaceDetection(const char *file, int line, int enable){}
 void AnnotateNoOp(const char *file, int line, const volatile void *arg){}
 void AnnotateFlushState(const char *file, int line){}
 
diff --git a/src/base/dynamic_annotations.h b/src/base/dynamic_annotations.h
index 3980b24..dae1a14 100644
--- a/src/base/dynamic_annotations.h
+++ b/src/base/dynamic_annotations.h
@@ -246,6 +246,12 @@
     ANNOTATE_IGNORE_READS_END();\
   }while(0)\
 
+  /* Enable (enable!=0) or disable (enable==0) race detection for all threads.
+     This annotation could be useful if you want to skip expensive race analysis
+     during some period of program execution, e.g. during initialization. */
+  #define ANNOTATE_ENABLE_RACE_DETECTION(enable) \
+    AnnotateEnableRaceDetection(__FILE__, __LINE__, enable)
+
   /* -------------------------------------------------------------
      Annotations useful for debugging.
   */
@@ -358,6 +364,7 @@
   #define ANNOTATE_IGNORE_WRITES_END() /* empty */
   #define ANNOTATE_IGNORE_READS_AND_WRITES_BEGIN() /* empty */
   #define ANNOTATE_IGNORE_READS_AND_WRITES_END() /* empty */
+  #define ANNOTATE_ENABLE_RACE_DETECTION(enable) /* empty */
   #define ANNOTATE_NO_OP(arg) /* empty */
   #define ANNOTATE_FLUSH_STATE() /* empty */
 
@@ -428,6 +435,7 @@
 void AnnotateIgnoreReadsBegin(const char *file, int line);
 void AnnotateIgnoreReadsEnd(const char *file, int line);
 void AnnotateIgnoreWritesBegin(const char *file, int line);
 void AnnotateIgnoreWritesEnd(const char *file, int line);
+void AnnotateEnableRaceDetection(const char *file, int line, int enable);
 void AnnotateNoOp(const char *file, int line, const volatile void *arg);
 void AnnotateFlushState(const char *file, int line);
 
diff --git a/src/base/vdso_support.cc b/src/base/vdso_support.cc
index ddaca37..fce7c2c 100644
--- a/src/base/vdso_support.cc
+++ b/src/base/vdso_support.cc
@@ -42,8 +42,8 @@
 #include <fcntl.h>
 #include "base/atomicops.h"  // for MemoryBarrier
-#include "base/logging.h"
 #include "base/linux_syscall_support.h"
+#include "base/logging.h"
 #include "base/dynamic_annotations.h"
 #include "base/basictypes.h"  // for COMPILE_ASSERT
 
diff --git a/src/central_freelist.cc b/src/central_freelist.cc
index 674ff9b..5b7dfbb 100644
--- a/src/central_freelist.cc
+++ b/src/central_freelist.cc
@@ -266,8 +266,7 @@ void CentralFreeList::Populate() {
   Span* span;
   {
     SpinLockHolder h(Static::pageheap_lock());
-    span = Static::pageheap()->New(npages);
-    if (span) Static::pageheap()->RegisterSizeClass(span, size_class_);
+    span = Static::pageheap()->New(npages, size_class_, kPageSize);
   }
   if (span == NULL) {
     MESSAGE("tcmalloc: allocation failed", npages << kPageShift);
@@ -275,12 +274,6 @@ void CentralFreeList::Populate() {
     return;
   }
   ASSERT(span->length == npages);
-  // Cache sizeclass info eagerly.  Locking is not necessary.
-  // (Instead of being eager, we could just replace any stale info
-  // about this span, but that seems to be no better in practice.)
-  for (int i = 0; i < npages; i++) {
-    Static::pageheap()->CacheSizeClass(span->start + i, size_class_);
-  }
 
   // Split the block into pieces and add to the free-list
   // TODO: coloring of objects to avoid cache conflicts?
diff --git a/src/common.h b/src/common.h
index 92c582f..b0278eb 100644
--- a/src/common.h
+++ b/src/common.h
@@ -62,6 +62,7 @@ static const size_t kPageSize = 1 << kPageShift;
 static const size_t kMaxSize = 8u * kPageSize;
 static const size_t kAlignment = 8;
 static const size_t kNumClasses = 61;
+static const size_t kLargeSizeClass = 0;
 
 // Maximum length we allow a per-thread free-list to have before we
 // move objects from it into the corresponding central free-list.  We
diff --git a/src/google/profiler.h b/src/google/profiler.h
index 74b936f..a6883f4 100644
--- a/src/google/profiler.h
+++ b/src/google/profiler.h
@@ -108,13 +108,15 @@ struct ProfilerOptions {
   void *filter_in_thread_arg;
 };
 
-/* Start profiling and write profile info into fname.
+/* Start profiling and write profile info into fname, discarding any
+ * existing profiling data in that file.
  *
  * This is equivalent to calling ProfilerStartWithOptions(fname, NULL).
  */
PERFTOOLS_DLL_DECL int ProfilerStart(const char* fname);
 
-/* Start profiling and write profile into fname.
+/* Start profiling and write profile into fname, discarding any
+ * existing profiling data in that file.
  *
  * The profiler is configured using the options given by 'options'.
  * Options which are not specified are given default values.
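To make the clarified contract concrete, a hedged sketch of the plain
ProfilerStart/ProfilerStop cycle (the filename and busy-loop are
illustrative; per the new comment, any previous contents of my.prof are
discarded rather than appended to):

{{{
#include <google/profiler.h>

int main() {
  ProfilerStart("my.prof");                        // truncates any old my.prof
  for (volatile int i = 0; i < 100000000; ++i) {}  // stand-in for real work
  ProfilerStop();                                  // flush and close the profile
  return 0;
}
}}}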
diff --git a/src/heap-checker.cc b/src/heap-checker.cc
index 84e6cf3..2779c97 100644
--- a/src/heap-checker.cc
+++ b/src/heap-checker.cc
@@ -1377,9 +1377,9 @@ static SpinLock alignment_checker_lock(SpinLock::LINKER_INITIALIZED);
       if (VLOG_IS_ON(15)) {
         // log call stacks to help debug why something is not a leak
         HeapProfileTable::AllocInfo alloc;
-        bool r = heap_profile->FindAllocDetails(ptr, &alloc);
-        r = r;              // suppress compiler warning in non-debug mode
-        RAW_DCHECK(r, "");  // sanity
+        if (!heap_profile->FindAllocDetails(ptr, &alloc)) {
+          RAW_LOG(FATAL, "FindAllocDetails failed on ptr %p", ptr);
+        }
         RAW_LOG(INFO, "New live %p object's alloc stack:", ptr);
         for (int i = 0; i < alloc.stack_depth; ++i) {
           RAW_LOG(INFO, "  @ %p", alloc.call_stack[i]);
diff --git a/src/internal_logging.h b/src/internal_logging.h
index 0cb9ba2..731b2d9 100644
--- a/src/internal_logging.h
+++ b/src/internal_logging.h
@@ -119,7 +119,9 @@ do {                                                           \
 
 #ifndef NDEBUG
 #define ASSERT(cond) CHECK_CONDITION(cond)
 #else
-#define ASSERT(cond) ((void) 0)
+#define ASSERT(cond)      \
+  do {                    \
+  } while (0 && (cond))
 #endif
 
 // Print into buffer
diff --git a/src/page_heap.cc b/src/page_heap.cc
index 1e63cb9..7bfeea4 100644
--- a/src/page_heap.cc
+++ b/src/page_heap.cc
@@ -61,49 +61,64 @@ PageHeap::PageHeap()
   }
 }
 
-Span* PageHeap::New(Length n) {
+// Returns the minimum number of pages necessary to ensure that an
+// allocation of size n can be aligned to the given alignment.
+static Length AlignedAllocationSize(Length n, size_t alignment) {
+  ASSERT(alignment >= kPageSize);
+  return n + tcmalloc::pages(alignment - kPageSize);
+}
+
+Span* PageHeap::New(Length n, size_t sc, size_t align) {
   ASSERT(Check());
   ASSERT(n > 0);
 
+  if (align < kPageSize) {
+    align = kPageSize;
+  }
+
+  Length aligned_size = AlignedAllocationSize(n, align);
+
   // Find first size >= n that has a non-empty list
-  for (Length s = n; s < kMaxPages; s++) {
+  for (Length s = aligned_size; s < kMaxPages; s++) {
     Span* ll = &free_[s].normal;
     // If we're lucky, ll is non-empty, meaning it has a suitable span.
     if (!DLL_IsEmpty(ll)) {
       ASSERT(ll->next->location == Span::ON_NORMAL_FREELIST);
-      return Carve(ll->next, n);
+      return Carve(ll->next, n, sc, align);
     }
     // Alternatively, maybe there's a usable returned span.
     ll = &free_[s].returned;
     if (!DLL_IsEmpty(ll)) {
       ASSERT(ll->next->location == Span::ON_RETURNED_FREELIST);
-      return Carve(ll->next, n);
+      return Carve(ll->next, n, sc, align);
     }
     // Still no luck, so keep looking in larger classes.
   }
 
-  Span* result = AllocLarge(n);
+  Span* result = AllocLarge(n, sc, align);
   if (result != NULL) return result;
 
   // Grow the heap and try again
-  if (!GrowHeap(n)) {
+  if (!GrowHeap(aligned_size)) {
     ASSERT(Check());
     return NULL;
   }
 
-  return AllocLarge(n);
+  return AllocLarge(n, sc, align);
 }
 
-Span* PageHeap::AllocLarge(Length n) {
-  // find the best span (closest to n in size).
+Span* PageHeap::AllocLarge(Length n, size_t sc, size_t align) {
+  // Find the best span (closest to n in size).
   // The following loop implements address-ordered best-fit.
   Span *best = NULL;
 
+  Length aligned_size = AlignedAllocationSize(n, align);
+
   // Search through normal list
   for (Span* span = large_.normal.next;
        span != &large_.normal;
        span = span->next) {
-    if (span->length >= n) {
+    if (span->length >= aligned_size) {
       if ((best == NULL)
           || (span->length < best->length)
           || ((span->length == best->length) && (span->start < best->start))) {
@@ -117,7 +132,7 @@ Span* PageHeap::AllocLarge(Length n) {
   for (Span* span = large_.returned.next;
        span != &large_.returned;
        span = span->next) {
-    if (span->length >= n) {
+    if (span->length >= aligned_size) {
       if ((best == NULL)
          || (span->length < best->length)
          || ((span->length == best->length) && (span->start < best->start))) {
@@ -127,19 +142,18 @@
     }
   }
 
-  return best == NULL ? NULL : Carve(best, n);
+  return best == NULL ? NULL : Carve(best, n, sc, align);
 }
 
 Span* PageHeap::Split(Span* span, Length n) {
   ASSERT(0 < n);
   ASSERT(n < span->length);
-  ASSERT(span->location == Span::IN_USE);
-  ASSERT(span->sizeclass == 0);
+  ASSERT((span->location != Span::IN_USE) || span->sizeclass == 0);
   Event(span, 'T', n);
   const int extra = span->length - n;
   Span* leftover = NewSpan(span->start + n, extra);
-  ASSERT(leftover->location == Span::IN_USE);
+  leftover->location = span->location;
   Event(leftover, 'U', extra);
   RecordSpan(leftover);
   pagemap_.set(span->start + n - 1, span);  // Update map from pageid to span
@@ -148,25 +162,44 @@ Span* PageHeap::Split(Span* span, Length n) {
   return leftover;
 }
 
-Span* PageHeap::Carve(Span* span, Length n) {
+Span* PageHeap::Carve(Span* span, Length n, size_t sc, size_t align) {
   ASSERT(n > 0);
   ASSERT(span->location != Span::IN_USE);
-  const int old_location = span->location;
+  ASSERT(align >= kPageSize);
+
+  Length align_pages = align >> kPageShift;
   RemoveFromFreeList(span);
-  span->location = Span::IN_USE;
-  Event(span, 'A', n);
+
+  if (span->start & (align_pages - 1)) {
+    Length skip_for_alignment = align_pages - (span->start & (align_pages - 1));
+    Span* aligned = Split(span, skip_for_alignment);
+    PrependToFreeList(span);  // Skip coalescing - no candidates possible
+    span = aligned;
+  }
 
   const int extra = span->length - n;
   ASSERT(extra >= 0);
   if (extra > 0) {
-    Span* leftover = NewSpan(span->start + n, extra);
-    leftover->location = old_location;
-    Event(leftover, 'S', extra);
-    RecordSpan(leftover);
-    PrependToFreeList(leftover);  // Skip coalescing - no candidates possible
-    span->length = n;
-    pagemap_.set(span->start + n - 1, span);
+    Span* leftover = Split(span, n);
+    PrependToFreeList(leftover);
   }
+
+  span->location = Span::IN_USE;
+  span->sizeclass = sc;
+  Event(span, 'A', n);
+
+  // Cache sizeclass info eagerly.  Locking is not necessary.
+  // (Instead of being eager, we could just replace any stale info
+  // about this span, but that seems to be no better in practice.)
+  CacheSizeClass(span->start, sc);
+
+  if (sc != kLargeSizeClass) {
+    for (Length i = 1; i < n; i++) {
+      pagemap_.set(span->start + i, span);
+      CacheSizeClass(span->start + i, sc);
+    }
+  }
+
+  ASSERT(Check());
   return span;
 }
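The alignment logic above is easy to check in isolation.  Below is a
self-contained sketch of the same page arithmetic -- not tcmalloc code, just
a model of it with local names and kPageShift = 12 as in common.h: the span
is over-allocated by align_pages - 1 pages, so after skipping ahead to the
next aligned page boundary there is always room for the n requested pages.

{{{
#include <cassert>
#include <cstddef>

typedef size_t Length;                       // page count, as in page_heap.h
static const size_t kPageShift = 12;
static const size_t kPageSize  = 1 << kPageShift;

// Mirrors AlignedAllocationSize: n plus the worst-case alignment slack.
static Length AlignedSize(Length n, size_t align) {
  return n + ((align - kPageSize) >> kPageShift);
}

int main() {
  const size_t align = 8 * kPageSize;        // request 8-page (32KiB) alignment
  const Length n = 3;                        // request 3 pages
  const Length align_pages = align >> kPageShift;
  for (Length start = 0; start < 4 * align_pages; ++start) {  // any span start
    Length misfit = start & (align_pages - 1);
    Length skip = misfit ? align_pages - misfit : 0;  // Carve's skip_for_alignment
    assert(((start + skip) & (align_pages - 1)) == 0);  // lands aligned
    assert(skip + n <= AlignedSize(n, align));          // and always fits
  }
  return 0;
}
}}}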
@@ -318,18 +351,6 @@ Length PageHeap::ReleaseAtLeastNPages(Length num_pages) {
   return released_pages;
 }
 
-void PageHeap::RegisterSizeClass(Span* span, size_t sc) {
-  // Associate span object with all interior pages as well
-  ASSERT(span->location == Span::IN_USE);
-  ASSERT(GetDescriptor(span->start) == span);
-  ASSERT(GetDescriptor(span->start+span->length-1) == span);
-  Event(span, 'C', sc);
-  span->sizeclass = sc;
-  for (Length i = 1; i < span->length-1; i++) {
-    pagemap_.set(span->start+i, span);
-  }
-}
-
 static double MB(uint64_t bytes) {
   return bytes / 1048576.0;
 }
diff --git a/src/page_heap.h b/src/page_heap.h
index 74030d2..de36266 100644
--- a/src/page_heap.h
+++ b/src/page_heap.h
@@ -93,21 +93,49 @@ class PERFTOOLS_DLL_DECL PageHeap {
  public:
   PageHeap();
 
-  // Allocate a run of "n" pages.  Returns zero if out of memory.
-  // Caller should not pass "n == 0" -- instead, n should have
-  // been rounded up already.
-  Span* New(Length n);
+  // Allocate a run of "n" pages.  Returns NULL if out of memory.
+  // Caller should not pass "n == 0" -- instead, n should have been
+  // rounded up already.  The span will be used for allocating objects
+  // with the specified sizeclass sc (sc must be zero for large
+  // objects).  The first page of the span will be aligned to the value
+  // specified by align, which must be a power of two.
+  Span* New(Length n, size_t sc, size_t align);
 
   // Delete the span "[p, p+n-1]".
   // REQUIRES: span was returned by earlier call to New() and
   //           has not yet been deleted.
   void Delete(Span* span);
 
-  // Mark an allocated span as being used for small objects of the
-  // specified size-class.
-  // REQUIRES: span was returned by an earlier call to New()
-  //           and has not yet been deleted.
-  void RegisterSizeClass(Span* span, size_t sc);
+  // Gets either the size class of addr, if it is a small object, or its span.
+  // Return:
+  //   if addr is invalid:
+  //     leave *out_sc and *out_span unchanged and return false;
+  //   if addr is valid and has a small size class:
+  //     *out_sc = the size class
+  //     *out_span = <undefined>
+  //     return true
+  //   if addr is valid and has a large size class:
+  //     *out_sc = kLargeSizeClass
+  //     *out_span = the span pointer
+  //     return true
+  bool GetSizeClassOrSpan(void* addr, size_t* out_sc, Span** out_span) {
+    const PageID p = reinterpret_cast<uintptr_t>(addr) >> kPageShift;
+    size_t cl = GetSizeClassIfCached(p);
+    Span* span = NULL;
+
+    if (cl != kLargeSizeClass) {
+      ASSERT(cl == GetDescriptor(p)->sizeclass);
+    } else {
+      span = GetDescriptor(p);
+      if (!span) {
+        return false;
+      }
+      cl = span->sizeclass;
+    }
+    *out_span = span;
+    *out_sc = cl;
+    return true;
+  }
 
   // Split an allocated span into two spans: one of length "n" pages
   // followed by another span of length "span->length - n" pages.
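The comment block above defines the lookup contract.  The sketch below is a
toy mock that just makes the three outcomes concrete -- every name and the
fake lookup are local to the sketch; only the protocol mirrors page_heap.h.
Note that for a small object the caller must not rely on the span pointer
(here modeled as NULL):

{{{
#include <cstdio>
#include <cstddef>

static const size_t kLargeSizeClass = 0;
static const size_t kPageShift = 12;
struct Span { size_t length; };

static Span g_large_span = { 4 };  // pretend 4-page span

// Fake lookup standing in for PageHeap::GetSizeClassOrSpan().
static bool GetSizeClassOrSpan(void* addr, size_t* out_sc, Span** out_span) {
  if (addr == NULL) return false;            // "foreign" pointer
  if (reinterpret_cast<size_t>(addr) & 1) {  // pretend odd => small object
    *out_sc = 3;                             // some small size class
    *out_span = NULL;
    return true;
  }
  *out_sc = kLargeSizeClass;
  *out_span = &g_large_span;
  return true;
}

int main() {
  size_t cl;
  Span* span;
  void* samples[] = { NULL, reinterpret_cast<void*>(0x1001),
                      reinterpret_cast<void*>(0x2000) };
  for (int i = 0; i < 3; ++i) {
    if (!GetSizeClassOrSpan(samples[i], &cl, &span)) {
      printf("%p: invalid (bad free / foreign allocator)\n", samples[i]);
    } else if (cl != kLargeSizeClass) {
      printf("%p: small object, size class %u\n", samples[i], (unsigned)cl);
    } else {
      printf("%p: large object, %u bytes\n", samples[i],
             (unsigned)(span->length << kPageShift));
    }
  }
  return 0;
}
}}}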
@@ -115,14 +143,29 @@ class PERFTOOLS_DLL_DECL PageHeap {
   // Returns a pointer to the second span.
   //
   // REQUIRES: "0 < n < span->length"
-  // REQUIRES: span->location == IN_USE
-  // REQUIRES: span->sizeclass == 0
+  // REQUIRES: a) the span is free or b) sizeclass == 0
   Span* Split(Span* span, Length n);
 
   // Return the descriptor for the specified page.  Returns NULL if
   // this PageID was not allocated previously.
   inline Span* GetDescriptor(PageID p) const {
-    return reinterpret_cast<Span*>(pagemap_.get(p));
+    Span* ret = reinterpret_cast<Span*>(pagemap_.get(p));
+#ifndef NDEBUG
+    if (ret != NULL && ret->location == Span::IN_USE) {
+      size_t cl = GetSizeClassIfCached(p);
+      // Three cases:
+      //   - The object is not cached
+      //   - The object is cached correctly
+      //   - It is a large object and we're not looking at the first
+      //     page.  This happens in coalescing.
+      ASSERT(cl == kLargeSizeClass || cl == ret->sizeclass ||
+             (ret->start != p && ret->sizeclass == kLargeSizeClass));
+      // If the object is sampled, it must be kLargeSizeClass
+      ASSERT(ret->sizeclass == kLargeSizeClass || !ret->sample);
+    }
+#endif
+
+    return ret;
   }
 
   // Dump state to stderr
@@ -223,7 +266,7 @@ class PERFTOOLS_DLL_DECL PageHeap {
   // length exactly "n" and mark it as non-free so it can be returned
   // to the client.  After all that, decrease free_pages_ by n and
   // return span.
-  Span* Carve(Span* span, Length n);
+  Span* Carve(Span* span, Length n, size_t sc, size_t align);
 
   void RecordSpan(Span* span) {
     pagemap_.set(span->start, span);
@@ -234,7 +277,7 @@ class PERFTOOLS_DLL_DECL PageHeap {
 
   // Allocate a large span of length == n.  If successful, returns a
   // span of exactly the specified length.  Else, returns NULL.
-  Span* AllocLarge(Length n);
+  Span* AllocLarge(Length n, size_t sc, size_t align);
 
   // Coalesce span with neighboring spans if possible, prepend to
   // appropriate free list, and adjust stats.
diff --git a/src/pprof b/src/pprof
index d70ee30..8aff380 100755
--- a/src/pprof
+++ b/src/pprof
@@ -106,6 +106,12 @@ my $FILTEREDPROFILE_PAGE = "/pprof/filteredprofile(?:\\?.*)?";
 my $SYMBOL_PAGE = "/pprof/symbol";     # must support symbol lookup via POST
 my $PROGRAM_NAME_PAGE = "/pprof/cmdline";
 
+# These are the web pages that can be named on the command line.
+# All the alternatives must begin with /.
+my $PROFILES = "($HEAP_PAGE|$PROFILE_PAGE|$PMUPROFILE_PAGE|" .
+               "$GROWTH_PAGE|$CONTENTION_PAGE|$WALL_PAGE|" .
+               "$FILTEREDPROFILE_PAGE)";
+
 # default binary name
 my $UNKNOWN_BINARY = "(unknown)";
 
@@ -718,10 +724,8 @@ sub RunWeb {
     "firefox",
   );
   foreach my $b (@alt) {
-    if (-f $b) {
-      if (system($b, $fname) == 0) {
-        return;
-      }
+    if (system($b, $fname) == 0) {
+      return;
     }
   }
 
@@ -2704,32 +2708,44 @@ sub CheckSymbolPage {
 
 sub IsProfileURL {
   my $profile_name = shift;
-  my ($host, $port, $prefix, $path) = ParseProfileURL($profile_name);
-  return defined($host) and defined($port) and defined($path);
+  if (-f $profile_name) {
+    printf STDERR "Using local file $profile_name.\n";
+    return 0;
+  }
+  return 1;
 }
 
 sub ParseProfileURL {
   my $profile_name = shift;
-  if (defined($profile_name) &&
-      $profile_name =~ m,^(http://|)([^/:]+):(\d+)(|\@\d+)(|/|(.*?)($PROFILE_PAGE|$PMUPROFILE_PAGE|$HEAP_PAGE|$GROWTH_PAGE|$CONTENTION_PAGE|$WALL_PAGE|$FILTEREDPROFILE_PAGE))$,o) {
-    # $7 is $PROFILE_PAGE/$HEAP_PAGE/etc.  $5 is *everything* after
-    # the hostname, as long as that everything is the empty string,
-    # a slash, or something ending in $PROFILE_PAGE/$HEAP_PAGE/etc.
-    # So "$7 || $5" is $PROFILE_PAGE/etc if there, or else it's "/" or "".
-    return ($2, $3, $6, $7 || $5);
+
+  if (!defined($profile_name) || $profile_name eq "") {
+    return ();
   }
-  return ();
+
+  # Split profile URL - matches all non-empty strings, so no test.
+  $profile_name =~ m,^(https?://)?([^/]+)(.*?)(/|$PROFILES)?$,;
+
+  my $proto = $1 || "http://";
+  my $hostport = $2;
+  my $prefix = $3;
+  my $profile = $4 || "/";
+
+  my $host = $hostport;
+  $host =~ s/:.*//;
+
+  my $baseurl = "$proto$hostport$prefix";
+  return ($host, $baseurl, $profile);
 }
 
 # We fetch symbols from the first profile argument.
 sub SymbolPageURL {
-  my ($host, $port, $prefix, $path) = ParseProfileURL($main::pfile_args[0]);
-  return "http://$host:$port$prefix$SYMBOL_PAGE";
+  my ($host, $baseURL, $path) = ParseProfileURL($main::pfile_args[0]);
+  return "$baseURL$SYMBOL_PAGE";
 }
 
 sub FetchProgramName() {
-  my ($host, $port, $prefix, $path) = ParseProfileURL($main::pfile_args[0]);
-  my $url = "http://$host:$port$prefix$PROGRAM_NAME_PAGE";
+  my ($host, $baseURL, $path) = ParseProfileURL($main::pfile_args[0]);
+  my $url = "$baseURL$PROGRAM_NAME_PAGE";
   my $command_line = "$URL_FETCHER '$url'";
   open(CMDLINE, "$command_line |") or error($command_line);
   my $cmdline = <CMDLINE>;
@@ -2880,10 +2896,10 @@ sub BaseName {
 
 sub MakeProfileBaseName {
   my ($binary_name, $profile_name) = @_;
-  my ($host, $port, $prefix, $path) = ParseProfileURL($profile_name);
+  my ($host, $baseURL, $path) = ParseProfileURL($profile_name);
   my $binary_shortname = BaseName($binary_name);
-  return sprintf("%s.%s.%s-port%s",
-                 $binary_shortname, $main::op_time, $host, $port);
+  return sprintf("%s.%s.%s",
+                 $binary_shortname, $main::op_time, $host);
 }
 
 sub FetchDynamicProfile {
@@ -2895,7 +2911,7 @@ sub FetchDynamicProfile {
   if (!IsProfileURL($profile_name)) {
    return $profile_name;
   } else {
-    my ($host, $port, $prefix, $path) = ParseProfileURL($profile_name);
+    my ($host, $baseURL, $path) = ParseProfileURL($profile_name);
     if ($path eq "" || $path eq "/") {
       # Missing type specifier defaults to cpu-profile
       $path = $PROFILE_PAGE;
@@ -2903,33 +2919,26 @@ sub FetchDynamicProfile {
 
     my $profile_file = MakeProfileBaseName($binary_name, $profile_name);
 
-    my $url;
+    my $url = "$baseURL$path";
     my $fetch_timeout = undef;
-    if (($path =~ m/$PROFILE_PAGE/) || ($path =~ m/$PMUPROFILE_PAGE/)) {
-      if ($path =~ m/$PROFILE_PAGE/) {
-        $url = sprintf("http://$host:$port$prefix$path?seconds=%d",
-                       $main::opt_seconds);
+    if ($path =~ m/$PROFILE_PAGE|$PMUPROFILE_PAGE/) {
+      if ($path =~ m/[?]/) {
+        $url .= "&";
       } else {
-        if ($profile_name =~ m/[?]/) {
-          $profile_name .= "&"
-        } else {
-          $profile_name .= "?"
-        }
-        $url = sprintf("http://$profile_name" . "seconds=%d",
-                       $main::opt_seconds);
+        $url .= "?";
      }
+      $url .= sprintf("seconds=%d", $main::opt_seconds);
       $fetch_timeout = $main::opt_seconds * 1.01 + 60;
     } else {
       # For non-CPU profiles, we add a type-extension to
       # the target profile file name.
       my $suffix = $path;
       $suffix =~ s,/,.,g;
-      $profile_file .= "$suffix";
-      $url = "http://$host:$port$prefix$path";
+      $profile_file .= $suffix;
     }
 
     my $profile_dir = $ENV{"PPROF_TMPDIR"} || ($ENV{HOME} . "/pprof");
-    if (!(-d $profile_dir)) {
+    if (!-d $profile_dir) {
       mkdir($profile_dir)
           || die("Unable to create profile directory $profile_dir: $!\n");
     }
@@ -2942,13 +2951,13 @@ sub FetchDynamicProfile {
     my $fetcher = AddFetchTimeout($URL_FETCHER, $fetch_timeout);
     my $cmd = "$fetcher '$url' > '$tmp_profile'";
-    if (($path =~ m/$PROFILE_PAGE/) || ($path =~ m/$PMUPROFILE_PAGE/)){
+    if ($path =~ m/$PROFILE_PAGE|$PMUPROFILE_PAGE/){
       print STDERR "Gathering CPU profile from $url for $main::opt_seconds seconds to\n  ${real_profile}\n";
       if ($encourage_patience) {
         print STDERR "Be patient...\n";
       }
     } else {
-      print STDERR "Fetching $path profile from $host:$port to\n  ${real_profile}\n";
+      print STDERR "Fetching $path profile from $url to\n  ${real_profile}\n";
     }
 
     (system($cmd) == 0) || error("Failed to get profile: $cmd: $!\n");
diff --git a/src/span.h b/src/span.h
index ab9a796..b3483ca 100644
--- a/src/span.h
+++ b/src/span.h
@@ -60,6 +60,10 @@ struct Span {
   int value[64];
 #endif
 
+  void* start_ptr() {
+    return reinterpret_cast<void*>(start << kPageShift);
+  }
+
   // What freelist the span is on: IN_USE if on none, or normal or returned
   enum { IN_USE, ON_NORMAL_FREELIST, ON_RETURNED_FREELIST };
 };
diff --git a/src/stacktrace_win32-inl.h b/src/stacktrace_win32-inl.h
index 892cd7c..bbd4c43 100644
--- a/src/stacktrace_win32-inl.h
+++ b/src/stacktrace_win32-inl.h
@@ -49,6 +49,11 @@
 // This code is inspired by a patch from David Vitek:
 //   http://code.google.com/p/google-perftools/issues/detail?id=83
 
+#ifndef BASE_STACKTRACE_WIN32_INL_H_
+#define BASE_STACKTRACE_WIN32_INL_H_
+// Note: this file is included into stacktrace.cc more than once.
+// Anything that should only be defined once should be here:
+
 #include "config.h"
 #include <windows.h>   // for GetProcAddress and GetModuleHandle
 #include <assert.h>
@@ -82,3 +87,5 @@ PERFTOOLS_DLL_DECL int GetStackFrames(void** /* pcs */,
   assert(0 == "Not yet implemented");
   return 0;
 }
+
+#endif  // BASE_STACKTRACE_WIN32_INL_H_
diff --git a/src/tcmalloc.cc b/src/tcmalloc.cc
index 122e18f..011fc91 100644
--- a/src/tcmalloc.cc
+++ b/src/tcmalloc.cc
@@ -798,22 +798,25 @@ static TCMallocGuard module_enter_exit_hook;
 
 // Helpers for the exported routines below
 //-------------------------------------------------------------------
 
-static inline bool CheckCachedSizeClass(void *ptr) {
-  PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
-  size_t cached_value = Static::pageheap()->GetSizeClassIfCached(p);
-  return cached_value == 0 ||
-      cached_value == Static::pageheap()->GetDescriptor(p)->sizeclass;
-}
-
 static inline void* CheckedMallocResult(void *result) {
-  ASSERT(result == NULL || CheckCachedSizeClass(result));
+  Span* fetched_span;
+  size_t cl;
+
+  if (result != NULL) {
+    ASSERT(Static::pageheap()->GetSizeClassOrSpan(result, &cl, &fetched_span));
+  }
+
   return result;
 }
 
 static inline void* SpanToMallocResult(Span *span) {
-  Static::pageheap()->CacheSizeClass(span->start, 0);
-  return
-      CheckedMallocResult(reinterpret_cast<void*>(span->start << kPageShift));
+  Span* fetched_span = NULL;
+  size_t cl = 0;
+  ASSERT(Static::pageheap()->GetSizeClassOrSpan(span->start_ptr(),
+                                                &cl, &fetched_span));
+  ASSERT(cl == kLargeSizeClass);
+  ASSERT(span == fetched_span);
+  return span->start_ptr();
 }
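Span::start_ptr(), added above in span.h, is just the inverse of the
pointer-to-PageID shift used throughout this patch.  A tiny self-contained
sketch of the round trip (kPageShift = 12 as in common.h; nothing here is
tcmalloc API):

{{{
#include <cassert>
#include <cstdint>

static const int kPageShift = 12;

int main() {
  int object;
  uintptr_t addr = reinterpret_cast<uintptr_t>(&object);
  uintptr_t page = addr >> kPageShift;                        // pointer -> PageID
  void* start = reinterpret_cast<void*>(page << kPageShift);  // == start_ptr()
  // start is the first byte of the page containing the object:
  assert(reinterpret_cast<uintptr_t>(start) <= addr);
  assert(addr - reinterpret_cast<uintptr_t>(start) < (1u << kPageShift));
  return 0;
}
}}}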
 
 static void* DoSampledAllocation(size_t size) {
@@ -824,7 +827,8 @@
   SpinLockHolder h(Static::pageheap_lock());
 
   // Allocate span
-  Span *span = Static::pageheap()->New(tcmalloc::pages(size == 0 ? 1 : size));
+  Span *span = Static::pageheap()->New(tcmalloc::pages(size == 0 ? 1 : size),
+                                       kLargeSizeClass, kPageSize);
   if (span == NULL) {
     return NULL;
   }
@@ -915,7 +919,7 @@ inline void* do_malloc_pages(ThreadCache* heap, size_t size) {
     report_large = should_report_large(num_pages);
   } else {
     SpinLockHolder h(Static::pageheap_lock());
-    Span* span = Static::pageheap()->New(num_pages);
+    Span* span = Static::pageheap()->New(num_pages, kLargeSizeClass, kPageSize);
     result = (span == NULL ? NULL : SpanToMallocResult(span));
     report_large = should_report_large(num_pages);
   }
@@ -971,28 +975,22 @@ static inline ThreadCache* GetCacheIfPresent() {
 inline void do_free_with_callback(void* ptr, void (*invalid_free_fn)(void*)) {
   if (ptr == NULL) return;
   ASSERT(Static::pageheap() != NULL);  // Should not call free() before malloc()
-  const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
-  Span* span = NULL;
-  size_t cl = Static::pageheap()->GetSizeClassIfCached(p);
+  Span* span;
+  size_t cl;
 
-  if (cl == 0) {
-    span = Static::pageheap()->GetDescriptor(p);
-    if (!span) {
-      // span can be NULL because the pointer passed in is invalid
-      // (not something returned by malloc or friends), or because the
-      // pointer was allocated with some other allocator besides
-      // tcmalloc.  The latter can happen if tcmalloc is linked in via
-      // a dynamic library, but is not listed last on the link line.
-      // In that case, libraries after it on the link line will
-      // allocate with libc malloc, but free with tcmalloc's free.
-      (*invalid_free_fn)(ptr);  // Decide how to handle the bad free request
-      return;
-    }
-    cl = span->sizeclass;
-    Static::pageheap()->CacheSizeClass(p, cl);
+  if (!Static::pageheap()->GetSizeClassOrSpan(ptr, &cl, &span)) {
+    // result can be false because the pointer passed in is invalid
+    // (not something returned by malloc or friends), or because the
+    // pointer was allocated with some other allocator besides
+    // tcmalloc.  The latter can happen if tcmalloc is linked in via
+    // a dynamic library, but is not listed last on the link line.
+    // In that case, libraries after it on the link line will
+    // allocate with libc malloc, but free with tcmalloc's free.
+    (*invalid_free_fn)(ptr);  // Decide how to handle the bad free request
+    return;
   }
-  if (cl != 0) {
-    ASSERT(!Static::pageheap()->GetDescriptor(p)->sample);
+
+  if (cl != kLargeSizeClass) {
     ThreadCache* heap = GetCacheIfPresent();
     if (heap != NULL) {
       heap->Deallocate(ptr, cl);
@@ -1003,8 +1001,7 @@ inline void do_free_with_callback(void* ptr, void (*invalid_free_fn)(void*)) {
     }
   } else {
     SpinLockHolder h(Static::pageheap_lock());
-    ASSERT(reinterpret_cast<uintptr_t>(ptr) % kPageSize == 0);
-    ASSERT(span != NULL && span->start == p);
+    ASSERT(span != NULL && ptr == span->start_ptr());
     if (span->sample) {
       tcmalloc::DLL_Remove(span);
       Static::stacktrace_allocator()->Delete(
@@ -1024,20 +1021,17 @@ inline size_t GetSizeWithCallback(void* ptr,
                                   size_t (*invalid_getsize_fn)(void*)) {
   if (ptr == NULL) return 0;
 
-  const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
-  size_t cl = Static::pageheap()->GetSizeClassIfCached(p);
-  if (cl != 0) {
+  Span* span;
+  size_t cl;
+  if (!Static::pageheap()->GetSizeClassOrSpan(ptr, &cl, &span)) {
+    return (*invalid_getsize_fn)(ptr);
+  }
+
+  if (cl != kLargeSizeClass) {
     return Static::sizemap()->ByteSizeForClass(cl);
   } else {
-    Span *span = Static::pageheap()->GetDescriptor(p);
-    if (span == NULL) {  // means we do not own this memory
-      return (*invalid_getsize_fn)(ptr);
-    } else if (span->sizeclass != 0) {
-      Static::pageheap()->CacheSizeClass(p, span->sizeclass);
-      return Static::sizemap()->ByteSizeForClass(span->sizeclass);
-    } else {
-      return span->length << kPageShift;
-    }
+    return span->length << kPageShift;
   }
 }
 
@@ -1132,39 +1126,10 @@ void* do_memalign(size_t align, size_t size) {
 
   // We will allocate directly from the page heap
   SpinLockHolder h(Static::pageheap_lock());
 
-  if (align <= kPageSize) {
-    // Any page-level allocation will be fine
-    // TODO: We could put the rest of this page in the appropriate
-    // TODO: cache but it does not seem worth it.
-    Span* span = Static::pageheap()->New(tcmalloc::pages(size));
-    return span == NULL ? NULL : SpanToMallocResult(span);
-  }
-
-  // Allocate extra pages and carve off an aligned portion
-  const Length alloc = tcmalloc::pages(size + align);
-  Span* span = Static::pageheap()->New(alloc);
-  if (span == NULL) return NULL;
-
-  // Skip starting portion so that we end up aligned
-  Length skip = 0;
-  while ((((span->start+skip) << kPageShift) & (align - 1)) != 0) {
-    skip++;
-  }
-  ASSERT(skip < alloc);
-  if (skip > 0) {
-    Span* rest = Static::pageheap()->Split(span, skip);
-    Static::pageheap()->Delete(span);
-    span = rest;
-  }
-
-  // Skip trailing portion that we do not need to return
-  const Length needed = tcmalloc::pages(size);
-  ASSERT(span->length >= needed);
-  if (span->length > needed) {
-    Span* trailer = Static::pageheap()->Split(span, needed);
-    Static::pageheap()->Delete(trailer);
-  }
-  return SpanToMallocResult(span);
+  // Any page-level allocation will be fine
+  Span* span = Static::pageheap()->New(tcmalloc::pages(size),
+                                       kLargeSizeClass, align);
+  return span == NULL ? NULL : SpanToMallocResult(span);
 }
 
 // Helpers for use by exported routines below:
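With do_memalign now handing alignment straight to PageHeap::New, a
page-or-larger alignment request becomes a single span allocation instead of
the old allocate-split-trim dance.  A hedged caller-side sketch (memalign is
one of the standard entry points tcmalloc overrides; the sizes are
illustrative, and the comment about the internal path holds only with this
patch applied):

{{{
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <malloc.h>   // memalign (glibc)

int main() {
  const size_t align = 32 * 4096;        // several pages, power of two
  void* p = memalign(align, 3 * 4096);   // one PageHeap::New(n, 0, align) call
  assert(p != NULL);
  assert((reinterpret_cast<uintptr_t>(p) & (align - 1)) == 0);
  free(p);
  return 0;
}
}}}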
diff --git a/src/tests/page_heap_test.cc b/src/tests/page_heap_test.cc
index 9120b78..fd444da 100644
--- a/src/tests/page_heap_test.cc
+++ b/src/tests/page_heap_test.cc
@@ -26,7 +26,7 @@ static void TestPageHeap_Stats() {
   CheckStats(ph, 0, 0, 0);
 
   // Allocate a span 's1'
-  tcmalloc::Span* s1 = ph->New(256);
+  tcmalloc::Span* s1 = ph->New(256, kLargeSizeClass, kPageSize);
   CheckStats(ph, 256, 0, 0);
 
   // Split span 's1' into 's1', 's2'.  Delete 's2'
diff --git a/src/windows/addr2line-pdb.c b/src/windows/addr2line-pdb.c
index 97b614b..5c65a03 100644
--- a/src/windows/addr2line-pdb.c
+++ b/src/windows/addr2line-pdb.c
@@ -48,6 +48,12 @@
 #define SEARCH_CAP (1024*1024)
 #define WEBSYM "SRV*c:\\websymbols*http://msdl.microsoft.com/download/symbols"
 
+void usage() {
+  fprintf(stderr, "usage: "
+          "addr2line-pdb [-f|--functions] [-C|--demangle] [-e filename]\n");
+  fprintf(stderr, "(Then list the hex addresses on stdin, one per line)\n");
+}
+
 int main(int argc, char *argv[]) {
   DWORD error;
   HANDLE process;
@@ -74,10 +80,11 @@ int main(int argc, char *argv[]) {
       }
       filename = argv[i+1];
       i++;     /* to skip over filename too */
+    } else if (strcmp(argv[i], "--help") == 0) {
+      usage();
+      exit(0);
     } else {
-      fprintf(stderr, "usage: "
-              "addr2line-pdb [-f|--functions] [-C|--demangle] [-e filename]\n");
-      fprintf(stderr, "(Then list the hex addresses on stdin, one per line)\n");
+      usage();
       exit(1);
     }
   }
diff --git a/src/windows/nm-pdb.c b/src/windows/nm-pdb.c
index 726d345..9beb21d 100644
--- a/src/windows/nm-pdb.c
+++ b/src/windows/nm-pdb.c
@@ -180,6 +180,10 @@ static void ShowSymbolInfo(HANDLE process, ULONG64 module_base) {
 #endif
 }
 
+void usage() {
+  fprintf(stderr, "usage: nm-pdb [-C|--demangle] <filename>\n");
+}
+
 int main(int argc, char *argv[]) {
   DWORD error;
   HANDLE process;
@@ -195,12 +199,15 @@ int main(int argc, char *argv[]) {
   for (i = 1; i < argc; i++) {
     if (strcmp(argv[i], "--demangle") == 0 || strcmp(argv[i], "-C") == 0) {
       symopts |= SYMOPT_UNDNAME;
+    } else if (strcmp(argv[i], "--help") == 0) {
+      usage();
+      exit(0);
     } else {
       break;
     }
   }
   if (i != argc - 1) {
-    fprintf(stderr, "usage: nm-pdb [-C|--demangle] <filename>\n");
+    usage();
     exit(1);
   }
   filename = argv[i];
diff --git a/src/windows/port.cc b/src/windows/port.cc
index bf3b106..9a9da80 100644
--- a/src/windows/port.cc
+++ b/src/windows/port.cc
@@ -100,10 +100,14 @@ bool CheckIfKernelSupportsTLS() {
 // binary (it also doesn't run if the thread is terminated via
 // TerminateThread, which if we're lucky this routine does).
 
-// This makes the linker create the TLS directory if it's not already
-// there (that is, even if __declspec(thead) is not used).
+// Force a reference to _tls_used to make the linker create the TLS directory
+// if it's not already there (that is, even if __declspec(thread) is not used).
+// Force a reference to p_thread_callback_tcmalloc and p_process_term_tcmalloc
+// to prevent whole program optimization from discarding the variables.
 #ifdef _MSC_VER
 #pragma comment(linker, "/INCLUDE:__tls_used")
+#pragma comment(linker, "/INCLUDE:_p_thread_callback_tcmalloc")
+#pragma comment(linker, "/INCLUDE:_p_process_term_tcmalloc")
 #endif
 
 // When destr_fn eventually runs, it's supposed to take as its
@@ -142,14 +146,18 @@ static void NTAPI on_tls_callback(HINSTANCE h, DWORD dwReason, PVOID pv) {
 
 #ifdef _MSC_VER
 
+// extern "C" suppresses C++ name mangling so we know the symbol names
+// for the linker /INCLUDE:symbol pragmas above.
+extern "C" {
 // This tells the linker to run these functions.
 #pragma data_seg(push, old_seg)
 #pragma data_seg(".CRT$XLB")
-static void (NTAPI *p_thread_callback)(HINSTANCE h, DWORD dwReason, PVOID pv)
-    = on_tls_callback;
+void (NTAPI *p_thread_callback_tcmalloc)(
+    HINSTANCE h, DWORD dwReason, PVOID pv) = on_tls_callback;
#pragma data_seg(".CRT$XTU")
-static int (*p_process_term)(void) = on_process_term;
+int (*p_process_term_tcmalloc)(void) = on_process_term;
 #pragma data_seg(pop, old_seg)
+}  // extern "C"
 
 #else  // #ifdef _MSC_VER [probably msys/mingw]