
502 Commits

Author SHA1 Message Date
Milton Chiang
81d8d2a9e7 Add "ARMv8-A" to the supporting list of ARM architecture. 2015-05-23 12:01:48 -07:00
Aliaksey Kandratsenka
64d1a86cb8 include time.h for struct timespec on Visual Studio 2015
This patch was submitted by user wmamrak.
2015-05-09 15:38:12 -07:00
Aliaksey Kandratsenka
7013b21997 hook mi_force_{un,}lock on OSX instead of pthread_atfork
This is a patch by Anton Samokhvalov.

Apparently it helps with locking around forking on OSX.
2015-05-09 14:56:58 -07:00
Angus Gratton
f25f8e0bf2 Clarify that only tcmalloc_minimal is supported on Windows. 2015-05-09 12:03:17 -07:00
Aliaksey Kandratsenka
772a686c45 issue-683: fix compile error in clang with -m32 and 64-bit off_t 2015-05-03 13:15:16 -07:00
Aliaksey Kandratsenka
0a3bafd645 fix typo in PrintAvailability code
This is a patch contributed by user ssubotin.
2015-04-11 10:35:53 -07:00
Matt Cross
6ce10a2a05 Add support for printing collapsed stacks for generating flame graphs. 2015-03-26 16:24:11 -04:00
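Collapsed stacks are the plain-text input format of flame graph tools such as flamegraph.pl. Each distinct call stack is one line: the frames go from the outermost caller to the leaf, joined by ";", followed by a space and the sample count. The sketch below shows that transformation on already-symbolized stacks. `Sample` and `CollapseStacks` are hypothetical names and not pprof's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One profile sample: frames[0] is the leaf (the function that was running),
// frames.back() is the outermost caller (e.g. main).
struct Sample {
  std::vector<std::string> frames;
  uint64_t count;
};

// Emits one "outer;...;leaf count" line per distinct stack and sums the
// counts of identical stacks. std::map keeps the output sorted so it is
// deterministic. Flame graph tools do not require any particular order.
std::string CollapseStacks(const std::vector<Sample>& samples) {
  std::map<std::string, uint64_t> merged;
  for (const Sample& s : samples) {
    std::string key;
    // Walk from the outermost caller down to the leaf.
    for (auto it = s.frames.rbegin(); it != s.frames.rend(); ++it) {
      if (!key.empty()) key += ';';
      key += *it;
    }
    merged[key] += s.count;
  }
  std::ostringstream out;
  for (const auto& entry : merged) {
    out << entry.first << ' ' << entry.second << '\n';
  }
  return out.str();
}
```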
Matt Cross
2c1a165fa5 Add support for reading debug symbols automatically on systems where shared libraries with debug symbols are installed at "/usr/lib/debug/<originalpath>.debug", such as RHEL and CentOS. 2015-03-26 12:14:56 -04:00
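The lookup convention is path arithmetic. The debug-info file for a library is the library's absolute path with "/usr/lib/debug" prepended and ".debug" appended. Here is a minimal sketch that assumes absolute library paths. `DebugInfoPathFor` and `SymbolSourceFor` are hypothetical helpers. Debuggers such as GDB can also locate debug files by build ID, and the sketch ignores that.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>

// Maps an absolute library path to its separate debug-info file, e.g.
//   /usr/lib64/libfoo.so.1 -> /usr/lib/debug/usr/lib64/libfoo.so.1.debug
std::string DebugInfoPathFor(const std::string& library_path) {
  return "/usr/lib/debug" + library_path + ".debug";
}

// Returns the file symbols should be read from. This is the debug-info file
// if it exists and is readable, and otherwise the library itself.
std::string SymbolSourceFor(const std::string& library_path) {
  const std::string debug_path = DebugInfoPathFor(library_path);
  if (access(debug_path.c_str(), R_OK) == 0) return debug_path;
  return library_path;
}
```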
Jonathan Lambrechts
2e65495628 callgrind : handle inlined functions 2015-02-13 17:54:21 -08:00
Jonathan Lambrechts
90d7408d38 pprof : callgrind : fix unknown files 2015-02-13 17:54:14 -08:00
Aliaksey Kandratsenka
aa963a24ae issue-672: fixed date of news entry of gperftools 2.4 release
It is 2015 and not 2014. Spotted and reported by Armin Rigo.
2015-02-09 08:35:03 -08:00
Aliaksey Kandratsenka
c66aeabdba fixed default value of HEAP_PROFILER_TIME_INTERVAL in .html doc 2015-01-10 14:35:54 -08:00
Aliaksey Kandratsenka
689e4a5bb4 bumped version to 2.4 2015-01-10 12:26:51 -08:00
Aliaksey Kandratsenka
3f5f1bba0c bumped version to 2.4rc 2014-12-28 18:28:18 -08:00
Aliaksey Kandratsenka
c4dfdebc79 updated NEWS for gperftools 2.4rc 2014-12-28 18:28:15 -08:00
Aliaksey Kandratsenka
0096be5f6f pprof: allow disabling auto-removal of "constant 2nd frame"
The "constant 2nd frame" feature is supposed to detect and work around
incorrect cpu profile stack captures where some or all of the cpu
profiling signal handler frames are not skipped.

I've seen programs where this feature incorrectly removes non-signal
frames.

Plus it actually hides bugs in stacktrace capturing which we want to be
able to spot.

There is now --no-auto-signal-frm option for disabling it.
2014-12-28 15:35:54 -08:00
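The heuristic can be illustrated roughly as follows. The sketch assumes the check is simply whether every sample has the same address in the second stack position, which may be a simplification of pprof's actual rule. `DropConstantSecondFrame` is a hypothetical name, and its `enabled` flag stands in for --no-auto-signal-frm.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Stack = std::vector<uint64_t>;  // stack[0] is the sampled PC

// Simplified heuristic: if every stack has the same address at index 1,
// treat it as a leftover signal-handler frame and remove it everywhere.
// Returns true if the frame was removed.
bool DropConstantSecondFrame(std::vector<Stack>& stacks, bool enabled) {
  if (!enabled || stacks.empty()) return false;
  const uint64_t second = stacks[0].size() >= 2 ? stacks[0][1] : 0;
  for (const Stack& s : stacks) {
    if (s.size() < 2 || s[1] != second) return false;
  }
  for (Stack& s : stacks) s.erase(s.begin() + 1);
  return true;
}
```

This check also passes for a profile whose samples all land in code reached from one common call site. That is the kind of false positive that motivates the opt-out.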
Aliaksey Kandratsenka
4859d80205 cpuprofiler: drop correct number of signal handler frames
We actually have 3 and not 2 of them.
2014-12-28 15:35:54 -08:00
Aliaksey Kandratsenka
812ab1ee7e pprof: eliminate duplicate top frames if dropping signal frames
In cpu profiles that included parts of the signal handler we could have
a situation like this:

* PC
* signal handler frame
* PC

This happens specifically when capturing stacktraces via libunwind.

For such stacktraces pprof used to draw a self-cycle in functions,
confusing everybody. Given that we might have a number of such
profiles in the wild, it makes sense to handle that duplicate PC issue.
2014-12-28 15:35:54 -08:00
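The fix has two steps. First, drop the addresses that belong to signal-handler frames. Then collapse a top entry that appears twice in a row. Here is a minimal sketch that assumes the set of signal-handler addresses is already known. `CleanStack` is a hypothetical helper, not pprof's code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Removes signal-handler addresses and then collapses a duplicated top entry:
//   PC, <signal handler frame(s)>, PC, caller, ...  ->  PC, caller, ...
std::vector<uint64_t> CleanStack(const std::vector<uint64_t>& raw,
                                 const std::set<uint64_t>& signal_frames) {
  std::vector<uint64_t> out;
  for (uint64_t pc : raw) {
    if (signal_frames.count(pc) == 0) out.push_back(pc);
  }
  // Without this step the top function appears to call itself (a self-cycle).
  if (out.size() >= 2 && out[0] == out[1]) out.erase(out.begin());
  return out;
}
```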
Aliaksey Kandratsenka
e6e78315e4 cpuprofiler: better explain deduplication of top stacktrace entry 2014-12-28 15:35:54 -08:00
Aliaksey Kandratsenka
24b8ec2846 cpuprofiler: disable capturing stacktrace from signal's ucontext
This was reported to cause problems because libunwind occasionally
returns a top-level pc that is 1 smaller than the real pc.
2014-12-28 15:35:54 -08:00
Aliaksey Kandratsenka
83588de720 pprof: added support for dumping stacks in --text mode
This is very useful for diagnosing stack capturing and processing
bugs.
2014-12-28 15:35:54 -08:00
Aliaksey Kandratsenka
2f29c9b062 pprof: made --show-addresses work 2014-12-28 15:35:54 -08:00
Raphael Moreira Zinsly
b8b027d09a Make PPC64 use 64K of internal page size for tcmalloc by default
This patch sets the default tcmalloc internal page size to 64K when
built on PPC.
2014-12-23 10:51:54 -08:00
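tcmalloc's internal page size is a power of two, 2^shift bytes. Moving from 8K pages (shift 13) to 64K pages (shift 16) therefore cuts the number of internal pages covering a given region by a factor of 8. The helper below only illustrates that arithmetic. `PagesFor` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Number of 2^page_shift-byte pages needed to cover `bytes`, rounded up.
constexpr size_t PagesFor(size_t bytes, unsigned page_shift) {
  return (bytes + (size_t{1} << page_shift) - 1) >> page_shift;
}

static_assert((size_t{1} << 13) == 8 * 1024, "shift 13 is an 8K page");
static_assert((size_t{1} << 16) == 64 * 1024, "shift 16 is a 64K page");
```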
Raphael Moreira Zinsly
3f55d874be New configure flags to set the alignment and page size of tcmalloc
Added two new configure flags, --with-tcmalloc-pagesize and
--with-tcmalloc-alignment, in order to set the tcmalloc internal page
size and tcmalloc allocation alignment without the need of a compiler
directive and to make the choice of the page size independent of the
allocation alignment.
2014-12-23 10:51:51 -08:00
Aliaksey Kandratsenka
1035d5c18f start building malloc_extension_c_test even with static linking
The comment in Makefile.am stating that it doesn't work with static
linking is no longer accurate.
2014-12-21 19:52:34 -08:00
Aliaksey Kandratsenka
d570a6391c unbreak malloc_extension_c_test on clang
It looks like even the force_malloc trick was not enough to force clang
to actually call malloc. I'm now calling tc_malloc directly to defeat
that optimization.
2014-12-21 19:33:25 -08:00
Aliaksey Kandratsenka
4ace8dbbe2 added subdir-objects automake options
This is suggested by automake itself for future compatibility.
2014-12-21 18:49:47 -08:00
Aliaksey Kandratsenka
f72e37c3f9 fixed C++ comment warning in malloc_extension_c.h from C compiler 2014-12-21 18:27:03 -08:00
Aliaksey Kandratsenka
f94ff0cc09 made AtomicOps_x86CPUFeatureStruct hidden
So that access to has_sse2 is faster under -fPIC.
2014-12-20 21:20:43 -08:00
Aliaksey Kandratsenka
987a724c23 dropped atomicops workaround for irrelevant Opteron locking bug
It's not cheap at all when done in this way (i.e. without runtime
patching) and apparently useless.

It looks like the Linux kernel never got this workaround at all. See
bugzilla ticket: https://bugzilla.kernel.org/show_bug.cgi?id=11305

And I see no traces of this workaround in glibc either.

On the other hand, opensolaris folks apparently still have it (or
something similar, based on comments on linux bugzilla) in their code:
32842aabdc/usr/src/uts/i86pc/os/mp_startup.c (L1136)

And the affected CPUs (if any) are from 2008 (6 years ago now).

Plus even if somebody still uses those cpus (which is unlikely), they
won't have a working kernel and glibc anyway.
2014-12-20 21:20:43 -08:00
Aliaksey Kandratsenka
7da5bd014d enabled aggressive decommit by default
TCMALLOC_AGGRESSIVE_DECOMMIT=f is one way to disable it and
SetNumericProperty is another.
2014-12-20 21:18:07 -08:00
Aliaksey Kandratsenka
51b0ad55b3 added basic unit test for singular malloc hooks 2014-12-07 17:46:04 -08:00
Aliaksey Kandratsenka
bce72dda07 inform compiler that tcmalloc allocation sampling is unlikely
Now the compiler generates slightly better code: the common case of
not sampling an allocation is jump-less.
2014-12-07 17:46:04 -08:00
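On GCC and Clang, a hint like this is usually expressed with __builtin_expect. The sketch below uses a simple byte-countdown sampler. `SAMPLE_UNLIKELY` and `ToySampler` are hypothetical names. tcmalloc's real sampler is more elaborate; for example, it randomizes the sampling interval.

```cpp
#include <cassert>
#include <cstddef>

// Tells the compiler the condition is rarely true, so the common
// (not-sampled) path can be laid out as straight-line code.
#if defined(__GNUC__)
#define SAMPLE_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SAMPLE_UNLIKELY(x) (x)
#endif

// Toy sampler: counts allocated bytes down to the next sampled allocation.
class ToySampler {
 public:
  explicit ToySampler(long interval) : interval_(interval), bytes_left_(interval) {}

  // Returns true if this allocation should be sampled.
  bool ShouldSample(size_t size) {
    bytes_left_ -= static_cast<long>(size);
    if (SAMPLE_UNLIKELY(bytes_left_ < 0)) {
      bytes_left_ = interval_;  // fixed interval here; real samplers randomize it
      return true;
    }
    return false;
  }

 private:
  long interval_;
  long bytes_left_;
};
```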
Aliaksey Kandratsenka
4f051fddcd eliminated CheckIfKernelSupportsTLS
We don't care about pre-2.6.0 kernels anymore. So we can assume that
if the compile-time check worked, TLS will also work at runtime.
2014-12-07 17:46:04 -08:00
Aliaksey Kandratsenka
81291ac399 set elf visibility to hidden for malloc hooks
To speed up access to them under -fPIC.
2014-12-07 17:46:04 -08:00
Aliaksey Kandratsenka
105c004d0c introduced ATTRIBUTE_VISIBILITY_HIDDEN
So that we can disable elf symbol interposition for certain
perf-sensitive symbols.
2014-12-07 17:46:04 -08:00
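On ELF platforms, a macro like ATTRIBUTE_VISIBILITY_HIDDEN would typically expand to GCC's visibility attribute. Hidden symbols cannot be interposed by other shared objects. Code inside a -fPIC library can therefore reach them directly instead of going through the GOT. The definition below shows one way such a macro can be written and is not copied from gperftools' headers. The two globals are hypothetical stand-ins for perf-sensitive symbols.

```cpp
#include <cassert>

#if defined(__GNUC__) && defined(__ELF__)
#define ATTRIBUTE_VISIBILITY_HIDDEN __attribute__((visibility("hidden")))
#else
#define ATTRIBUTE_VISIBILITY_HIDDEN
#endif

// Hypothetical globals read on every allocation. Hidden visibility keeps
// them out of the dynamic symbol table so references bind locally.
ATTRIBUTE_VISIBILITY_HIDDEN bool g_any_hooks_installed = false;
ATTRIBUTE_VISIBILITY_HIDDEN int g_hook_count = 0;

void InstallHook() {
  ++g_hook_count;
  g_any_hooks_installed = true;
}
```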
Aliaksey Kandratsenka
6a6c49e1f5 replaced separate singular malloc hooks with faster HookList
Specifically, we can now check in one place, instead of two, whether
any hooks are set at all. This makes the fast path shorter.
2014-12-07 17:46:04 -08:00
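Here is a minimal sketch of the single-check idea, built on a fixed-capacity array of hook pointers. Mutations happen under a lock and readers traverse without one. An end index lets the allocation fast path confirm with one load that no hooks are installed. `ToyHookList`, `NewHook` and `OnAllocation` are hypothetical names. gperftools' own HookList differs in detail; for instance, it mutates under a spinlock rather than std::mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

using NewHook = void (*)(const void* ptr, size_t size);

class ToyHookList {
 public:
  static const int kCapacity = 8;

  ToyHookList() {
    for (auto& slot : slots_) slot.store(nullptr, std::memory_order_relaxed);
  }

  bool Add(NewHook hook) {
    std::lock_guard<std::mutex> lock(mu_);
    for (int i = 0; i < kCapacity; ++i) {
      if (slots_[i].load(std::memory_order_relaxed) == nullptr) {
        slots_[i].store(hook, std::memory_order_release);
        if (i >= end_.load(std::memory_order_relaxed)) {
          end_.store(i + 1, std::memory_order_release);
        }
        return true;
      }
    }
    return false;  // list is full
  }

  bool Remove(NewHook hook) {
    std::lock_guard<std::mutex> lock(mu_);
    int end = end_.load(std::memory_order_relaxed);
    for (int i = 0; i < end; ++i) {
      if (slots_[i].load(std::memory_order_relaxed) == hook) {
        slots_[i].store(nullptr, std::memory_order_release);
        // Shrink end_ past trailing empty slots so empty() stays exact.
        while (end > 0 &&
               slots_[end - 1].load(std::memory_order_relaxed) == nullptr) {
          --end;
        }
        end_.store(end, std::memory_order_release);
        return true;
      }
    }
    return false;
  }

  // The single fast-path check.
  bool empty() const { return end_.load(std::memory_order_acquire) == 0; }

  // Lockless traversal. It may or may not observe a concurrent Add/Remove.
  void Invoke(const void* ptr, size_t size) const {
    const int end = end_.load(std::memory_order_acquire);
    for (int i = 0; i < end; ++i) {
      NewHook hook = slots_[i].load(std::memory_order_acquire);
      if (hook != nullptr) hook(ptr, size);
    }
  }

 private:
  std::mutex mu_;
  std::atomic<int> end_{0};
  std::atomic<NewHook> slots_[kCapacity];
};

// Allocation fast path: one branch when no hooks are installed.
inline void OnAllocation(const ToyHookList& hooks, const void* ptr, size_t size) {
  if (!hooks.empty()) hooks.Invoke(ptr, size);
}
```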
Aliaksey Kandratsenka
ba0441785b removed extra barriers in malloc hooks mutation methods
Because those are already done under a spinlock, and the read-only,
lockless Traverse already tolerates slight inconsistencies.
2014-12-07 17:46:04 -08:00
Aliaksey Kandratsenka
890f34c77e introduced support for deprecated singular hooks into HookList
So that we can later drop separate singular hooks.
2014-12-07 17:46:04 -08:00
Aliaksey Kandratsenka
81ed7dff11 returned date of 2.3rc in NEWS back 2014-12-07 13:33:40 -08:00
Aliaksey Kandratsenka
463a619408 bumped version to 2.3 2014-12-07 12:53:35 -08:00
Aliaksey Kandratsenka
76e8138e12 updated NEWS for gperftools 2.3 2014-12-07 12:46:49 -08:00
Raphael Moreira Zinsly
8eb4ed785a Added option to disable libunwind linking
This patch adds a configure option to enable or disable libunwind linking.
The patch also disables libunwind on ppc by default.
2014-11-27 12:51:33 -08:00
Aliaksey Kandratsenka
3b94031d21 compile libunwind unwinder only if __thread is supported
This fixes the build on certain OSX systems that I have access to.
2014-11-27 12:30:36 -08:00
Aliaksey Kandratsenka
3ace468202 issue-658: correctly close socketpair fds when socketpair fails
This applies a patch by glider.
2014-11-27 10:45:53 -08:00
Aliaksey Kandratsenka
e7d5e512b0 bumped version to 2.3rc 2014-11-02 20:13:33 -08:00
Aliaksey Kandratsenka
1d44d37851 updated NEWS for gperftools 2.3rc 2014-11-02 19:59:05 -08:00
Aliaksey Kandratsenka
1108d83cf4 implemented cpu-profiling mode that profiles threads separately
The default mode of operation of the cpu profiler uses itimer and
SIGPROF. This timer is by definition per-process, and no spec defines
which thread is going to receive SIGPROF. So it produces correct
profiles only if we assume that the probability of picking a thread is
proportional to the cpu time spent by that thread.

It is easy to see that recent Linux (at least on common SMP hardware)
doesn't satisfy that assumption. Quite big skews of SIGPROF ticks
between threads are visible: I have seen a 70%/20% division instead of
50%/50% for a pair of cpu-hog threads. (And I do see it become 50/50
with the new mode.)

Fortunately POSIX provides a mechanism to track per-thread cpu time via
the posix timers facility. And even more fortunately, Linux also provides
a mechanism to deliver timer ticks to specific threads.

Interestingly, it looks like FreeBSD also has a very similar facility
and seems to suffer from the same skew. But due to differences in the
way threads are identified, I haven't bothered to try to support this
mode on FreeBSD.

This commit implements a new profiling mode where every thread creates
a posix timer which tracks the thread's cpu time. Each thread also sets
up signal delivery to itself on overflows of that timer.

This new mode requires every thread to be registered with the cpu
profiler. The existing ProfilerRegisterThread function is used for that.

Because registering threads requires application support (or a suitable
LD_PRELOAD-able wrapper for the thread creation API), the new mode is off
by default. It has to be activated manually by setting the environment
variable CPUPROFILE_PER_THREAD_TIMERS.

The new mode also requires librt symbols to be available. We do not
link to librt because it depends on libpthread, which we avoid due to
the perf impact of bringing libpthread into otherwise single-threaded
programs. So librt has to be either already loaded by the profiled
program or LD_PRELOAD-ed.
2014-11-02 18:29:55 -08:00
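On Linux, the per-thread mode rests on two facilities. timer_create() with the CLOCK_THREAD_CPUTIME_ID clock counts only the calling thread's cpu time. The Linux-specific SIGEV_THREAD_ID notification sends the signal to one chosen thread. The sketch below illustrates these two facilities and is not the profiler's code. `StartPerThreadCpuTimer` is a hypothetical name. The sketch omits the signal handler and thread registration, and on older glibc versions it needs -lrt at link time.

```cpp
#include <cassert>
#include <csignal>
#include <ctime>
#include <sys/syscall.h>
#include <unistd.h>

// Older glibc versions expose the target-thread field only as _sigev_un._tid.
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

// Creates a timer that measures the calling thread's cpu time and delivers
// `signo` to that same thread every `interval` of cpu time consumed.
bool StartPerThreadCpuTimer(int signo, const struct timespec& interval,
                            timer_t* timer_out) {
  struct sigevent sev = {};
  sev.sigev_notify = SIGEV_THREAD_ID;  // Linux extension: target one thread
  sev.sigev_signo = signo;
  sev.sigev_notify_thread_id = static_cast<pid_t>(syscall(SYS_gettid));
  if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, timer_out) != 0) {
    return false;
  }
  struct itimerspec spec = {};
  spec.it_value = interval;     // first expiry
  spec.it_interval = interval;  // then periodic
  if (timer_settime(*timer_out, 0, &spec, nullptr) != 0) {
    timer_delete(*timer_out);
    return false;
  }
  return true;
}
```

Each thread has to call such a function itself, because the timer is bound to the caller's thread ID and cpu clock. That is why the mode needs every thread to be registered.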
Aliaksey Kandratsenka
714bd93e42 drop workaround for too old redhat 7
Note that this is _not_ RHEL7 but original redhat 7 from early 2000s.
2014-11-02 18:29:55 -08:00
Aliaksey Kandratsenka
8de46e66fc don't add leaf function twice to profile under libunwind 2014-11-02 18:29:55 -08:00