Commit Graph

129 Commits

Aliaksey Kandratsenka
85048430ac consolidate do_mallinfo{,2}
We had two nearly identical implementations. Thankfully, the C++ template
facility lets us produce two different runtime functions (for different
type widths) without duplicating source.

Amend github issue #1414
2023-12-07 15:01:27 -05:00
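The consolidation described above can be sketched as one template instantiated at two field widths. This is a minimal illustration, not the actual gperftools code; `MallinfoInt`, `MallinfoSizeT`, and `fill_mallinfo` are hypothetical names standing in for the real `mallinfo`/`mallinfo2` structs and `do_mallinfo` helper.

```cpp
#include <cassert>
#include <cstddef>

// Two result structs differing only in integer width, as with
// mallinfo (int) vs mallinfo2 (size_t). Names are illustrative.
struct MallinfoInt   { int arena; int uordblks; };
struct MallinfoSizeT { std::size_t arena; std::size_t uordblks; };

// One template body produces both runtime functions.
template <typename Info>
Info fill_mallinfo(std::size_t heap_bytes, std::size_t in_use_bytes) {
  Info info{};
  // Each assignment converts to the field width of the instantiation.
  info.arena    = static_cast<decltype(info.arena)>(heap_bytes);
  info.uordblks = static_cast<decltype(info.uordblks)>(in_use_bytes);
  return info;
}
```

Instantiating `fill_mallinfo<MallinfoInt>` and `fill_mallinfo<MallinfoSizeT>` yields the two distinct runtime functions from a single source body.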
Mateusz Jakub Fila
b8e75ae6fe Add mallinfo2 function 2023-12-07 14:10:51 +01:00
Aliaksey Kandratsenka
57512e9c3d unbreak -Wthread-safety
It actually found a real (but arguably minor) issue with memory region
map locking.

As part of that we're replacing PageHeap::DeleteAndUnlock, which had a
somewhat ambitious 'move' of SpinLockHolder, with the more
straightforward PageHeap::PrepareAndDelete. It doesn't look like we can
support that move pattern with thread annotations.
2023-08-06 19:32:32 -04:00
Aliaksey Kandratsenka
8b3f0d6145 undo MarkThreadTemporarilyIdle and make it same as MarkThreadIdle
As noted on github issue #880, the 'temporarily' variant saves us not
just the freeing of the thread cache, but also the return of the
thread's share of the thread cache (max_size_) into the common pool. And
the latter has caused trouble for the MongoDB folks who originally
proposed the 'temporarily' variant. They claim they don't use it
anymore.

Thus, with no users and no clear benefit, it makes no sense for us to
keep this API. For API and ABI compatibility's sake we keep it, but it
is now identical to regular MarkThreadIdle.

Fixes issue #880
2023-07-31 14:11:51 -04:00
Aliaksey Kandratsenka
d2c89ba534 don't return raw span when sampling and stacktrace oomed
This is nearly impossible in practice, but still. Somehow we missed that
DoSampledAllocation is supposed to always return an actual object, yet
in the condition where stacktrace_allocator failed to get us a
StackTrace object we ended up returning the span instead of its object.
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
0d42a48699 move page heap locking under PageHeap
While there is still plenty of code that takes pageheap_lock outside of
the page_heap module for all kinds of reasons, at least the
bread-and-butter logic of allocating/deallocating larger chunks of
memory now handles page heap locking inside PageHeap itself. This gives
us flexibility.

Update issue #1159
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
a3e1080c2e handle large alloc reporting locklessly
This simplifies the code a bit.

Update issue #1159
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
f1eb3c82c6 correctly release memory when system's pagesize is >kPageSize
I.e. this covers the case of ARM builds that by default compile tcmalloc
for 8k logical pages (assuming 4k system pages), but can actually run on
systems with 64k pages.

Closes #1135
2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
d521e3b30e move page heap allocations with alignment into page heap 2023-07-24 21:01:35 -04:00
Aliaksey Kandratsenka
f06ccc6f79 dont test HAVE_{STDINT,INTTYPES}_H
Those headers are fairly standard by now. We already require a C++11 or
later compiler.
2023-07-22 14:32:40 -04:00
Gabriel Marin
4a923a6b36 tcmalloc: enable large object pointer offset check
Original CL: https://chromiumcodereview.appspot.com/10391178

  1. Enable large object pointer offset check in release build.
  The following code will now cause a check error:
  char* p = reinterpret_cast<char*>(malloc(kMaxSize + 1));
  free(p + 1);

  2. Remove the duplicated error reporting function "DieFromBadFreePointer";
  "InvalidGetAllocatedSize" can be used instead.

Reviewed-on: https://chromium-review.googlesource.com/1184335
[alkondratenko@gmail.com] removed some unrelated formatting changes
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
2023-07-13 19:41:21 -04:00
Aliaksey Kandratsenka
c29e3059dd mark CheckCachedSizeClass as used
It is only used from inside ASSERT, and clang doesn't like it being
declared but unused when NDEBUG is set.
2023-07-13 19:21:30 -04:00
Jingyun Hua
fe85bbdf4c Add support for LoongArch.
Only 64-bit is supported at the moment.

Signed-off-by: Jingyun Hua <huajingyun@loongson.cn>
2022-02-08 20:47:10 +08:00
Aliaksey Kandratsenka
a015377a54 Set tcmalloc heap limit prior to testing oom
Otherwise it can take a long time to OOM on OS X.
2021-02-28 17:47:56 -08:00
Aliaksey Kandratsenka
c939dd5531 correctly check sized delete hint when asserts are on
We previously relied on the wrong assumption that size classes larger
than the page size have addresses aligned on the page size. The new code
makes a proper check of the size class.

Also added is unit test coverage for this previously failing condition.
And we now also run the assert-enabled unit tests for the big tcmalloc
configuration, not only tcmalloc_minimal.

This fixes github issue #1254
2021-02-28 15:54:22 -08:00
Aliaksey Kandratsenka
7c106ca241 don't bother checking for stl namespace and use std
Because there are no compilers left that don't support the std namespace.
2021-02-14 15:44:14 -08:00
Aliaksey Kandratsenka
0d6f32b9ce use standard way to print size_t-sized ints
I.e. just use %zu/%zd/%zx instead of finding out the right size and
defining PRI{u,x,d}S macros. Compilers have long since caught up with
this part of the standard.
2021-02-14 15:44:14 -08:00
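The standard approach the commit refers to is the `z` length modifier, which formats `size_t` portably with no PRI-style macros. A small sketch (the helper name is illustrative):

```cpp
#include <cstdio>
#include <cstddef>
#include <cstring>
#include <cassert>

// Format a size_t in decimal and hex using the standard 'z' length
// modifier; no platform-specific PRIuS/PRIxS macros required.
void format_size(char* buf, std::size_t bufsz, std::size_t n) {
  std::snprintf(buf, bufsz, "%zu bytes (0x%zx)", n, n);
}
```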
Jon Kohler
1bfcb5bc3a tcmalloc: fragmentation overhead instrumentation
This patch adds visibility into the overhead due to fragmentation for each size
class in the tcmalloc central free list, which is helpful when debugging
fragmentation issues.
2020-02-23 12:17:22 -08:00
Gabriel Marin
b85652bf26 Add generic.total_physical_bytes property to MallocExtension
Original CL:

- https://codereview.chromium.org/1410353005

  Add generic.total_physical_bytes property to MallocExtension

  The actual physical memory usage of tcmalloc cannot be obtained by
  GetNumericProperty. This accounts for the current_allocated_bytes,
  fragmentation and malloc metadata, and excludes the unmapped memory
  regions. This helps the user to understand how much memory is actually
  being used for the allocations that were made.

Reviewed-on: https://chromium-review.googlesource.com/1130803
2018-10-06 11:07:59 -07:00
Gabriel Marin
90df23c81f Make some tcmalloc constants truly const
Reviewed-on: https://chromium-review.googlesource.com/c/1130809
2018-10-05 17:17:55 -07:00
Aliaksey Kandratsenka
71c8cedaca Fix incompatible aliasing warnings
We aliased functions with different signatures and gcc now correctly
warns about that. Originally gcc 5's identical-code-merging feature
caused us to alias more than necessary, but I am not able to reproduce
that problem anymore. So we're now aliasing only compatible functions.
2018-08-05 20:43:53 -07:00
HolyWu
f47a52ce85 Make _recalloc adhere to MS's definition 2018-05-21 16:08:27 +08:00
Junhao Li
fe87ffb7ea Disable large allocation report by default
Fixes issue #360.

[alkondratenko@gmail.com: adjusted commit message a bit]
[alkondratenko@gmail.com: adjusted configure help message]
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
2018-05-20 21:13:05 -07:00
HolyWu
497ea33165 Fix WIN32_OVERRIDE_ALLOCATORS for VS2017
At first I tried to add some functions following what Chrome does at
https://chromium.googlesource.com/chromium/src/+/master/base/allocator/allocator_shim_override_ucrt_symbols_win.h,
but it still failed. So I decided to remove all heap-related objects
from libucrt.lib to see what happens. In the end I found that a lot of
functions in the CRT directly invoke _malloc_base instead of
malloc (and likewise for the others), hence we need to override them as well.

This should close issue #716.

[alkondratenko@gmail.com: added reference to ticket]
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
2018-04-29 22:59:01 -07:00
Aliaksey Kandratsenka
33ae0ed2ae unbreak compilation on GNU/Linux i386
A recent commit to fix int overflow for implausibly huge allocations
added a call to std::min. Notably, the first arg was the old size
divided by unsigned long 4. On GNU/Linux i386, size_t is not long, so
that division promoted the first arg to unsigned long while the second
arg was still size_t, i.e. just unsigned. That caused compilation to
fail.

The fix is dropping the 'ul' suffix.
2018-04-09 20:58:31 -07:00
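The type clash above can be shown in miniature: on ILP32 GNU/Linux `size_t` is `unsigned int`, so `std::min(old_size / 4ul, extra)` mixes `unsigned long` with `size_t` and fails to deduce a common template argument. Dropping the `ul` suffix keeps both operands `size_t` everywhere. The function name here is illustrative, not the actual gperftools code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative stand-in for the affected expression.
std::size_t grow_amount(std::size_t old_size, std::size_t extra) {
  // return std::min(old_size / 4ul, extra);  // breaks on i386:
  //   unsigned long vs size_t, no common type for std::min<T>
  return std::min(old_size / 4, extra);       // portable: both args size_t
}
```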
Mao
1cb5de6db9 Explicitly prevent int overflow 2018-03-26 17:28:28 +08:00
Aliaksey Kandratsenka
47c99cf492 unbreak printing large span stats
One of the recent commits started passing kMaxPages to printf without
using it. Thankfully, compilers gave us a warning. Apparently the
intention was to print the real value of kMaxPages, so this is what
we're doing now.
2018-03-24 20:12:44 -07:00
Todd Lipcon
db98aac55a Add a central free list for kMaxPages-sized spans
Previously, the central free list with index '0' was always unused,
since freelist index 'i' tracked spans of length 'i' and there are no
spans of length 0. This meant that there was no freelist for spans of
length 'kMaxPages'. In the default configuration, this corresponds to
1MB, which is a relatively common allocation size in a lot of
applications.

This changes the free list indexing so that index 'i' tracks spans of
length 'i + 1', meaning that free list index 0 is now used and
freelist[kMaxPages - 1] tracks allocations of kMaxPages size (1MB by
default).

This also fixes the stats output to indicate '>128' for the large spans
stats rather than the incorrect '>255' which must have referred to a
historical value of kMaxPages.

No new tests are added since this code is covered by existing tests.
2018-03-17 09:46:28 -07:00
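The re-indexing above can be sketched as a one-line mapping: free-list index `i` now tracks spans of length `i + 1` pages, so index 0 is used and a `kMaxPages`-page span (1MB with the default configuration) lands at index `kMaxPages - 1` instead of falling off the end of the array. The helper name is illustrative.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxPages = 128;  // default value per the message

// New scheme: index i tracks spans of i+1 pages.
std::size_t free_list_index(std::size_t span_pages) {
  assert(span_pages >= 1 && span_pages <= kMaxPages);
  return span_pages - 1;  // old scheme returned span_pages, wasting slot 0
}
```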
Aliaksey Kandratsenka
2291714518 implement fast-path for memalign/aligned_alloc/tc_new_aligned
We're taking advantage of the "natural" alignedness of our size classes:
instead of the previous loop over size classes looking for a suitably
aligned size, we now directly compute the right size. See the
align_size_up function. That gives us the ability to use our existing
malloc fast path to make memalign neat and fast in the most common
cases. I.e. memalign/aligned_alloc now only tail-calls, and thus avoids
the expensive prologue/epilogue and is almost as fast as regular malloc.
2017-11-30 18:14:14 +00:00
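A minimal sketch of the align_size_up idea: round the request up to a multiple of the (power-of-two) alignment and let the ordinary size-class lookup handle the rest. Because tcmalloc size classes are naturally aligned, the returned object then satisfies the alignment without looping over classes. This standalone helper illustrates the rounding step only, not the exact source.

```cpp
#include <cassert>
#include <cstddef>

// Round size up to a multiple of align; align must be a power of two.
std::size_t align_size_up(std::size_t size, std::size_t align) {
  return (size + align - 1) & ~(align - 1);
}
```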
Aliaksey Kandratsenka
79c91a9810 always define empty PERFTOOLS_NOTHROW
Because somehow clang still builds "this function will not throw" code
even with noexcept, which breaks the performance of
tc_malloc/tc_new_nothrow. The difference from throw() seems to be just
which function is called when an unexpected exception happens.

So we work around this silliness by simply dropping any exception
specification when compiling tcmalloc.
2017-11-29 21:44:52 +00:00
Aliaksey Kandratsenka
89fe59c831 Fix OOM handling in fast-path
The previous fast-path malloc implementation failed to arrange proper
OOM handling for operator new. I.e. operator new is supposed to call the
new handler and throw an exception, which was not arranged in the
fast-path case.

The fixed code now passes a pointer to the OOM function to
ThreadCache::FetchFromCentralCache, which will call it in the OOM
condition. A test is added to verify correct behavior.

I've also updated some fast-path-related comments for more accuracy.
2017-11-29 21:44:49 +00:00
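The contract being restored here is the standard one for operator new: on allocation failure, loop calling the installed new handler (which may free memory, install another handler, or throw) and only throw `std::bad_alloc` when no handler is installed. This sketch shows that generic contract, not tcmalloc's internal OOM function; `retry_or_throw` and `raw_alloc` are illustrative names.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Generic operator-new OOM loop, per the C++ standard's requirements.
void* retry_or_throw(std::size_t size, void* (*raw_alloc)(std::size_t)) {
  for (;;) {
    if (void* p = raw_alloc(size)) return p;
    std::new_handler h = std::get_new_handler();
    if (h == nullptr) throw std::bad_alloc();
    h();  // may release memory, install another handler, or throw
  }
}
```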
Aliaksey Kandratsenka
e6cd69bdec reintroduce aliasing for aligned delete
Without aliasing, performance is likely to be at least partially
affected. There is still concern that aliasing between functions of
different signatures is not 100% safe. We now explicitly list the
architectures where aliasing is known to be safe.
2017-11-29 19:52:32 +00:00
Andrey Semashev
7efb3ecf37 Add support for C++17 operator new/delete for overaligned types.
- Add auto-detection of std::align_val_t presence to configure scripts. This
  indicates that the compiler supports C++17 operator new/delete overloads
  for overaligned types.

- Add auto-detection of -faligned-new compiler option that appeared in gcc 7.
  The option allows the compiler to generate calls to the new operators. It is
  needed for tests.

- Added overrides for the new operators. The overrides are enabled if the
  support for std::align_val_t has been detected. The implementation is mostly
  based on the infrastructure used by memalign, which had to be extended to
  support being used by C++ operators in addition to C functions. In particular,
  the debug version of the library has to distinguish memory allocated by
  memalign from that by operator new. The current implementation of sized
  overaligned delete operators do not make use of the supplied size argument
  except for the debug allocator because it is difficult to calculate the exact
  allocation size that was used to allocate memory with alignment. This can be
  done in the future.

- Removed forward declaration of std::nothrow_t. This was not portable as
  the standard library is not required to provide nothrow_t directly in
  namespace std (it could use e.g. an inline namespace within std). The <new>
  header needs to be included for std::align_val_t anyway.

- Fixed operator delete[] implementation in libc_override_redefine.h.

- Moved TC_ALIAS definition to the beginning of the file in tcmalloc.cc so that
  the macro is defined before its first use in nallocx.

- Added tests to verify the added operators.

[alkondratenko@gmail.com: fixed couple minor warnings, and some
whitespace change]
[alkondratenko@gmail.com: removed addition of TC_ALIAS in debug allocator]
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
2017-11-29 19:51:42 +00:00
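The C++17 overloads being overridden have the shapes sketched below. A real tcmalloc build routes these to its memalign-style path; here `std::aligned_alloc` stands in, with the size rounded up to a multiple of the alignment as C requires. This is an illustrative sketch of the signatures, not the actual override code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

// Overaligned operator new: called by the compiler when a type's
// alignment exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__.
void* operator new(std::size_t size, std::align_val_t al) {
  std::size_t a = static_cast<std::size_t>(al);
  std::size_t rounded = (size + a - 1) & ~(a - 1);  // aligned_alloc precondition
  if (void* p = std::aligned_alloc(a, rounded)) return p;
  throw std::bad_alloc();
}

// Matching unsized and sized aligned deletes.
void operator delete(void* p, std::align_val_t) noexcept { std::free(p); }
void operator delete(void* p, std::size_t, std::align_val_t) noexcept { std::free(p); }
```

With `-faligned-new` (or any C++17 mode), `new T` for an overaligned `T` selects these overloads automatically.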
Andrew Morrow
7a6e25f3b1 Add new statistics for the PageHeap
[alkondratenko@gmail.com: addressed init order mismatch warning]
Signed-off-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
2017-11-28 14:19:08 +00:00
Romain Geissler
2d220c7e26 Replace "throw()" by "PERFTOOLS_NOTHROW"
Automatically done with:
sed -e 's/\<throw[[:space:]]*([[:space:]]*)/PERFTOOLS_NOTHROW/g' -i
$(git grep -l 'throw[[:space:]]*([[:space:]]*)')

[alkondratenko@gmail.com: updated to define empty PERFTOOLS_NOTHROW
only on pre-c++11 standards]
2017-07-09 14:10:06 -07:00
Romain Geissler
e5fbd0e24e Rename PERFTOOLS_THROW into PERFTOOLS_NOTHROW.
Automatically done with:
sed -e 's/\<PERFTOOLS_THROW\>/PERFTOOLS_NOTHROW/g' -i $(git grep -l PERFTOOLS_THROW)
2017-07-08 16:22:27 -07:00
KernelMaker
a495969cb6 update the prev_class_size in each loop, or the min_object_size of tcmalloc.thread will always be 1 when calling GetFreeListSizes 2017-05-29 15:05:55 -07:00
Aliaksey Kandratsenka
cef582350c align fast-path functions only if compiler supports that
Apparently gcc supports __attribute__((aligned(N))) on functions only
since version 4.3. So let's test for it in the configure script and only
use it when possible. We now use the CACHELINE_ALIGNED_FN macro for
aligning functions.
2017-05-22 01:55:50 -07:00
Aliaksey Kandratsenka
bddf862b18 actually support very early freeing of NULL
This was caught by unit tests on CentOS 5. Apparently some early startup
code is trying to do vprintf, which calls free(0). That used to crash:
before the size class cache is initialized, it would report a hit (with
size class 0) for the NULL pointer, so we'd miss the NULL-pointer-free
check and crash.

The fix is to check for IsInited in the case when the thread cache is
null, and if so escalate to free_null_or_invalid.
2017-05-22 01:54:56 -07:00
Aliaksey Kandratsenka
b1d88662cb change size class to be represented by 32 bit int
This moves code closer to Google-internal version and provides for
slightly tighter code encoding on amd64.
2017-05-14 19:04:56 -07:00
Aliaksey Kandratsenka
7bc34ad1f6 support different number of size classes at runtime
With the TCMALLOC_TRANSFER_NUM_OBJ environment variable we can change
the transfer batch size. And with that comes a slightly different number
of size classes depending on the value of the transfer batch size.

We used to have a hardcoded number of size classes, so we couldn't
really support any batch size setting.

This commit adds support for a dynamic number of size classes (a runtime
value returned by Static::num_size_classes()).
2017-05-14 19:04:56 -07:00
Aliaksey Kandratsenka
4585b78c8d massage allocation and deallocation fast-path for performance
This is a significant speedup of the malloc fast path. A large part
comes from avoiding the expensive function prologue/epilogue, which is
achieved by making sure that tc_{malloc,new,free} etc. are small
functions that do only tail calls. We keep only the critical path in
those functions and tail-call to slower "full" versions when we need to
deal with a less common case. This helps the compiler generate much
tidier code.

The fast-path readiness check is now different too. We used to have a
"min size for slow path" variable, which was set to a non-zero value
when we knew that the thread cache was present and ready. We now use a
non-NULL thread-cache pointer as the readiness check.

There is a special ThreadCache::threadlocal_data_.fast_path_heap copy of
that pointer that can be temporarily nulled to disable the malloc fast
path. This is used to enable emergency malloc.

There is also a slight change to tracking the thread cache size. Instead
of tracking the total size of the free list, it now tracks the size
headroom. This allows for a slightly faster deallocation fast-path
check, where we check that the headroom stays above zero. This check is
a bit faster than comparing with max_size_.
2017-05-14 19:04:56 -07:00
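The headroom change in the last paragraph can be sketched as follows: instead of tracking total cached bytes and comparing against max_size_, track the remaining headroom and test its sign on the deallocation fast path. `ThreadCacheSketch` and its members are illustrative stand-ins, not the actual tcmalloc fields.

```cpp
#include <cassert>
#include <cstddef>

struct ThreadCacheSketch {
  std::ptrdiff_t size_headroom_;  // starts at max_size_, shrinks as objects are cached

  // Returns true while the fast path may keep the freed object cached.
  bool deallocate_fast_path_ok(std::size_t bytes) {
    size_headroom_ -= static_cast<std::ptrdiff_t>(bytes);
    return size_headroom_ > 0;  // one sign test, no separate load of max_size_
  }
};
```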
Aliaksey Kandratsenka
5964a1d9c9 always inline a number of hot functions 2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka
e419b7b9a6 introduce ATTRIBUTE_ALWAYS_INLINE 2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka
27da4ade70 reduce size of class_to_size_ array
Since a 32-bit int is enough, and accessing a smaller array uses a bit
less cache.
2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka
121b1cb32e slightly faster size class cache
The lower bits of the page index are still used as the index into the
hash table. Those lower bits are zeroed, or-ed with the size class, and
placed into the hash table. Checking is then just loading the value from
the hash table, xoring it with the higher bits of the address, and
checking whether the resulting value is lower than 128. Notably, size
class 0 is no longer considered "invalid".
2017-05-14 19:04:55 -07:00
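The check described above can be modeled directly: the slot index is the low bits of the page index; the stored value is the page index with those low bits zeroed, or-ed with the size class. A lookup xors the stored value with the high bits of the probed page index, so on a hit only the size class (< 128) survives, and size class 0 is a valid entry. Constants and names here are illustrative, not tcmalloc's actual cache layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kHashBits = 16;
constexpr std::uint64_t kSlotMask = (std::uint64_t{1} << kHashBits) - 1;
static std::uint64_t g_cache[std::size_t{1} << kHashBits] = {};

void cache_set(std::uint64_t page_index, std::uint64_t size_class) {
  // Zero the low bits, or in the size class, store in the slot.
  g_cache[page_index & kSlotMask] = (page_index & ~kSlotMask) | size_class;
}

bool cache_get(std::uint64_t page_index, std::uint64_t* size_class) {
  std::uint64_t v = g_cache[page_index & kSlotMask] ^ (page_index & ~kSlotMask);
  if (v >= 128) return false;  // high bits differ: miss
  *size_class = v;             // size class 0 is a valid hit
  return true;
}
```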
Aliaksey Kandratsenka
b57c0bad41 init tcmalloc prior to replacing system alloc
Currently on Windows, we're depending on uninitialized tcmalloc
variables to detect freeing a foreign malloc's chunks. This works
somewhat by chance, because the 0-initialized size class cache acts as a
cache with no values. But this is about to change, so let's do explicit
initialization.
2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka
dfd53da578 set ENOMEM in handle_oom 2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka
507a105e84 pass original size to DoSampledAllocation
It makes heap profiles more accurate. Google's internal malloc is doing
it as well.
2017-05-14 19:04:55 -07:00
Aliaksey Kandratsenka
bb77979dea don't declare throw() on malloc functions since it is faster
Apparently throw() on functions actually asks the compiler to generate
code to detect unexpected exceptions, which prevents tail-call
optimization.

So in order to re-enable this optimization, we simply don't tell the
compiler about throw() at all. C++11 noexcept would be even better, but
it is not universally available yet.

So we change to no exception specifications, which at least for gcc &
clang on Linux (and likely for all ELF platforms, if not all platforms)
really eliminates all overhead of exceptions.
2017-05-14 19:04:55 -07:00