this change serves multiple purposes:
1. it ensures that static linking of memalign-family functions will
pull in the system malloc implementation, thereby causing link errors
if an attempt is made to link the system memalign functions with a
replacement malloc (incomplete allocator replacement).
2. it eliminates calls to free that are unpaired with allocations,
which are confusing when setting breakpoints or tracing execution.
as a bonus, making __bin_chunk external may discourage aggressive and
unnecessary inlining of it.
the generated code should be mostly unchanged, except for explicit use
of C_INUSE in place of copying the low bits from existing chunk
headers/footers.
these changes also remove mild UB due to dubious arithmetic on
pointers into imaginary size_t[] arrays.
commit c9f415d7ea included checks to
make calloc fallback to memset if used with a replaced malloc that
didn't also replace calloc, and the memalign family fail if free has
been replaced. however, the checks gave false positives for
replacement whenever malloc or free resolved to a PLT entry in the
main program.
for now, disable the checks so as not to leave libc in a broken state.
this means that the properties documented in the above commit are no
longer satisfied; failure to replace calloc and the memalign family
along with malloc is unsafe if they are ever called.
the calloc checks were correct but useless for static linking. in both
cases (simple or full malloc), calloc and malloc are in a source file
together, so replacement of one but not the other would give linking
errors. the memalign-family check was useful for static linking, but
broken for dynamic as described above, and can be replaced with a
better link-time check.
__ARM_ARCH_6ZK__ is a gcc specific historical typo which may not be
defined by other compilers.
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02237.html
To avoid unexpected results when building for ARMv6KZ with clang, the
correct form of the macro (ie 6KZ) needs to be tested. The incorrect
form of the macro (ie 6ZK) still needs to be tested for compatibility
with pre-2015 versions of gcc.
Provide an ARM specific a_ctz_32 helper function for architecture
versions for which it can be implemented efficiently via the "rbit"
instruction (ie all Thumb-2 capable versions of ARM v6 and above).
Update atomic.h to provide a_ctz_l in all cases (atomic_arch.h should
now only provide a_ctz_32 and/or a_ctz_64).
The generic version of a_ctz_32 now takes advantage of a_clz_32 if
available and the generic a_ctz_64 now makes use of a_ctz_32.
replacement is subject to conditions on the replacement functions.
they may only call functions which are async-signal-safe, as specified
either by POSIX or as an implementation-defined extension. if any
allocator functions are replaced, at least malloc, realloc, and free
must be provided. if calloc is not provided, it will behave as
malloc+memset. any of the memalign-family functions not provided will
fail with ENOMEM.
in order to implement the above properties, calloc and __memalign
check that they are using their own malloc or free, respectively.
choice to check malloc or free is based on considerations of
supporting __simple_malloc. in order to make this work, calloc is
split into separate versions for __simple_malloc and full malloc;
commit ba819787ee already did most of
the split anyway, and completing it saves an extra call frame.
previously, use of -Bsymbolic-functions made dynamic interposition
impossible. now, we are using an explicit dynamic-list, so add
allocator functions to the list. most are not referenced anyway, but
all are added for completeness.
instead of using a waiters count, add a bit to the lock field
indicating that the lock may have waiters. threads which obtain the
lock after contending for it will perform a potentially-spurious wake
when they release the lock.
the existing laddr function for fdpic cannot translate ELF virtual
addresses outside of the LOAD segments to runtime addresses because
the fdpic loadmap only covers the logically-mapped part. however the
whole point of reclaim_gaps is to recover the slack space up to the
page boundaries, so it needs to work with such addresses.
add a new laddr_pg function that accepts any address in the page range
for the LOAD segment by expanding the loadmap records out to page
boundaries. only use the new version for reclaim_gaps, so as not to
impact performance of other address lookups.
also, only use laddr_pg for the start address of a gap; the end
address lies one byte beyond the end, potentially in a different page
where it would get mapped differently. instead of mapping end, apply
the length (end-start) to the mapped value of start.
we have always bound symbols at libc.so link time rather than runtime
to minimize startup-time relocations and overhead of calls through the
PLT, and possibly also to preclude interposition that would not work
correctly anyway if allowed. historically, binding at link-time was
also necessary for the dynamic linker to work, but the dynamic linker
bootstrap overhaul in commit f3ddd173806fd5c60b3f034528ca24542aecc5b9
made it unnecessary.
our use of -Bsymbolic-functions, rather than -Bsymbolic, was chosen
because the latter is incompatible with public global data; it makes
it incompatible with copy relocations in the main program. however,
not all global data needs to be public. by using --dynamic-list
instead with an explicit list, we can reduce the number of symbolic
relocations left for runtime.
this change will also allow us to permit interposition of specific
functions (e.g. the allocator) if/when we want to, by adding them to
the dynamic list.
the Linux SYS_nice syscall is unusable because it does not return the
newly set priority. always use SYS_setpriority. also avoid overflows
in addition of inc by handling large inc values directly without
examining the old nice value.
Implementation of __malloc0 in malloc.c takes care to preserve zero
pages by overwriting only non-zero data. However, malloc must have
already modified auxiliary heap data just before and beyond the
allocated region, so we know that edge pages need not be preserved.
For allocations smaller than one page, pass them immediately to memset.
Otherwise, use memset to handle partial pages at the head and tail of
the allocation, and scan complete pages in the interior. Optimize the
scanning loop by processing 16 bytes per iteration and handling rest of
page via memset as soon as a non-zero byte is found.
the catan implementation from OpenBSD includes a FIXME-annotated
"overflow" branch that produces a meaningless and incorrect
large-magnitude result. it was reachable via three paths,
corresponding to gotos removed by this commit, in order:
1. pure imaginary argument with imaginary component greater than 1 in
magnitude. this case does not seem at all exceptional and is
handled (at least with the quality currently expected from our
complex math functions) by the existing non-exceptional code path.
2. arguments on the unit circle, including the pure-real argument 1.0.
these are not exceptional except for ±i, which should produce
results with infinite imaginary component and which lead to
computation of atan2(±0,0) in the existing non-exceptional code
path. such calls to atan2() however are well-defined by POSIX.
3. the specific argument +i. this route should be unreachable due to
the above (2), but subtle rounding effects might have made it
possible in rare cases. continuing on the non-exceptional code path
in this case would lead to computing the (real) log of an infinite
argument, then producing a NAN when multiplying it by I.
for now, remove the exceptional code paths entirely. replace the
multiplication by I with construction of a complex number using the
CMPLX macro so that the NAN issue (3) prevented cannot arise.
with these changes, catan should give reasonably correct results for
real arguments, and should no longer give completely-wrong results for
pure-imaginary arguments outside the interval (-i,+i).
the factor of -i noted in the comment at the top of casin.c was
omitted from the actual code, yielding a result rotated 90 degrees and
propagating into errors in other functions defined in terms of casin.
implement multiplication by -i as a rotation of the real and imaginary
parts of the result, rather than by actual multiplication, since the
latter cannot be optimized without knowledge that the operand is
finite. here, the rotation is the actual intent, anyway.
Commit 8a6bd7307d added support for
padding specifier extensions to strftime, but did not modify wcsftime.
In the process, it added a parameter to __strftime_fmt_1 in strftime.c,
but failed to update the prototype in wcsftime.c. This was found by
compiling musl with LTO:
src/time/wcsftime.c:7:13: warning: type of '__strftime_fmt_1' does \
not match original declaration [-Wlto-type-mismatch]
Fix the prototype of __strftime_fmt_1 in wcsftime.c, and generate the
'pad' argument the same way as it is done in strftime.
it was reported by Erik Bosman that poll fails without setting revents
when the nfds argument exceeds the current value for RLIMIT_NOFILE,
causing the subsequent open calls to be bypassed. if the rlimit is
either 1 or 2, this leaves fd 0 and 1 potentially closed but openable
when the application code is reached.
based on a brief reading of the poll syscall documentation and code,
it may be possible for poll to fail under other attacker-controlled
conditions as well. if it turns out these are reasonable conditions
that may happen in the real world, we may have to go back and
implement fallbacks to probe each fd individually if poll fails, but
for now, keep things simple and treat all poll failures as fatal.
if double precision r=x*y+z is not a half way case between two single
precision floats or it is an exact result then fmaf returns (float)r.
however the exactness check was wrong when |x*y| < |z| and could cause
incorrectly rounded result in nearest rounding mode when r is a half
way case.
fmaf(-0x1.26524ep-54, -0x1.cb7868p+11, 0x1.d10f5ep-29)
was incorrectly rounded up to 0x1.d117ap-29 instead of 0x1.d1179ep-29.
(exact result is 0x1.d1179efffffffecp-29, r is 0x1.d1179fp-29)
commit d93c0740d8 added use of feature
test macros without including features.h, causing a definition that
should be exposed in the default profile, TSVTX, to appear only when
_XOPEN_SOURCE or higher is explicitly defined.
previously, MEMOPS_SRCS failed to include arch-specific replacement
files for memcpy, etc., omitting CFLAGS_MEMOPS and thereby potentially
causing build failure if an arch provided C (rather than asm)
replacements for these files.
instead of trying to explicitly include all the files that might have
arch replacements, which is prone to human error, extract final names
to be used out of $(LIBC_OBJS), where the rules for arch replacements
have already been applied. do the same for NOSSP_OBJS, using CRT_OBJS
and LDSO_OBJS rather than repeating ourselves with $(wildcard...) and
explicit pathnames again.
standing alone, both the signed and int keywords identify the same
type, a (signed) int. however the C language has an exception where,
when the lone keyword int is used to declare a bitfield, it's
implementation-defined whether the bitfield is signed or unsigned. C11
footnote 125 extends this implementation-definedness to typedefs, and
DR#315 extends it to other integer types (for which support with
bitfields is implementation-defined).
while reasonable ABIs (all the ones we support) define bitfields as
signed by default, GCC and compatible compilers offer an option
-funsigned-bitfields to change the default. while any signed types
defined without explicit use of the signed keyword are affected, the
stdint.h types, especially intNN_t, have a natural use in bitfields.
ensure that bitfields defined with these types always have the correct
signedness regardless of compiler & flags used.
see also GCC PR 83294.
the output delay features (NL*, CR*, TAB*, BS*, and VT*) are
XSI-shaded. VT* is in the V* namespace reservation but the rest need
to be suppressed in base POSIX namespace.
unfortunately this change introduces feature test macro checks into
another bits header. at some point these checks should be simplified
by having features.h handle the "FTM X implies Y" relationships.
this must have been taken from POSIX without realizing that it was
meaningless. the resolution to Austin Group issue #844 removed it from
the standard.
PAGESIZE is actually the version defined in POSIX base, with PAGE_SIZE
being in the XSI option. use PAGESIZE as the underlying definition to
facilitate making exposure of PAGE_SIZE conditional.
use of MB_CUR_MAX encoded a hidden dependency on the currently active
locale for the calling thread, whereas nl_langinfo_l is supposed to
report for the locale passed as an argument.
general policy is that all source files defining a public API or an
ABI mechanism referenced by a public header should include the public
header that declares the interface, so that the compiler or analysis
tools can check the consistency of the declarations. Alexander Monakov
pointed out a number of violations of this principle a few years back.
fix them now.
add a member of appropriate type to the fpos_t union so that accesses
are well-defined. use long long instead of off_t since off_t is not
always exposed in stdio.h and there's no namespace-clean alias for it.
access is still performed using pointer casts rather than by naming
the union member as a matter of style; to the extent possible, the
naming of fields in opaque types defined in the public headers is not
treated as an API contract with the implementation. access via the
pointer cast is valid as long as the union has a member of matching
type.
previously this macro used an odd if/else form instead of the more
idiomatic do/while(0), making it unsafe against omission of trailing
semicolon. the omission would make the following statement conditional
instead of producing an error.
formally, calling readv with a zero-length first iov component should
behave identically to calling read on just the second component, but
presence of a zero-length iov component has triggered bugs in some
kernels and performs significantly worse than a simple read on some
file types.