these flags (MLOCK_ONFAULT, MCL_ONFAULT) lock faulted pages into
memory (useful when a small part of a
large mapped file needs efficient access), new in linux v4.4, commit
b0f205c2a3082dd9081f9a94e50658c5fa906ff1
MLOCK_* is not in the POSIX reserved namespace for sys/mman.h
mlock2 is mlock with a flags argument, new in linux commit
a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e
as usual, microblaze and sh don't have allocated syscall numbers yet.
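a minimal sketch of the intended use, assuming the mlock2 wrapper and
the MLOCK_ONFAULT flag from sys/mman.h:

    #include <sys/mman.h>

    /* lock only the pages of a large mapping that actually get
       touched, rather than the whole mapping up front */
    static void lock_on_fault(void *map, size_t len)
    {
        mlock2(map, len, MLOCK_ONFAULT);
    }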
the following syscalls are new in linux v4.3, added for aarch64, arm,
i386, mips, or1k, powerpc, x32 and x86_64.
membarrier is a system-wide memory barrier; it moves most of the
synchronization cost to one side. new in kernel commit
5b25b13ab08f616efd566347d809b4ece54570d1
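a sketch of the cost split, assuming MEMBARRIER_CMD_SHARED from
linux/membarrier.h:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/membarrier.h>

    /* slow side: one syscall issues a barrier on every cpu */
    static void heavy_barrier(void)
    {
        syscall(SYS_membarrier, MEMBARRIER_CMD_SHARED, 0);
    }

    /* fast side: only a compiler barrier is needed */
    static void light_barrier(void)
    {
        __asm__ __volatile__("" ::: "memory");
    }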
userfaultfd is useful for qemu and is new in kernel commit
8d2afd96c20316d112e04d935d9e09150e988397
switch_endian is powerpc only for switching endianness, new in commit
529d235a0e190ded1d21ccc80a73e625ebcad09b
this commit mostly makes things consistent: spacing, function
ordering in atomic_arch.h, argument names, use of volatile, etc.
a_ctz_l was also removed from x86_64 since atomic.h provides it
automatically using a_ctz_64.
rather than having each arch provide its own atomic.h, there is a new
shared atomic.h in src/internal which pulls arch-specific definitions
from arch/$(ARCH)/atomic_arch.h. the latter can be extremely minimal,
defining only a_cas or new ll/sc type primitives which the shared
atomic.h will use to construct everything else.
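roughly, the shared atomic.h constructs missing operations like this
(sketch; only a_cas is assumed from the arch header):

    /* src/internal/atomic.h */
    #ifndef a_swap
    #define a_swap a_swap
    static inline int a_swap(volatile int *p, int v)
    {
        int old;
        do old = *p;
        while (a_cas(p, old, v) != old);
        return old;
    }
    #endif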
this commit avoids making heavy changes to the individual archs'
atomic implementations. definitions which are identical or
near-identical to what the new shared atomic.h would produce have been
removed, but otherwise the changes made are just hooking up the
arch-specific files to the new infrastructure. major changes to take
advantage of the new system will come in subsequent commits.
using the actual mcontext_t definition rather than an overlaid pointer
array both improves correctness/readability and eliminates some ugly
hacks for archs with 64-bit registers but a 32-bit program counter.
also fix UB due to comparison of pointers not in a common array
object.
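the UB-free form converts to an integer type before comparing; an
illustrative sketch:

    #include <stdint.h>

    /* relational comparison of pointers into different objects is
       undefined; comparison of their integer representations is not */
    static int pc_in_range(uintptr_t pc, uintptr_t start, uintptr_t end)
    {
        return pc >= start && pc < end;
    }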
commit 3c43c0761e fixed missing
synchronization in the atomic store operation for i386 and x86_64, but
opted to use mfence for the barrier on x86_64 where it's always
available. however, in practice mfence is significantly slower than
the barrier approach used on i386 (a nop-like lock orl operation).
this commit changes x86_64 (and x32) to use the faster barrier.
despite being strongly ordered, the x86 memory model does not preclude
reordering of loads across earlier stores. while a plain store
suffices as a release barrier, we actually need a full barrier, since
users of a_store subsequently load a waiter count to determine whether
to issue a futex wait, and using a stale count will result in soft
(fail-to-wake) deadlocks. these deadlocks were observed in malloc and
are possible with stdio locks and other libc-internal locking.
on i386, an atomic operation on the caller's stack is used as the
barrier rather than performing the store itself using xchg; this
avoids the need to read the cache line on which the store is being
performed. mfence is used on x86_64 where it's always available, and
could be used on i386 with the appropriate cpu model checks if it's
shown to perform better.
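after the change, the x86_64 a_store looks roughly like this: a plain
store followed by the nop-like lock orl barrier on the stack:

    static inline void a_store(volatile int *p, int x)
    {
        __asm__ __volatile__(
            "mov %1, %0 ; lock ; orl $0,(%%rsp)"
            : "=m"(*p) : "r"(x) : "memory" );
    }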
conceptually, and on other archs, these functions take a pointer to
int, but in the i386, x86_64, and x32 versions of atomic.h, they took
a pointer to void instead.
this overhaul further reduces the amount of arch-specific code needed
by the dynamic linker and removes a number of assumptions, including:
- that symbolic function references inside libc are bound at link time
via the linker option -Bsymbolic-functions.
- that libc functions used by the dynamic linker do not require
access to data symbols.
- that static/internal function calls and data accesses can be made
without performing any relocations, or that arch-specific startup
code handled any such relocations needed.
removing these assumptions paves the way for allowing libc.so itself
to be built with stack protector (among other things), and is achieved
by a three-stage bootstrap process:
1. relative relocations are processed with a flat function.
2. symbolic relocations are processed with no external calls/data.
3. main program and dependency libs are processed with a
fully-functional libc/ldso.
reduction in arch-specific code is achieved through the following:
- crt_arch.h, used for generating crt1.o, now provides the entry point
for the dynamic linker too.
- asm is no longer responsible for skipping the beginning of argv[]
when ldso is invoked as a command.
- the functionality previously provided by __reloc_self for heavily
GOT-dependent RISC archs is now the arch-agnostic stage-1.
- arch-specific relocation type codes are mapped directly as macros
rather than via an inline translation function/switch statement.
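on x86_64, for example, the mapping macros in arch/x86_64/reloc.h look
roughly like this:

    #define REL_SYMBOLIC R_X86_64_64
    #define REL_GOT      R_X86_64_GLOB_DAT
    #define REL_PLT      R_X86_64_JUMP_SLOT
    #define REL_RELATIVE R_X86_64_RELATIVE
    #define REL_COPY     R_X86_64_COPY
    #define REL_DTPMOD   R_X86_64_DTPMOD64
    #define REL_DTPOFF   R_X86_64_DTPOFF64
    #define REL_TPOFF    R_X86_64_TPOFF64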
while the O_PATH value is the same for all presently supported archs,
it differs at least on sparc, and conceptually it's no less
arch-specific than the other O_* macros. O_SEARCH and O_EXEC are still
defined in terms of O_PATH in the main fcntl.h.
the previous values (2k min and 8k default) were too small for some
archs. aarch64 reserves 4k in the signal context for future extensions
and requires about 4.5k total, and powerpc reportedly uses over 2k.
the new minimums are chosen to fit the saved context and also allow a
minimal signal handler to run.
since the default (SIGSTKSZ) has always been 6k larger than the
minimum, it is also increased to maintain the 6k usable by the signal
handler. this happens to be able to store one pathname buffer and
should be sufficient for calling any function in libc that doesn't
involve conversion between floating point and decimal representations.
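for context, this is the standard way the constant gets consumed
(error handling elided):

    #include <signal.h>
    #include <stdlib.h>

    /* give signal handlers their own stack of the default size */
    static void setup_sigstack(void)
    {
        stack_t ss = {
            .ss_sp = malloc(SIGSTKSZ),
            .ss_size = SIGSTKSZ,
            .ss_flags = 0,
        };
        sigaltstack(&ss, 0);
    }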
x86 (both 32-bit and 64-bit variants) may also need a larger minimum
(around 2.5k) in the future to support avx-512, but the values on
these archs are left alone for now pending further analysis.
the value for PTHREAD_STACK_MIN is not increased to match MINSIGSTKSZ
at this time. this is so as not to preclude applications from using
extremely small thread stacks when they know they will not be handling
signals. unfortunately cancellation and multi-threaded set*id() use
signals as an implementation detail and therefore require a stack
large enough for a signal context, so applications which use extremely
small thread stacks may still need to avoid using these features.
these macros have the same distinct definition on blackfin, frv, m68k,
mips, sparc and xtensa kernels. POLLMSG and POLLRDHUP additionally
differ on sparc.
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
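i.e. something along these lines (sketch; the completion value 2 is
illustrative):

    #include <pthread.h>

    /* the object's ABI type is plain int, but racing accesses go
       through a volatile-qualified lvalue */
    static int once_done(pthread_once_t *control)
    {
        return *(volatile int *)control == 2;
    }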
this syscall allows fexecve to be implemented without /proc; it is new
in linux v3.19, added in commit 51f39a1f0cea1cacf8c787f652f26dfee9611874
(sh and microblaze do not have allocated syscall numbers yet)
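a sketch of fexecve in terms of the new syscall; AT_EMPTY_PATH makes
the empty pathname refer to fd itself:

    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/syscall.h>

    static int fexecve_sketch(int fd, char *const argv[], char *const envp[])
    {
        /* replaces the process image on success, so this returns
           only on error */
        syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
        return -1;
    }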
added an x32 fix as well: the io_setup and io_submit syscalls are no
longer common with x86_64, so use the x32 specific numbers.
x86_64 syscall.h defined some musl internal syscall names and made
them public. These defines were already moved to src/internal/syscall.h
(except for SYS_fadvise which is added now) so the cruft in x86_64
syscall.h is not needed.
the definitions are generic for all kernel archs. exposure of these
macros now occurs only under the same feature test as the function
accepting them, which is believed to be more correct.
these syscalls are new in linux v3.18. bpf is present on all
supported archs except sh; kexec_file_load has only been allocated for
x86_64 and x32 so far.
bpf was added in linux commit 99c55f7d47c0dc6fc64729f37bf435abf43f4c60
kexec_file_load syscall number was allocated in commit
f0895685c7fd8c938c91a9d8a6f7c11f22df58d2
these syscalls are new in linux v3.17 and present on all supported
archs except sh.
seccomp was added in commit 48dc92b9fc3926844257316e75ba11eb5c742b2c
it has operation, flags and pointer arguments (if flags==0 then it is
the same as prctl(PR_SET_SECCOMP,...)), the uapi header for flag
definitions is linux/seccomp.h
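the equivalence, as a sketch (constants from linux/seccomp.h and
linux/filter.h):

    #include <unistd.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <linux/filter.h>
    #include <linux/seccomp.h>

    static int set_filter(struct sock_fprog *prog)
    {
        /* with flags==0 this is the same as
           prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog) */
        return syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, prog);
    }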
getrandom was added in commit c6e9d6f38894798696f23c8084ca7edbf16ee895
it provides an entropy source when open("/dev/urandom",..) would fail,
the uapi header for flags is linux/random.h
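a sketch of the raw syscall; flags of 0 block until the entropy pool
is initialized:

    #include <unistd.h>
    #include <sys/syscall.h>

    /* works even where open("/dev/urandom",...) would fail, e.g. in
       a chroot or when fds are exhausted */
    static int get_entropy(void *buf, size_t len)
    {
        return syscall(SYS_getrandom, buf, len, 0);
    }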
memfd_create was added in commit 9183df25fe7b194563db3fec6dc3202a5855839c
it allows an anonymous mapping to have an fd that can be shared and
sealed and needs no mount point; the uapi header for flags is
linux/memfd.h
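a sketch of what this enables (MFD_* from linux/memfd.h; error
handling elided):

    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <linux/memfd.h>

    /* an anonymous shared mapping with an fd: no tmpfs mount point
       needed, and the fd can later be sealed via fcntl(F_ADD_SEALS) */
    static void *shared_anon_page(int *fd_out)
    {
        int fd = syscall(SYS_memfd_create, "buf",
            MFD_CLOEXEC | MFD_ALLOW_SEALING);
        ftruncate(fd, 4096);
        *fd_out = fd;
        return mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    }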
based on patch by Jens Gustedt.
mtx_t and cnd_t are defined in such a way that they are formally
"compatible types" with pthread_mutex_t and pthread_cond_t,
respectively, when accessed from a different translation unit. this
makes it possible to implement the C11 functions using the pthread
functions (which will dereference them with the pthread types) without
having to use the same types, which would necessitate either namespace
violations (exposing pthread type names in threads.h) or incompatible
changes to the C++ name mangling ABI for the pthread types.
for the rest of the types, things are much simpler; using identical
types is possible without any namespace considerations.
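the trick relies on C's rules for structure compatibility across
translation units; a sketch with an illustrative layout (not musl's
actual one):

    /* threads.h */
    typedef struct { union { int __i[6]; volatile int __vi[6];
        volatile void *volatile __p[6]; } __u; } mtx_t;

    /* alltypes.h, seen by a different translation unit */
    typedef struct { union { int __i[6]; volatile int __vi[6];
        volatile void *volatile __p[6]; } __u; } pthread_mutex_t;

    /* per C11 6.2.7 the two tagless struct types are compatible, so
       mtx_lock can simply call pthread_mutex_lock on the object */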
unfortunately this needs to be able to vary by arch, because of a huge
mess GCC made: the GCC definition, which became the ABI, depends on
quirks in GCC's definition of __alignof__, which does not match the
formal alignment of the type.
GCC's __alignof__ unexpectedly exposes an implementation detail,
its "preferred alignment" for the type, rather than the formal/ABI
alignment of the type, which it only actually uses in structures. on
most archs the two values are the same, but on some (at least i386)
the preferred alignment is greater than the ABI alignment.
I considered using _Alignas(8) unconditionally, but on at least one
arch (or1k), the alignment of max_align_t with GCC's definition is
only 4 (even the "preferred alignment" for these types is only 4).
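the resulting per-arch definitions look roughly like this (the i386
variant is inferred from the description above):

    /* generic, where ABI and preferred alignment agree */
    typedef struct { long long __ll; long double __ld; } max_align_t;

    /* i386, where GCC's __alignof__(long long) is 8 despite the ABI
       alignment being 4 */
    typedef struct { _Alignas(8) long long __ll; long double __ld; }
        max_align_t;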
when manipulating the robust list, the order of stores matters,
because the code may be asynchronously interrupted by a fatal signal
and the kernel will then access the robust list in what is essentially
an async-signal context.
previously, aliasing considerations made it seem unlikely that a
compiler could reorder the stores, but proving that they could not be
reordered incorrectly would have been extremely difficult. instead
I've opted to make all the pointers used as part of the robust list,
including those in the robust list head and in the individual mutexes,
volatile.
in addition, the format of the robust list has been changed to point
back to the head at the end, rather than ending with a null pointer.
this is to match the documented kernel robust list ABI. the null
pointer, which was previously used, only worked because faults during
access terminate the robust list processing.
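in terms of the kernel's uapi structures, the circular form looks
like this (sketch):

    #include <linux/futex.h>

    /* an empty robust list points back at its own head, matching the
       documented ABI, rather than ending with a null pointer */
    static void robust_list_init(struct robust_list_head *h)
    {
        h->list.next = &h->list;
    }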
the a_cas_l, a_swap_l, a_swap_p, and a_store_l operations were
probably used a long time ago when only i386 and x86_64 were
supported. as other archs were added, support for them was
inconsistent, and they are obviously not in use at present. having
them around potentially confuses readers working on new ports, and the
type-punning hacks and inconsistent use of types in their definitions
are not a style I wish to perpetuate in the source tree, so removing
them seems appropriate.
this was one of the main instances of ugly code duplication: all archs
use basically the same types of relocations, but roughly equivalent
logic was duplicated for each arch to account for the different naming
and numbering of relocation types and variation in whether REL or RELA
records are used.
as an added bonus, both REL and RELA are now supported on all archs,
regardless of which is used by the standard toolchain.
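supporting both record types mostly comes down to where the addend
lives; a sketch of the shared logic (names illustrative):

    #include <elf.h>
    #include <stddef.h>

    /* RELA records carry an explicit addend; REL records store it in
       the word being relocated */
    static size_t get_addend(int is_rela, Elf64_Rela *rel, size_t *reloc_addr)
    {
        return is_rela ? rel->r_addend : *reloc_addr;
    }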
the immediate motivation is supporting TLSDESC relocations which
require allocation and thus may fail (unless we pre-allocate), but
this mechanism should also be used for throwing an error on
unsupported or invalid relocation types, and perhaps in certain cases,
for reporting when a relocation is not satisfiable.
linux 3.14 introduced sched_getattr and sched_setattr syscalls in
commit d50dde5a10f305253cbc3855307f608f8a3c5f73
and the related SCHED_DEADLINE scheduling policy in
commit aab03e05e8f7e26f51dee792beddcb5cca9215a5
but struct sched_attr "extended scheduling parameters data structure"
is not yet exported to userspace (necessary for using the syscalls),
so related uapi definitions are not added yet.
the vdso symbol lookup code is based on the original 2011 patch by
Nicholas J. Kain, with some streamlining, pointer arithmetic fixes,
and one symbol version matching fix.
on the consumer side (clock_gettime), per-arch macros for the
particular symbol name and version to lookup are added in
syscall_arch.h, and no vdso code is pulled in on archs which do not
define these macros. at this time, vdso is enabled only on x86_64.
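the x86_64 macros, roughly as added to syscall_arch.h:

    #define VDSO_CGT_SYM "__vdso_clock_gettime"
    #define VDSO_CGT_VER "LINUX_2.6"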
the vdso support at the dynamic linker level is no longer useful to
libc, but is left in place for the sake of debuggers (which may need
the vdso in the link map to find its functions) and possibly use with
dlsym.
The mips arch is special in that it uses different RLIMIT_
numbers than other archs, so allow bits/resource.h to override
the default RLIMIT_ numbers (empty on all archs except mips).
Reported by orc.
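a sketch of the mips override; the values are taken from the kernel's
mips uapi headers (other archs leave bits/resource.h empty):

    /* arch/mips/bits/resource.h */
    #define RLIMIT_NOFILE  5
    #define RLIMIT_AS      6
    #define RLIMIT_RSS     7
    #define RLIMIT_NPROC   8
    #define RLIMIT_MEMLOCK 9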
the definition was found to be incorrect at least for powerpc, and
fixing this cleanly requires making the definition arch-specific. this
will allow cleaning up the definition for other archs to make it more
specific, and reversing some of the ugliness (time_t hacks) introduced
with the x32 port.
this first commit simply copies the existing definition to each arch
without any changes. this is intentional, to make it easier to review
changes made on a per-arch basis.
the operand size is unnecessary, since the assembler knows it from the
destination register size. removing the suffix makes it so the same
code should work for x32.
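for example (illustrative):

    static long copy_reg(long x)
    {
        long ret;
        /* the register operands imply the operand size, so the same
           source assembles for both 64-bit and x32; "movq" would not */
        __asm__ __volatile__( "mov %1, %0" : "=r"(ret) : "r"(x) );
        return ret;
    }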
the fix should be complete on archs that use the generic definitions
(i386, arm, x86_64, microblaze), but mips and powerpc have not been
checked thoroughly and may need more fixes.
the only immediate effect of this commit is enabling PIE support on
some archs that did not previously have any Scrt1.s, since the
existing asm files for crt1 override this C code. so some of the
crt_arch.h files committed are only there for the sake of documenting
what their archs "would do" if they used the new C-based crt1.
the expectation is that new archs should use this new system rather
than using heavy asm for crt1. aside from being easier and less
error-prone, it also ensures that PIE support is available immediately
(since Scrt1.o is generated from the same C source, using -fPIC)
rather than having to be added as an afterthought in the porting
process.
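the C-based crt1 is small; roughly:

    #include "crt_arch.h"

    int main();
    void _init() __attribute__((weak));
    void _fini() __attribute__((weak));
    _Noreturn int __libc_start_main(int (*)(), int, char **,
        void (*)(), void (*)(), void (*)());

    /* crt_arch.h provides the asm entry point, which passes the
       original stack pointer to _start_c */
    void _start_c(long *p)
    {
        int argc = p[0];
        char **argv = (void *)(p+1);
        __libc_start_main(main, argc, argv, _init, _fini, 0);
    }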
this is necessary to meet the C++ ABI target. alternatives were
considered to avoid the size increase for non-sig jmp_buf objects, but
they seemed to have worse properties. moreover, the relative size
increase is only extreme on x86[_64]; one way of interpreting this is
that, if the size increase from this patch makes jmp_buf use too much
memory, then the program was already using too much memory when built
for non-x86 archs.
rather than moving nlink_t back to the arch-specific file, I've added
a macro _Reg defined to the canonical type for register-size values on
the arch. this is not the same as _Addr for (not-yet-supported)
32-on-64 pseudo-archs like x32 and mips n32, so a new macro was
needed.
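a sketch of the distinction (illustrative values):

    /* 64-bit arch: addresses and registers are the same width */
    #define _Addr long
    #define _Reg  long
    typedef unsigned _Reg nlink_t;

    /* on a 32-on-64 pseudo-arch like x32, _Addr would instead be int
       while _Reg stays long long */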