the previous values (2k min and 8k default) were too small for some
archs. aarch64 reserves 4k in the signal context for future extensions
and requires about 4.5k total, and powerpc reportedly uses over 2k.
the new minimums are chosen to fit the saved context and also allow a
minimal signal handler to run.
since the default (SIGSTKSZ) has always been 6k larger than the
minimum, it is also increased to maintain the 6k usable by the signal
handler. this happens to be able to store one pathname buffer and
should be sufficient for calling any function in libc that doesn't
involve conversion between floating point and decimal representations.
x86 (both 32-bit and 64-bit variants) may also need a larger minimum
(around 2.5k) in the future to support avx-512, but the values on
these archs are left alone for now pending further analysis.
the value for PTHREAD_STACK_MIN is not increased to match MINSIGSTKSZ
at this time. this is so as not to preclude applications from using
extremely small thread stacks when they know they will not be handling
signals. unfortunately cancellation and multi-threaded set*id() use
signals as an implementation detail and therefore require a stack
large enough for a signal context, so applications which use extremely
small thread stacks may still need to avoid using these features.
The unwind code in libgcc uses this type for unwinding across signal
handlers. On aarch64 the kernel may place a sequence of structs on the
signal stack on top of the ucontext to provide additional information.
The unwinder only needs the header, but added all the types the kernel
currently defines for this mechanism because they are part of the uapi.
previously, commit e7b9887e8b65253087ab0b209dc8dd85c9f09614 aligned
the sizes with the glibc ABI. subsequent discussion during the merge
of the aarch64 port reached a conclusion that we should reject larger
arch-specific sizes, which have significant cost and no benefit, and
stick with the existing common 32-bit sizes for all 32-bit/ILP32 archs
and the x86_64 sizes for 64-bit archs.
one peculiarity of this change is that x32 pthread_attr_t is now
larger in musl than in the glibc x32 ABI, making it unsafe to call
pthread_attr_init from x32 code that was compiled against glibc. with
all the ABI issues of x32, it's not clear that ABI compatibility will
ever work, but if it's needed, pthread_attr_init and related functions
could be modified not to write to the last slot of the object.
this is not a regression versus previous releases, since on previous
releases the x32 pthread type sizes were all severely oversized
already (due to incorrectly using the x86_64 LP64 definitions).
moreover, x32 is still considered experimental and not ABI-stable.
This adds complete aarch64 target support including bigendian subarch.
Some of the long double math functions are known to be broken otherwise
interfaces should be fully functional, but at this point consider this
port experimental.
Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
these macros have the same distinct definition on blackfin, frv, m68k,
mips, sparc and xtensa kernels. POLLMSG and POLLRDHUP additionally
differ on sparc.
the previous definitions were copied from x86_64. not only did they
fail to match the ABI sizes; they also wrongly encoded an assumption
that long/pointer types are twice as large as int.
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
this syscall allows fexecve to be implemented without /proc, it is new
in linux v3.19, added in commit 51f39a1f0cea1cacf8c787f652f26dfee9611874
(sh and microblaze do not have allocated syscall numbers yet)
added a x32 fix as well: the io_setup and io_submit syscalls are no
longer common with x86_64, so use the x32 specific numbers.
x86_64 syscall.h defined some musl internal syscall names and made
them public. These defines were already moved to src/internal/syscall.h
(except for SYS_fadvise which is added now) so the cruft in x86_64
syscall.h is not needed.
the definitions are generic for all kernel archs. exposure of these
macros now only occurs on the same feature test as for the function
accepting them, which is believed to be more correct.
these syscalls are new in linux v3.18, bpf is present on all
supported archs except sh, kexec_file_load is only allocted for
x86_64 and x32 yet.
bpf was added in linux commit 99c55f7d47c0dc6fc64729f37bf435abf43f4c60
kexec_file_load syscall number was allocated in commit
f0895685c7fd8c938c91a9d8a6f7c11f22df58d2
except powerpc, which still lacks inline syscalls simply because
nobody has written the code, these are all fallbacks used to work
around a clang bug that probably does not exist in versions of clang
that can compile musl. however, it's useful to have the generic
non-inline code anyway, as it eases the task of porting to new archs:
writing inline syscall code is now optional. this approach could also
help support compilers which don't understand inline asm or lack
support for the needed register constraints.
mips could not be unified because it has special fixup code for broken
layout of the kernel's struct stat.
the register constraints in the non-clang case were tested to work on
clang back to 3.2, and earlier versions of clang have known bugs that
preclude building musl.
there may be other reasons to prefer not to use inline syscalls, but
if so the function-call-based implementations should be added back in
a unified way for all archs.
calls to __aeabi_read_tp may be generated by the compiler to access
TLS on pre-v6 targets. previously, this function was hard-coded to
call the kuser helper, which would crash on kernels with kuser helper
removed.
to fix the problem most efficiently, the definition of __aeabi_read_tp
is moved so that it's an alias for the new __a_gettp. however, on v7+
targets, code to initialize the runtime choice of thread-pointer
loading code is not even compiled, meaning that defining
__aeabi_read_tp would have caused an immediate crash due to using the
default implementation of __a_gettp with a HCF instruction.
fortunately there is an elegant solution which reduces overall code
size: putting the native thread-pointer loading instruction in the
default code path for __a_gettp, so that separate default/native code
paths are not needed. this function should never be called before
__set_thread_area anyway, and if it is called early on pre-v6
hardware, the old behavior (crashing) is maintained.
ideally __aeabi_read_tp would not be called at all on v7+ targets
anyway -- in fact, prior to the overhaul, the same problem existed,
but it was never caught by users building for v7+ with kuser disabled.
however, it's possible for calls to __aeabi_read_tp to end up in a v7+
binary if some of the object files were built for pre-v7 targets, e.g.
in the case of static libraries that were built separately, so this
case needs to be handled.
previously, builds for pre-armv6 targets hard-coded use of the "kuser
helper" system for atomics and thread-pointer access, resulting in
binaries that fail to run (crash) on systems where this functionality
has been disabled (as a security/hardening measure) in the kernel.
additionally, builds for armv6 hard-coded an outdated/deprecated
memory barrier instruction which may require emulation (extremely
slow) on future models.
this overhaul replaces the behavior for all pre-armv7 builds (both of
the above cases) to perform runtime detection of the appropriate
mechanisms for barrier, atomic compare-and-swap, and thread pointer
access. detection is based on information provided by the kernel in
auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture
version encoded in AT_PLATFORM. direct use of the instructions is
preferred when possible, since probing for the existence of the kuser
helper page would be difficult and would incur runtime cost.
for builds targeting armv7 or later, the runtime detection code is not
compiled at all, and much more efficient versions of the non-cas
atomic operations are provided by using ldrex/strex directly rather
than wrapping cas.
the kernel syscall interface for or1k does not expect 64-bit arguments
to be aligned to "even" register boundaries. this incorrect alignment
broke truncate/ftruncate and as well as a few less-common syscalls.
these syscalls are new in linux v3.17 and present on all supported
archs except sh.
seccomp was added in commit 48dc92b9fc3926844257316e75ba11eb5c742b2c
it has operation, flags and pointer arguments (if flags==0 then it is
the same as prctl(PR_SET_SECCOMP,...)), the uapi header for flag
definitions is linux/seccomp.h
getrandom was added in commit c6e9d6f38894798696f23c8084ca7edbf16ee895
it provides an entropy source when open("/dev/urandom",..) would fail,
the uapi header for flags is linux/random.h
memfd_create was added in commit 9183df25fe7b194563db3fec6dc3202a5855839c
it allows anon mmap to have an fd, that can be shared, sealed and needs no
mount point, the uapi header for flags is linux/memfd.h
the C11 _Alignas keyword is not present in C++, and despite it being
in the reserved namespace and thus reasonable to support even in
non-C11 modes, compilers seem to fail to support it.
based on patch by Jens Gustedt.
mtx_t and cnd_t are defined in such a way that they are formally
"compatible types" with pthread_mutex_t and pthread_cond_t,
respectively, when accessed from a different translation unit. this
makes it possible to implement the C11 functions using the pthread
functions (which will dereference them with the pthread types) without
having to use the same types, which would necessitate either namespace
violations (exposing pthread type names in threads.h) or incompatible
changes to the C++ name mangling ABI for the pthread types.
for the rest of the types, things are much simpler; using identical
types is possible without any namespace considerations.
conceptually, a_spin needs to be at least a compiler barrier, so the
compiler will not optimize out loops (and the load on each iteration)
while spinning. it should also be a memory barrier, or the spinning
thread might keep spinning without noticing stores from other threads,
thus delaying for longer than it should.
ideally, an optimal a_spin implementation that avoids unnecessary
cache/memory contention should be chosen for each arch, but for now,
the easiest thing is to perform a useless a_cas on the calling
thread's stack.
unfortunately this needs to be able to vary by arch, because of a huge
mess GCC made: the GCC definition, which became the ABI, depends on
quirks in GCC's definition of __alignof__, which does not match the
formal alignment of the type.
GCC's __alignof__ unexpectedly exposes the an implementation detail,
its "preferred alignment" for the type, rather than the formal/ABI
alignment of the type, which it only actually uses in structures. on
most archs the two values are the same, but on some (at least i386)
the preferred alignment is greater than the ABI alignment.
I considered using _Alignas(8) unconditionally, but on at least one
arch (or1k), the alignment of max_align_t with GCC's definition is
only 4 (even the "preferred alignment" for these types is only 4).
when manipulating the robust list, the order of stores matters,
because the code may be asynchronously interrupted by a fatal signal
and the kernel will then access the robust list in what is essentially
an async-signal context.
previously, aliasing considerations made it seem unlikely that a
compiler could reorder the stores, but proving that they could not be
reordered incorrectly would have been extremely difficult. instead
I've opted to make all the pointers used as part of the robust list,
including those in the robust list head and in the individual mutexes,
volatile.
in addition, the format of the robust list has been changed to point
back to the head at the end, rather than ending with a null pointer.
this is to match the documented kernel robust list ABI. the null
pointer, which was previously used, only worked because faults during
access terminate the robust list processing.
this commit changes the names to match the kernel names, exposing
under the normal names the "old" versions which work with a smaller
termios structure compatible with the userspace structure, and
renaming the "new" versions with "2" on the end like the kernel has.
this fixes spurious warnings "Unsupported ioctl: cmd=0x802c542a" from
qemu-sh4 and should be more correct anyway, since our userspace
termios structure does not have meaningful information in the part
which the kernel would be interpreting as speeds with the new ioctl.
the a_cas_l, a_swap_l, a_swap_p, and a_store_l operations were
probably used a long time ago when only i386 and x86_64 were
supported. as other archs were added, support for them was
inconsistent, and they are obviously not in use at present. having
them around potentially confuses readers working on new ports, and the
type-punning hacks and inconsistent use of types in their definitions
is not a style I wish to perpetuate in the source tree, so removing
them seems appropriate.
while other usage I've seen only has the synco instruction after the
atomic operation, I cannot find any documentation indicating that this
is correct. certainly all stores before the atomic need to have been
synchronized before the atomic operation takes place.
due to what was essentially a copy and paste error, the changes made
in commit f61be1f875a2758509d6e9e2cf6f1d9603b28b65 caused syscalls
with 5 or 6 arguments (and syscalls with 2, 3, or 4 arguments when
compiled with clang compatibility) to negate the returned error code a
second time, breaking errno reporting.
the mips version of this structure on the kernel side wrongly has
32-bit type rather than 64-bit type. fortunately there is adjacent
padding to bring it up to 64 bits, and on little-endian, this allows
us to treat the adjacent kernel st_dev and st_pad0[0] as as single
64-bit dev_t. however, on big endian, such treatment results in the
upper and lower 32-bit parts of the dev_t value being swapped. for the
purpose of just comparing st_dev values this did not break anything,
but it precluded actually processing the device numbers as major/minor
values.
since the broken kernel behavior that needs to be worked around is
isolated to one arch, I put the workarounds in syscall_arch.h rather
than adding a stat fixup path in the common code. on little endian
mips, the added code optimizes out completely.
the changes necessary were incompatible with the way the __asm_syscall
macro was factored so I just removed it and flattened the individual
__syscallN functions. this arguably makes the code easier to read and
understand, anyway.
at the very least, a compiler barrier is required no matter what, and
that was missing. current or1k implementations have strong ordering,
but this is not guaranteed as part of the ISA, so some sort of
synchronizing operation is necessary.
in principle we should use l.msync, but due to misinterpretation of
the spec, it was wrongly treated as an optional instruction and is not
supported by some implementations. if future kernels trap it and treat
it as a nop (rather than illegal instruction) when the
hardware/emulator does not support it, we could consider using it.
in the absence of l.msync support, the l.lwa/l.swa instructions, which
are specified to have a built-in l.msync, need to be used. the easiest
way to use them to implement atomic store is to perform an atomic swap
and throw away the result. using compare-and-swap would be lighter,
and would probably be sufficient for all actual usage cases, but
checking this is difficult and error-prone:
with store implemented in terms of swap, it's guaranteed that, when
another atomic operation is performed at the same time as the store,
either the result of the store followed by the other operation, or
just the store (clobbering the other operation's result) is seen. if
store were implemented in terms of cas, there are cases where this
invariant would fail to hold, and we would need detailed rules for the
situations in which the store operation is well-defined.
as far as I can tell, microblaze is strongly ordered, but this does
not seem to be well-documented and the assumption may need revisiting.
even with strong ordering, however, a volatile C assignment is not
sufficient to implement atomic store, since it does not preclude
reordering by the compiler with respect to non-volatile stores and
loads.
simply flanking a C store with empty volatile asm blocks with memory
clobbers would achieve the desired result, but is likely to result in
worse code generation, since the address and value for the store may
need to be spilled. actually writing the store in asm, so that there's
only one asm block, should give optimal code generation while
satisfying the requirement for having a compiler barrier.
previously I had wrongly assumed the ll/sc instructions also provided
memory synchronization; apparently they do not. this commit adds sync
instructions before and after each atomic operation and changes the
atomic store to simply use sync before and after a plain store, rather
than a useless compare-and-swap.
despite lacking the semantic content that the asm accesses the
pointed-to object rather than just using its address as a value, the
mips asm was not actually broken. the asm blocks were declared
volatile, meaning that the compiler must treat them as having unknown
side effects.
however changing the asm to use memory constraints is desirable not
just from a semantic correctness and consistency standpoint, but also
produces better code. the compiler is able to use base/offset
addressing expressions for the atomic object's address rather than
having to load the address into a single register. this improves
access to global locks in static libc, and access to non-zero-offset
atomic fields in synchronization primitives, etc.
due to a mistake in my testing procedure, the changes in the previous
commit were not correctly tested and wrongly assumed to be valid. the
lwarx and stwcx. instructions do not accept general ppc memory address
expressions and thus the argument associated with the memory
constraint cannot be used directly.
instead, the memory constraint can be left as an argument that the asm
does not actually use, and the address can be provided in a separate
register constraint.
the register constraint for the address to be accessed did not convey
that the asm can access the pointed-to object. as far as the compiler
could tell, the result of the asm was just a pure function of the
address and the values passed in, and thus the asm could be hoisted
out of loops or omitted entirely if the result was not used.
the erroneous definition was missed because with works with qemu
user-level emulation, which also has the wrong definition. the actual
kernel uses the asm-generic generic definition.
With the exception of a fenv implementation, the port is fully featured.
The port has been tested in or1ksim, the golden reference functional
simulator for OpenRISC 1000.
It passes all libc-test tests (except the math tests that
requires a fenv implementation).
The port assumes an or1k implementation that has support for
atomic instructions (l.lwa/l.swa).
Although it passes all the libc-test tests, the port is still
in an experimental state, and has yet experienced very little
'real-world' use.
this issue caused the address of functions in shared libraries to
resolve to their PLT thunks in the main program rather than their
correct addresses. it was observed causing crashes, though the
mechanism of the crash was not thoroughly investigated. since the
issue is very subtle, it calls for some explanation:
on all well-behaved archs, GOT entries that belong to the PLT use a
special relocation type, typically called JMP_SLOT, so that the
dynamic linker can avoid having the jump destinations for the PLT
resolve to PLT thunks themselves (they also provide a definition for
the symbol, which must be used whenever the address of the function is
taken so that all DSOs see the same address).
however, the traditional mips PIC ABI lacked such a JMP_SLOT
relocation type, presumably because, due to the way PIC works, the
address of the PLT thunk was never needed and could always be ignored.
prior to commit adf94c19666e687a728bbf398f9a88ea4ea19996, the mips
version of reloc.h contained a hack that caused all symbol lookups to
be treated like JMP_SLOT, inhibiting undefined symbols from ever being
used to resolve symbolic relocations. this hack goes all the way back
to commit babf820180368f00742ec65b2050a82380d7c542, when the mips
dynamic linker was first made usable.
during the recent refactoring to eliminate arch-specific relocation
processing (commit adf94c19666e687a728bbf398f9a88ea4ea19996), this
hack was overlooked and no equivalent functionality was provided in
the new code.
fixing the problem is not as simple as adding back an equivalent hack,
since there is now also a "non-PIC ABI" that can be used for the main
executable, which actually does use a PLT. the closest thing to
official documentation I could find for this ABI is nonpic.txt,
attached to Message-ID: 20080701202236.GA1534@caradoc.them.org, which
can be found in the gcc mailing list archives and elsewhere. per this
document, undefined symbols corresponding to PLT thunks have the
STO_MIPS_PLT bit set in the symbol's st_other field. thus, I have
added an arch-specific rule for mips, applied at the find_sym level
rather than the relocation level, to reject undefined symbols with the
STO_MIPS_PLT bit clear.
the previous hack of treating all mips relocations as JMP_SLOT-like,
rather than rejecting the unwanted symbols in find_sym, probably also
caused dlsym to wrongly return PLT thunks in place of the correct
address of a function under at least some conditions. this should now
be fixed, at least for global-scope symbol lookups.
this was one of the main instances of ugly code duplication: all archs
use basically the same types of relocations, but roughly equivalent
logic was duplicated for each arch to account for the different naming
and numbering of relocation types and variation in whether REL or RELA
records are used.
as an added bonus, both REL and RELA are now supported on all archs,
regardless of which is used by the standard toolchain.
processing of R_PPC_TPREL32 was ignoring the addend provided by the
RELA-style relocation and instead using the inline value as the
addend. this presumably broke dynamic-linked access to initial TLS in
cases where the addend was nonzero.
the following issues are fixed:
- R_SH_REL32 was adding the load address of the module being relocated
to the result. this seems to have been a mistake in the original
port, since it does not match other dynamic linker implementations
and since adding a difference between two addresses (the symbol
value and the relocation address) to a load address does not make
sense.
- R_SH_TLS_DTPMOD32 was wrongly accepting an inline addend (i.e. using
+= rather than = on *reloc_addr) which makes no sense; addition is
not an operation that's defined on module ids.
- R_SH_TLS_DTPOFF32 and R_SH_TLS_TPOFF32 were wrongly using inline
addends rather than the RELA-provided addends.
in addition, handling of R_SH_GLOB_DAT, R_SH_JMP_SLOT, and R_SH_DIR32
are merged to all honor the addend. the first two should not need it
for correct usage generated by toolchains, but other dynamic linkers
allow addends here, and it simplifies the code anyway.
these issues were spotted while reviewing the code for the purpose of
refactoring this part of the dynamic linker. no testing was performed.
the immediate motivation is supporting TLSDESC relocations which
require allocation and thus may fail (unless we pre-allocate), but
this mechanism should also be used for throwing an error on
unsupported or invalid relocation types, and perhaps in certain cases,
for reporting when a relocation is not satisfiable.
linux 3.14 introduced sched_getattr and sched_setattr syscalls in
commit d50dde5a10f305253cbc3855307f608f8a3c5f73
and the related SCHED_DEADLINE scheduling policy in
commit aab03e05e8f7e26f51dee792beddcb5cca9215a5
but struct sched_attr "extended scheduling parameters data structure"
is not yet exported to userspace (necessary for using the syscalls)
so related uapi definitions are not added yet.
On 32 bit mips the kernel uses -1UL/2 to mark RLIM_INFINITY (and
this is the definition in the userspace api), but since it is in
the middle of the valid range of limits and limits are often
compared with relational operators, various kernel side logic is
broken if larger than -1UL/2 limits are used. So we truncate the
limits to -1UL/2 in get/setrlimit and prlimit.
Even if the kernel side logic consistently treated -1UL/2 as greater
than any other limit value, there wouldn't be any clean workaround
that allowed using large limits:
* using -1UL/2 as RLIM_INFINITY in userspace would mean different
infinity value for get/setrlimt and prlimit (where infinity is always
-1ULL) and userspace logic could break easily (just like the kernel
is broken now) and more special case code would be needed for mips.
* translating -1UL/2 kernel side value to -1ULL in userspace would
mean that -1UL/2 limit cannot be set (eg. -1UL/2+1 had to be passed
to the kernel instead).
armv7/thumb2 provides a way to do atomics in thumb mode, but for armv6
we need a call to arm mode.
this commit is based on a patch by Stephen Thomas which fixed the
armv7 cases but not the armv6 ones.
all of this should be revisited if/when runtime selection of thread
pointer access and atomics are added.
the vdso symbol lookup code is based on the original 2011 patch by
Nicholas J. Kain, with some streamlining, pointer arithmetic fixes,
and one symbol version matching fix.
on the consumer side (clock_gettime), per-arch macros for the
particular symbol name and version to lookup are added in
syscall_arch.h, and no vdso code is pulled in on archs which do not
define these macros. at this time, vdso is enabled only on x86_64.
the vdso support at the dynamic linker level is no longer useful to
libc, but is left in place for the sake of debuggers (which may need
the vdso in the link map to find its functions) and possibly use with
dlsym.
The mips arch is special in that it uses different RLIMIT_
numbers than other archs, so allow bits/resource.h to override
the default RLIMIT_ numbers (empty on all archs except mips).
Reported by orc.
it will be needed to implement some things in sysconf, and the syscall
can't easily be used directly because the x32 syscall uses the wrong
structure layout. the l (uncreative, for "linux") prefix is used since
the symbol name __sysinfo is already taken for AT_SYSINFO from the aux
vector.
the way the x32 override of this function works is also changed to be
simpler and avoid the useless jump instruction.
aside from potentially offering better performance, this change is
needed since the old coprocessor-based approach to barriers is
deprecated in arm v7, and some compilers/assemblers issue errors when
using the deprecated instruction for v7 targets.
the "m" constraint could give a memory reference with an offset that's
not compatible with ldrex/strex, so the arm-specific "Q" constraint is
needed instead.
this is perhaps not the optimal implementation; a_cas still compiles
to nested loops due to the different interface contracts of the kuser
helper cas function (whose contract this patch implements) and the
a_cas function (whose contract mimics the x86 cmpxchg). fixing this
may be possible, but it's more complicated and thus deferred until a
later time.
aside from improving performance and code size, this patch also
provides a means of producing binaries which can run on hardened
kernels where the kuser helpers have been disabled. however, at
present this requires producing binaries for armv6k or later, which
will not run on older cpus. a real solution to the problem of kernels
that omit the kuser helpers would be runtime detection, so that
universal binaries which run on all arm cpu models can also be
compatible with all kernel hardening profiles. robust detection
however is a much harder problem, and will be addressed at a later
time.
the kernel entry point for syscalls on microblaze nominally saves and
restores all registers, and testing on qemu always worked since qemu
behaves this way too. however, the real kernel treats r3:r4 as a
potential 64-bit return value from the syscall function, and copies
both over top of the saved registers before returning to userspace.
thus, we need to treat r4 as always-clobbered.
the excess space was unused and unintentional. this change does not
affect the ABI between applications and libc. while it does
theoretically affect linkage between third-party translation units
using jmp_buf as part of a structure, we've already changed jmp_buf at
least once on all archs, and problems were never observed, likely
because such usage would be very unusual. in any case it's best to get
things right now rather than making changes sometime during the 1.0.x
series or later.
Applications ended up with copy relocations for this array, which
resulted in libc's references to this array pointing to the
application's copy. The dynamic linker, however, can require this array
before the application is relocated, and therefore before the
application's copy of this array is initialized. This resulted in
garbage being loaded into FPSCR before executing main, which violated
the ABI.
We fix this by putting the array in crt1 and making the libc copy
private. This prevents libc's reference to the array from pointing to
an uninitialized copy in the application.
The mips statfs struct layout is different than on other archs, so the
statfs, fstatfs, statvfs and fstatvfs APIs were broken on mips.
Now the ordering is fixed, the types are kept consistent with other archs.
these have been wrong for a long time and were never detected or
corrected. powerpc needs some gratuitous extra padding/reserved slots
in ipc_perm, big-endian ordering for the padding of time_t slots that
was intended by the kernel folks to allow a transition to 64-bit
time_t, and some minor gratuitous reordering of struct members.
the definition was found to be incorrect at least for powerpc, and
fixing this cleanly requires making the definition arch-specific. this
will allow cleaning up the definition for other archs to make it more
specific, and reversing some of the ugliness (time_t hacks) introduced
with the x32 port.
this first commit simply copies the existing definition to each arch
without any changes. this is intentional, to make it easier to review
changes made on a per-arch basis.
the kernel uses long longs in the struct, but the documentation
says they're long. so we need to fixup the mismatch between the
userspace and kernelspace structs.
since the struct offers a mem_unit member, we can avoid truncation
by adjusting that value.
linux, gcc, etc. all use "sh" as the name for the superh arch. there
was already some inconsistency internally in musl: the dynamic linker
was searching for "ld-musl-sh.path" as its path file despite its own
name being "ld-musl-superh.so.1". there was some sentiment in both
directions as to how to resolve the inconsistency, but overall "sh"
was favored.
Userspace emulated floating-point (gcc -msoft-float) is not compatible
with the default mips abi (assumes an FPU or in kernel emulation of it).
Soft vs hard float abi should not be mixed, __mips_soft_float is checked
in musl's configure script and there is no runtime check. The -sf subarch
does not save/restore floating-point registers in setjmp/longjmp and only
provides dummy fenv implementation.
the reordering of headers caused some risc archs to not see
the __syscall declaration anymore.
this caused build errors on mips with any compiler,
and on arm and microblaze with clang.
we now declare it locally just like the powerpc port does.
it's legal to call the __syscall functions with more arguments than
necessary, and the __syscall_cp cancel dummy impl. does just that.
thus we must insert the switch for all possible syscalls numbers
into all of the syscallN inline functions.
- the nanosleep fixup "fixed" the second timespec* argument erroneusly.
- the futex fixup was missing the check for FUTEX_WAIT.
- general cleanup using a macro.
the operand size is unnecessary, since the assembler knows it from the
destination register size. removing the suffix makes it so the same
code should work for x32.
the fix should be complete on archs that use the generic definitions
(i386, arm, x86_64, microblaze), but mips and powerpc have not been
checked thoroughly and may need more fixes.
previously these macros wrongly had type double rather than long
double. I see no way an application could detect the error in C99, but
C11's _Generic can trivially detect it.
at the same time, even though these archs do not have excess
precision, the number of decimal places used to represent these
constants has been increased to 21 to be consistent with the decimal
representations used for the DBL_* macros.
atomic store was lacking a barrier, which was fine for legacy arm with
no real smp and kernel-emulated cas, but unsuitable for more modern
systems. the kernel provides another "kuser" function, at 0xffff0fa0,
which could be used for the barrier, but using that would drop support
for kernels 2.6.12 through 2.6.14 unless an extra conditional were
added to check for barrier availability. just using the barrier in the
kernel cas is easier, and, based on my reading of the assembly code in
the kernel, does not appear to be significantly slower.
at the same time, other atomic operations are adapted to call the
kernel cas function directly rather than using a_cas; due to small
differences in their interface contracts, this makes the generated
code much simpler.
PAGE_SIZE was hardcoded to 4096, which is historically what most
systems use, but on several archs it is a kernel config parameter,
user space can only know it at execution time from the aux vector.
PAGE_SIZE and PAGESIZE are not defined on archs where page size is
a runtime parameter, applications should use sysconf(_SC_PAGE_SIZE)
to query it. Internally libc code defines PAGE_SIZE to libc.page_size,
which is set to aux[AT_PAGESZ] in __init_libc and early in __dynlink
as well. (Note that libc.page_size can be accessed without GOT, ie.
before relocations are done)
Some fpathconf settings are hardcoded to 4096, these should be actually
queried from the filesystem using statfs.
msg.h was wrong for big-endian (wrong endiannness padding).
shm.h was just plain wrong (mips is not supposed to have padding).
both changes were tested using libc-test on qemu-system-mips.
it turns out that __SOFTFP__ does not indicate the ABI in use but
rather that fpu instructions are not to be used at all. this is
specified in ARM's documentation so I'm unclear on how I previously
got the wrong idea. unfortunately, this resulted in the 0.9.12 release
producing a dynamic linker with the wrong name. fortunately, there do
not yet seem to be any public toolchain builds using the wrong name.
the __ARM_PCS_VFP macro does not seem to be official from ARM, and in
fact it was missing from the very earliest gcc versions (around 4.5.x)
that added -mfloat-abi=hard. it would be possible on such versions to
perform some ugly linker-based tests instead in hopes that the linker
will reject ABI-mismatching object files, if there is demand for
supporting such versions. I would probably prefer to document which
versions are broken and warn users to manually add -D__ARM_PCS_VFP if
using such a version.
there's definitely an argument to be made that the fenv macros should
be exposed even in -mfloat-abi=softfp mode. for now, I have chosen not
to expose them in this case, since the math library will not
necessarily have the capability to raise exceptions (it depends on the
CFLAGS used to compile it), and since exceptions are officially
excluded from the ARM EABI, which the plain "arm" arch aims to
follow.
without these, calls may be resolved incorrectly if the calling code
has been compiled to thumb instead of arm. it's not clear to me at
this point whether crt_arch.h is even working if crt1.c is built as
thumb; this needs testing. but the _init and _fini issues were known
to cause crashes in static-linked apps when libc was built as thumb,
and this commit should fix that issue.
a mips signal mask contains 128 bits, enough for signals 1 through
128. however, the exit status obtained from the wait-family functions
only has room for values up to 127. reportedly signal 128 was causing
kernelspace bugs, so it was removed from the kernel recently; even
without that issue, however, it was impossible to support it correctly
in userspace.
at the same time, the bug was masked on musl by SIGRTMAX incorrectly
yielding 64 on mips, rather than the "correct" value of 128. now that
the _NSIG issue is fixed, SIGRTMAX can be fixed at the same time,
exposing the full range of signals for application use.
note that the (nonstandardized) libc _NSIG value is actually one
greater than the max signal number, and also one greater than the
kernel headers' idea of _NSIG. this is the reason for the discrepency
with the recent kernel changes. since reducing _NSIG by one brought it
down from 129 to 128, rather than from 128 to 127, _NSIG/8, used
widely in the musl sources, is unchanged.
the only immediate effect of this commit is enabling PIE support on
some archs that did not previously have any Scrt1.s, since the
existing asm files for crt1 override this C code. so some of the
crt_arch.h files committed are only there for the sake of documenting
what their archs "would do" if they used the new C-based crt1.
the expectation is that new archs should use this new system rather
than using heavy asm for crt1. aside from being easier and less
error-prone, it also ensures that PIE support is available immediately
(since Scrt1.o is generated from the same C source, using -fPIC)
rather than having to be added as an afterthought in the porting
process.
this is necessary to meet the C++ ABI target. alternatives were
considered to avoid the size increase for non-sig jmp_buf objects, but
they seemed to have worse properties. moreover, the relative size
increase is only extreme on x86[_64]; one way of interpreting this is
that, if the size increase from this patch makes jmp_buf use too much
memory, then the program was already using too much memory when built
for non-x86 archs.
rather than moving nlink_t back to the arch-specific file, I've added
a macro _Reg defined to the canonical type for register-size values on
the arch. this is not the same as _Addr for (not-yet-supported)
32-on-64 pseudo-archs like x32 and mips n32, so a new macro was
needed.
since the old, poorly-thought-out musl approach to init/fini arrays on
ARM (when it was the only arch that needed them) was to put the code
in crti/crtn and have the legacy _init/_fini code run the arrays,
adding proper init/fini array support caused the arrays to get
processed twice on ARM. I'm not sure skipping legacy init/fini
processing is the best solution to the problem, but it works, and it
shouldn't break anything since the legacy init/fini system was never
used for ARM EABI.
aside from the obvious C++ ABI purpose for this change, it also brings
musl into alignment with the compiler's idea of the definition of
wint_t (use in -Wformat), and makes the situation less awkward on ARM,
where wchar_t is unsigned.
internal code using wint_t and WEOF was checked against this change,
and while a few cases of storing WEOF into wchar_t were found, they
all seem to operate properly with the natural conversion from unsigned
to signed.
the arch-specific bits/alltypes.h.sh has been replaced with a generic
alltypes.h.in and minimal arch-specific bits/alltypes.h.in.
this commit is intended to have no functional changes except:
- exposing additional symbols that POSIX allows but does not require
- changing the C++ name mangling for some types
- fixing the signedness of blksize_t on powerpc (POSIX requires signed)
- fixing the limit macros for sig_atomic_t on x86_64
- making dev_t an unsigned type (ABI matching goal, and more logical)
in addition, some types that were wrongly defined with long on 32-bit
archs were changed to int, and vice versa; this change is
non-functional except for the possibility of making pointer types
mismatch, and only affects programs that were using them incorrectly,
and only at build-time, not runtime.
the following changes were made in the interest of moving
non-arch-specific types out of the alltypes system and into the
headers they're associated with, and also will tend to improve
application compatibility:
- netdb.h now includes netinet/in.h (for socklen_t and uint32_t)
- netinet/in.h now includes sys/socket.h and inttypes.h
- sys/resource.h now includes sys/time.h (for struct timeval)
- sys/wait.h now includes signal.h (for siginfo_t)
- langinfo.h now includes nl_types.h (for nl_item)
for the types in stdint.h:
- types which are of no interest to other headers were moved out of
the alltypes system.
- fast types for 8- and 64-bit are hard-coded (at least for now); only
the 16- and 32-bit ones have reason to vary by arch.
and the following types have been changed for C++ ABI purposes;
- mbstate_t now has a struct tag, __mbstate_t
- FILE's struct tag has been changed to _IO_FILE
- DIR's struct tag has been changed to __dirstream
- locale_t's struct tag has been changed to __locale_struct
- pthread_t is defined as unsigned long in C++ mode only
- fpos_t now has a struct tag, _G_fpos64_t
- fsid_t's struct tag has been changed to __fsid_t
- idtype_t has been made an enum type (also required by POSIX)
- nl_catd has been changed from long to void *
- siginfo_t's struct tag has been removed
- sigset_t's has been given a struct tag, __sigset_t
- stack_t has been given a struct tag, sigaltstack
- suseconds_t has been changed to long on 32-bit archs
- [u]intptr_t have been changed from long to int rank on 32-bit archs
- dev_t has been made unsigned
summary of tests that have been performed against these changes:
- nsz's libc-test (diff -u before and after)
- C++ ABI check symbol dump (diff -u before, after, glibc)
- grepped for __NEED, made sure types needed are still in alltypes
- built gcc 3.4.6
this change is both to fix one of the remaining type (and thus C++
ABI) mismatches with glibc/LSB and to allow use of the full range of
uid and gid values, if so desired.
passwd/group access functions were not prepared to deal with unsigned
values, so they too have been fixed with this commit.
prior to this change, using a non-default syslibdir was impractical on
systems where the ordinary library paths contain musl-incompatible
library files. the file containing search paths was always taken from
/etc, which would either correspond to a system-wide musl
installation, or fail to exist at all, resulting in searching of the
default library path.
the new search strategy is safe even for suid programs because the
pathname used comes from the PT_INTERP header of the program being
run, rather than any external input.
as part of this change, I have also begun differentiating the names of
arch variants that differ by endianness or floating point calling
convention. the corresponding changes in the build system and and gcc
wrapper script (to use an alternate dynamic linker name) for these
configurations have not yet been made.
despite declaring functions that take arguments of type va_list, these
headers are not permitted by the c standard to expose the definition
of va_list, so an alias for the type must be used. the name
__isoc_va_list was chosen to convey that the purpose of this alternate
name is for iso c conformance, and to avoid the multitude of names
which gcc mangles with its hideous "fixincludes" monstrosity, leading
to serious header breakage if these "fixes" are run.
previously we were using an unsigned type on 32-bit systems so that
subtraction would be well-defined when it wrapped, but since wrapping
is non-conforming anyway (when clock() overflows, it has to return -1)
the only use of unsigned would be to buy a little bit more time before
overflow. this does not seem worth having the type vary per-arch
(which leads to more arch-specific bugs) or disagree with the ABI musl
(mostly) follows.
there was some question as to how many decimal places to use, since
one decimal place is always sufficient to identify the smallest
denormal uniquely. for now, I'm following the example in the C
standard which is consistent with the other min/max macros we already
had in place.
the preprocessor can reliably determine the signedness of wchar_t.
L'\0' is used for 0 in the expressions so that, if the underlying type
of wchar_t is long rather than int, the promoted type of the
expression will match the type of wchar_t.
this type was removed back in 5243e5f160 ,
because it was removed from the XSI specs.
however some apps use it.
since it's in the POSIX reserved namespace, we can expose it
unconditionally.
the issue at hand is that many syscalls require as an argument the
kernel-ABI size of sigset_t, intended to allow the kernel to switch to
a larger sigset_t in the future. previously, each arch was defining
this size in syscall_arch.h, which was redundant with the definition
of _NSIG in bits/signal.h. as it's used in some not-quite-portable
application code as well, _NSIG is much more likely to be recognized
and understood immediately by someone reading the code, and it's also
shorter and less cluttered.
note that _NSIG is actually 65/129, not 64/128, but the division takes
care of throwing away the off-by-one part.
wctype_t was incorrectly "int" rather than "long" on x86_64. not only
is this an ABI incompatibility; it's also a major design flaw if we
ever wanted wctype_t to be implemented as a pointer, which would be
necessary if locales support custom character classes, since int is
too small to store a converted pointer. this commit fixes wctype_t to
be unsigned long on all archs, matching the LSB ABI; this change does
not matter for C code, but for C++ it affects mangling.
the same issue applied to wctrans_t. glibc/LSB defines this type as
const __int32_t *, but since no such definition is visible, I've just
expanded the definition, int, everywhere.
it would be nice if these types (which don't vary by arch) could be in
wctype.h, but the OB XSI requirement in POSIX that wchar.h expose some
types and functions from wctype.h precludes doing so. glibc works
around this with some hideous hacks, but trying to duplicate that
would go against the intent of musl's headers.
arm eabi requires this symbol for static C++ dtors.
usually it is provided by libstdc++, but when a C++ program
doesn't use the std lib (free-standing), the libc has to provide
it.
this was encountered while building transmission, which
depends on such a C++ library (libutp).
this function is nearly identical to __cxa_atexit, but it has the
order of argumens swapped for "performance reasons".
see page 25 of
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf
there are other aeabi specific C++ support functions missing, but
it is not clear yet that GCC makes use of them so we omit them for
the moment.
they were accidentally exposed under just baseline POSIX, which is a
big namespace pollution issue. thankfully glibc only exposes them
under _GNU_SOURCE, not under any of its other options, so omitting
the pollution in the default _BSD_SOURCE profile does not hurt
application compatibility at all.
these structures are purely for use by trace/debug tools and tools
working with core files. the definition of fpregset_t, which was
previously here, has been removed because it was wrong; fpregset_t
should be the type used in mcontext_t, not the type used in
ptrace/core stuff.
aside from microblaze, these should be roughly correct for all archs
now. some misc junk macros and typedefs are missing, which should
probably be added for max compatibility with trace/debug tools.
it should now really match the kernel. some of the removed padding
corresponded to the difference between user and kernel sigset_t. the
space at the end was redundant with the uc_mcontext member and seems
to have been added as a result of misunderstanding glibc's definition
versus the kernel's.
with these changes, the members/types of mcontext_t and related stuff
should closely match the glibc definitions. unlike glibc, however, the
definitions here avoid using typedefs as much as possible and work
directly with the underlying types, to minimize namespace pollution
from signal.h in the default (_BSD_SOURCE) profile.
this is a first step in improving compatibility with applications
which poke at context/register information -- mainly debuggers, trace
utilities, etc. additional definitions in ucontext.h and other headers
may be needed later.
if feature test macros are used to request a conforming namespace,
mcontext_t is replaced with an opaque structure of the equivalent size
and alignment; conforming programs cannot examine its contents anyway.
unlike the previous definition, NSIG/_NSIG is supposed to be one more
than the highest signal number. adding this will allow simplifying
libc-internal code that makes signal-related syscalls, which can be
done as a later step. some apps might use it too; while this usage is
questionable, it's at least not insane.
also handle the non-GNUC case where alignment attribute is not available
by simply omitting it. this will not cause problems except for
inclusion of mcontex_t/ucontext_t in application-defined structures,
since the natural alignment of the uc_mcontext member relative to the
start of ucontext_t is already correct. and shame on whoever designed
this for making it impossible to satisfy the ABI requirements without
GNUC extensions.
apparently some other archs have sys/io.h and should not break just
because they don't have the x86 port io functions. provide a blank
bits/io.h everywhere for now.
based on proposal by Isaac Dunham. nonexistance of bits/io.h will
cause inclusion of sys/io.h to produce an error on archs that are not
supposed to have it. this is probably the desired behavior, but the
error message may be a bit unusual.
put some macros that do not differ between architectures in the
main header and remove from bits.
restructure mips header so it has the same structure as the others.
incomplete but at least partly working. requires all files to be
compiled in the new "secure" plt model, not the old one that put plt
code in the data segment. TLS is untested but may work. invoking the
dynamic linker explicitly to load a program does not yet handle argv
correctly.
although a number is reserved for it, this option is not implemented
on Linux and does not work. defining it causes some applications to
use it, and subsequently break due to its failure.
despite documentation that makes it sound a lot different, the only
ABI-constraint difference between TLS variants II and I seems to be
that variant II stores the initial TLS segment immediately below the
thread pointer (i.e. the thread pointer points to the end of it) and
variant I stores the initial TLS segment above the thread pointer,
requiring the thread descriptor to be stored below. the actual value
stored in the thread pointer register also tends to have per-arch
random offsets applied to it for silly micro-optimization purposes.
with these changes applied, TLS should be basically working on all
supported archs except microblaze. I'm still working on getting the
necessary information and a working toolchain that can build TLS
binaries for microblaze, but in theory, static-linked programs with
TLS and dynamic-linked programs where only the main executable uses
TLS should already work on microblaze.
alignment constraints have not yet been heavily tested, so it's
possible that this code does not always align TLS segments correctly
on archs that need TLS variant I.
this is actually a rather subtle issue: do arrays decay to pointers
when used as inline asm args? gcc says yes, but currently pcc says no.
hopefully this discrepency in pcc will be fixed, but since the
behavior is not clearly defined anywhere I can find, I'm using an
explicit operation to cause the decay to occur.
this doubles the performance of the fastest syscalls on the atom I
tested it on; improvement is reportedly much more dramatic on
worst-case cpus. cannot be used for cancellable syscalls.
currently, only i386 is tested. x86_64 and arm should probably work.
the necessary relocation types for mips and microblaze have not been
added because I don't understand how they're supposed to work, and I'm
not even sure if it's defined yet on microblaze. I may be able to
reverse engineer the requirements out of gcc/binutils output.
based on initial work by rdp, with heavy modifications. some features
including threads are untested because qemu app-level emulation seems
to be broken and I do not have a proper system image for testing.
if same register is used for input/output, the compiler must be told.
otherwise is generates random junk code that clobbers the result. in
pure syscall-wrapper functions, nothing went wrong, but in more
complex functions where register allocation is non-trivial, things
broke badly.
I'm not 100% sure that Linux's O_PATH meets the POSIX requirements for
O_SEARCH, but it seems very close if not perfect. and old kernels
ignore it, so O_SEARCH will still work as desired as long as the
caller has read permissions to the directory.
by using the "ir" constraint (immediate or register) and the carefully
constructed instruction addu $2,$0,%2 which can take either an
immediate or a register for %2, the new inline asm admits maximal
optimization with no register spillage to the stack when the compiler
successfully performs constant propagration, but still works by
allocating a register when the syscall number cannot be recognized as
a constant. in the case of syscalls with 0-3 arguments it barely
matters, but for 4-argument syscalls, using an immediate for the
syscall number avoids creating a stack frame for the syscall wrapper
function.
all past and current kernel versions have done so, but there seems to
be no reason it's necessary and the sentiment from everyone I've asked
has been that we should not rely on it. instead, use r7 (an argument
register) which will necessarily be preserved upon syscall restart.
however this only works for 0-3 argument syscalls, and we have to
resort to the function call for 4-argument syscalls.
this drastically reduces the size of some functions which are purely
syscall wrappers.
disabled for clang due to known bugs satisfying register constraints.
now public syscall.h only exposes __NR_* and SYS_* constants and the
variadic syscall function. no macros or inline functions, no
__syscall_ret or other internal details, no 16-/32-bit legacy syscall
renaming, etc. this logic has all been moved to src/internal/syscall.h
with the arch-specific parts in arch/$(ARCH)/syscall_arch.h, and the
amount of arch-specific stuff has been reduced to a minimum.
changes still need to be reviewed/double-checked. minimal testing on
i386 and mips has already been performed.
clang does not presently support the "v" constraint we want to use to
get the result from $3, and trying to use register...__asm__("$3") to
do the same invokes serious compiler bugs. so for now, i'm working
around the issue with an extra temp register and putting $3 in the
clobber list instead of using it as output. when the bugs in clang are
fixed, this issue should be revisited to generate smaller/faster code
like what gcc gets.
while musl itself requires a c99 compiler, some applications insist on
being compiled with c89 compilers, and use of "inline" in the headers
was breaking them. much of this had been avoided already by just
skipping the inline keyword in pre-c99 compilers or modes, but this
new unified solution is cleaner and may/should result in better code
generation in the default gcc configuration.
this is needed to match the underlying "ABI" standards. it's not
really an ABI issue since the binary representations are the same, but
having the wrong type can lead to errors when the type arising from a
difference-of-pointers expression does not match the defined type of
ptrdiff_t. most of the problems affect C++, not C.
not heavily tested, but the basics are working. the basic concept is
that the dynamic linker entry point code invokes a pure-PIC (no global
accesses) C function in reloc.h to perform the early GOT relocations
needed to make the dynamic linker itself functional, then invokes
__dynlink like on other archs. since mips uses some ugly arch-specific
hacks to optimize relocating the GOT (rather than just using the
normal DT_REL[A] tables like on other archs), the dynamic linker has
been modified slightly to support calling arch-specific relocation
code in reloc.h.
most of the actual mips-specific behavior was developed by reading the
output of readelf on libc.so and simple executable files. i could not
find good reference information on which relocation types need to be
supported or their semantics, so it's possible that some legitimate
usage cases will not work yet.
also fix the alignment of jmp_buf to meet the abi. linux always
emulates fpu on mips if it's not present, so enabling this code
unconditionally is "safe" but may be slow. in the long term it may be
preferable to find a way to disable it on soft float builds.
the fields in the mcontext_t are long long (for no good reason) even
on 32-bit mips, so the offset of the instruction pointer (as a word)
varies depending on endianness.
the kernel wrongly expects the cmsg length field to be size_t instead
of socklen_t. in order to work around the issue, we have to impose a
length limit and copy to a local buffer. the length limit should be
more than sufficient for any real-world use; these headers are only
used for passing file descriptors and permissions between processes
over unix sockets.
basically, this version of the code was obtained by starting with
rdp's work from his ellcc source tree, adapting it to musl's build
system and coding style, auditing the bits headers for discrepencies
with kernel definitions or glibc/LSB ABI or large file issues, fixing
up incompatibility with the old binutils from aboriginal linux, and
adding some new special cases to deal with the oddities of sigaction
and pipe syscall interfaces on mips.
at present, minimal test programs work, but some interfaces are broken
or missing. threaded programs probably will not link.
on arm, the location of the saved-signal-mask flag and mask were off
by one between sigsetjmp and siglongjmp, causing incorrect behavior
restoring the signal mask. this is because the siglongjmp code assumed
an extra slot was in the non-sig jmp_buf for the flag, but arm did not
have this. now, the extra slot is removed for all archs since it was
useless.
also, arm eabi requires jmp_buf to have 8-byte alignment. we achieve
that using long long as the type rather than with non-portable gcc
attribute tags.
on old kernels, there's no way to detect errors; we must assume
negative syscall return values are pgrp ids. but if the F_GETOWN_EX
fcntl works, we can get a reliable answer.
this is actually rather ugly, and would get even uglier if we ever
want to support further feature test macros. at some point i may
factor the bits headers into separate files for C base, POSIX base,
and nonstandard extensions (the only distinctions that seem to matter
now) and then the logic for which to include can go in the main header
rather than being duplicated for each arch. the downside of this is
that it would result in more files having to be opened during
compilation, so as long as the ugliness does not grow, i'm inclined to
leave it alone for now.