the x32 syscall interfaces treat timespec's tv_nsec member as 64-bit
despite the API type being long and long being 32-bit in the ABI. this
is no problem for syscalls that store timespecs to userspace as
results, but caused uninitialized padding to be misinterpreted as the
high bits in syscalls that take timespecs as input.
since the beginning of the port, we've dealt with this situation with
hacks in syscall_arch.h, and injected between __syscall_cp_c and
__syscall_cp_asm, to special-case the syscall numbers that involve
timespecs as inputs and copy them to a form suitable to pass to the
kernel.
commit 40aa18d55a set the stage for
removal of these hacks by letting us treat the "normal" x32 syscalls
dealing with timespec as if they're x32's "time64" syscalls,
effectively making x32 ax "time64-only 32-bit arch" like riscv32 will
be when it's added. since then, all users of syscalls that x32's
syscall_arch.h had hacks for have been updated to use time64 syscalls,
so the hacks can be removed.
there are still at least a few other timespec-related syscalls broken
on x32, which were overlooked when the x32 hacks were done or added
later. these include at least recvmmsg, adjtimex/clock_adjtime, and
timerfd_settime, and they will be fixed independently later on.
x32 is odd in that it's the only ILP32 arch/ABI we have where time_t
is 64-bit rather than (32-bit) long, and this has always been
problematic in that it results in struct timespec having unused
padding space, since tv_nsec has type long, which the kernel insists
be zero- or sign-extended (due to negative tv_nsec being invalid, it
doesn't matter which) to match the x86_64 type.
up til now, we've had really ugly hacks in x32/syscall_arch.h to patch
up the timespecs passed to the kernel. but the same requirement to
zero- or sign-extend tv_nsec also applies to all the new time64
syscalls on true 32-bit archs. so let's take advantage of this to
clean things up.
this patch defines all of the time64 syscalls for x32 as aliases for
the existing syscalls by the same name. this establishes the following
invariants:
- if the time64 form is defined, it takes time arguments as 64-bit
objects, and tv_nsec inputs must be zero-/sign-extended to 64-bit.
- if the time64 form is not defined, or if the time64 form is defined
and is not equal to the "plain" form, the plain form takes time
arguments as longs.
this will avoid the need for protocols for archs to define appropriate
types for each family of syscalls, and for the reader of the code to
have to be aware of such type definitions.
in some sense it might be simpler if the plain syscall form were
undefined for x32, so that it would always take longs if defined.
however, a number of these syscalls are used in contexts with a null
time argument, or (e.g. futex) for commands that don't involve time at
all, and having to introduce time64-specific logic to all those call
points does not make sense. thus, while the "plain" forms are kept now
just because they're needed until the affected code is converted over,
they'll also almost surely be kept in the future as well.
kernel support for x32 was added long after the utimensat syscall was
already available, so having a fallback is just wasted code size.
also, for changes related to time64 support on 32-bit archs, I want to
be able to assume the old futimesat syscall always works with longs,
which is true except for x32. by ensuring that it's not used on x32,
the needed invariant is established.
presently, all archs/ABIs have struct stat matching the kernel
stat[64] type, except mips/mipsn32/mips64 which do conversion hacks in
syscall_arch.h to work around bugs in the kernel type. this patch
completely decouples them and adds a translation step to the success
path of fstatat. at present, this is just a gratuitous copying, but it
opens up multiple possibilities for future support for 64-bit time_t
on 32-bit archs and for cleaned-up/unified ABIs.
for clarity, the mips hacks are not yet removed in this commit, so the
mips kstat structs still correspond to the output of the hacks in
their syscall_arch.h files, not the raw kernel type. a subsequent
commit will fix this.
syscall numbers are now synced up across targets (starting from 403 the
numbers are the same on all targets other than an arch specific offset)
IPC syscalls sem*, shm*, msg* got added where they were missing (except
for semop: only semtimedop got added), the new semctl, shmctl, msgctl
imply IPC_64, see
linux commit 0d6040d4681735dfc47565de288525de405a5c99
arch: add split IPC system calls where needed
new 64bit time_t syscall variants got added on 32bit targets, see
linux commit 48166e6ea47d23984f0b481ca199250e1ce0730a
y2038: add 64-bit time_t syscalls to all 32-bit architectures
new async io syscalls got added, see
linux commit 2b188cc1bb857a9d4701ae59aa7768b5124e262e
Add io_uring IO interface
linux commit edafccee56ff31678a091ddb7219aba9b28bc3cb
io_uring: add support for pre-mapped user IO buffers
a new syscall got added that uses the fd of /proc/<pid> as a stable
handle for processes: allows sending signals without pid reuse issues,
intended to eventually replace rt_sigqueueinfo, kill, tgkill and
rt_tgsigqueueinfo, see
linux commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad
signal: add pidfd_send_signal() syscall
on some targets (arm, m68k, s390x, sh) some previously missing syscall
numbers got added as well.
this will allow the compiler to cache and reuse the result, meaning we
no longer have to take care not to load it more than once for the sake
of archs where the load may be expensive.
depends on commit 1c84c99913 for
correctness, since otherwise the compiler could hoist loads during
stage 3 of dynamic linking before the initial thread-pointer setup.
these were overlooked in the declarations overhaul work because they
are not properly declared, and the current framework even allows their
declared types to vary by arch. at some point this should be cleaned
up, but I'm not sure what the right way would be.
sys/ptrace.h is target specific, use bits/ptrace.h to add target
specific macro definitions.
these macros are kept in the generic sys/ptrace.h even though some
targets don't support them:
PTRACE_GETREGS
PTRACE_SETREGS
PTRACE_GETFPREGS
PTRACE_SETFPREGS
PTRACE_GETFPXREGS
PTRACE_SETFPXREGS
so no macro definition got removed in this patch on any target. only
s390x has a numerically conflicting macro definition (PTRACE_SINGLEBLOCK).
the PT_ aliases follow glibc headers, otherwise the definitions come
from linux uapi headers except ones that are skipped in glibc and
there is no real kernel support (s390x PTRACE_*_AREA) or need special
type definitions (mips PTRACE_*_WATCH_*) or only relevant for linux
2.4 compatibility (PTRACE_OLDSETOPTIONS).
Update atomic.h to provide a_ctz_l in all cases (atomic_arch.h should
now only provide a_ctz_32 and/or a_ctz_64).
The generic version of a_ctz_32 now takes advantage of a_clz_32 if
available and the generic a_ctz_64 now makes use of a_ctz_32.
PAGESIZE is actually the version defined in POSIX base, with PAGE_SIZE
being in the XSI option. use PAGESIZE as the underlying definition to
facilitate making exposure of PAGE_SIZE conditional.
counts leading zero bits of a 64bit int, undefined on zero input.
(has nothing to do with atomics, added to atomic.h so target specific
helper functions are together.)
there is a logarithmic generic implementation and another in terms of
a 32bit a_clz_32 on targets where that's available.
when _GNU_SOURCE is defined, which is always the case when compiling
c++ with gcc, these macros for the the indices in gregset_t are
exposed and likely to clash with applications. by using enum constants
rather than macros defined with integer literals, we can make the
clash slightly less likely to break software. the macros are still
defined in case anything checks for them with #ifdef, but they're
defined to expand to themselves so that non-file-scope (e.g.
namespaced) identifiers by the same names still work.
for the sake of avoiding mistakes, the changes were generated with sed
via the command:
sed -i -e 's/#define *\(REG_[A-Z_0-9]\{1,\}\) *\([0-9]\{1,\}\)'\
'/enum { \1 = \2 };\n#define \1 \1/' \
arch/i386/bits/signal.h arch/x86_64/bits/signal.h arch/x32/bits/signal.h
gdb can only backtrace/unwind across signal handlers if it recognizes
the sa_restorer trampoline. for x86_64, gdb first attempts to
determine the symbol name for the function in which the program
counter resides and match it against "__restore_rt". if no name can be
found (e.g. in the case of a stripped binary), the exact instruction
sequence is matched instead.
when matching the function name, however, gdb's unwind code wrongly
considers the interval [sym,sym+size] rather than [sym,sym+size).
thus, if __restore_rt begins immediately after another function, gdb
wrongly identifies pc as lying within the previous adjacent function.
this patch adds a nop before __restore_rt to preclude that
possibility. it also removes the symbol name __restore and replaces it
with a macro since the stability of whether gdb identifies the
function as __restore_rt or __restore is not clear.
for the no-symbols case, the instruction sequence is changed to use
%rax rather than %eax to match what gdb expects.
based on patch by Szabolcs Nagy, with extended description and
corresponding x32 changes added.
placing the opening brace on the same line as the struct keyword/tag
is the style I prefer and seems to be the prevailing practice in more
recent additions.
these changes were generated by the command:
find include/ arch/*/bits -name '*.h' \
-exec sed -i '/^struct [^;{]*$/{N;s/\n/ /;}' {} +
and subsequently checked by hand to ensure that the regex did not pick
up any false positives.
the syscalls take an additional flag argument, they were added in commit
f17d8b35452cab31a70d224964cd583fb2845449 and a RWF_HIPRI priority hint
flag was added to linux/fs.h in 97be7ebe53915af504fb491fb99f064c7cf3cb09.
the syscall is not allocated for microblaze and sh yet.
commits e24984efd5 and
16b55298dc inadvertently disabled the
a_spin implementations for i386, x86_64, and x32 by defining a macro
named a_pause instead of a_spin. this should not have caused any
functional regression, but it inhibited cpu relaxation while spinning
for locks.
bug reported by George Kulakowski.
it was introduced for offloading copying between regular files
in linux commit 29732938a6289a15e907da234d6692a2ead71855
(microblaze and sh does not yet have the syscall number.)
currently five targets use the same mman.h constants and the rest
share most constants too, so move them to sys/mman.h before the
bits/mman.h include where the differences can be corrected by
redefinition of the macros.
this fixes two minor bugs: POSIX_MADV_DONTNEED was wrong on most
targets (it should be the same as MADV_DONTNEED), and sh defined
the x86-only MAP_32BIT mmap flag.
all bits headers that were identical for a number of 'clean' archs are
moved to the new arch/generic tree. in addition, a few headers that
differed only cosmetically from the new generic version are removed.
additional deduplication may be possible in mman.h and in several
headers (limits.h, posix.h, stdint.h) that mostly depend on whether
the arch is 32- or 64-bit, but they are left alone for now because
greater gains are likely possible with more invasive changes to header
logic, which is beyond the scope of this commit.
they lock faulted pages into memory (useful when a small part of a
large mapped file needs efficient access), new in linux v4.4, commit
b0f205c2a3082dd9081f9a94e50658c5fa906ff1
MLOCK_* is not in the POSIX reserved namespace for sys/mman.h
this is mlock with a flags argument, new in linux commit
a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e
as usual microblaze and sh don't have allocated syscall number yet.
new in linux v4.3 added for aarch64, arm, i386, mips, or1k, powerpc,
x32 and x86_64.
membarrier is a system wide memory barrier, moves most of the
synchronization cost to one side, new in kernel commit
5b25b13ab08f616efd566347d809b4ece54570d1
userfaultfd is useful for qemu and is new in kernel commit
8d2afd96c20316d112e04d935d9e09150e988397
switch_endian is powerpc only for switching endianness, new in commit
529d235a0e190ded1d21ccc80a73e625ebcad09b
this commit mostly makes consistent things like spacing, function
ordering in atomic_arch.h, argument names, use of volatile, etc.
a_ctz_l was also removed from x86_64 since atomic.h provides it
automatically using a_ctz_64.
rather than having each arch provide its own atomic.h, there is a new
shared atomic.h in src/internal which pulls arch-specific definitions
from arc/$(ARCH)/atomic_arch.h. the latter can be extremely minimal,
defining only a_cas or new ll/sc type primitives which the shared
atomic.h will use to construct everything else.
this commit avoids making heavy changes to the individual archs'
atomic implementations. definitions which are identical or
near-identical to what the new shared atomic.h would produce have been
removed, but otherwise the changes made are just hooking up the
arch-specific files to the new infrastructure. major changes to take
advantage of the new system will come in subsequent commits.
commit 8a8fdf6398 was intended to remove
all such usage, but these arch-specific files were overlooked, leading
to inconsistent declarations and definitions.
using the actual mcontext_t definition rather than an overlaid pointer
array both improves correctness/readability and eliminates some ugly
hacks for archs with 64-bit registers bit 32-bit program counter.
also fix UB due to comparison of pointers not in a common array
object.
commit 3c43c0761e fixed missing
synchronization in the atomic store operation for i386 and x86_64, but
opted to use mfence for the barrier on x86_64 where it's always
available. however, in practice mfence is significantly slower than
the barrier approach used on i386 (a nop-like lock orl operation).
this commit changes x86_64 (and x32) to use the faster barrier.
despite being strongly ordered, the x86 memory model does not preclude
reordering of loads across earlier stores. while a plain store
suffices as a release barrier, we actually need a full barrier, since
users of a_store subsequently load a waiter count to determine whether
to issue a futex wait, and using a stale count will result in soft
(fail-to-wake) deadlocks. these deadlocks were observed in malloc and
possible with stdio locks and other libc-internal locking.
on i386, an atomic operation on the caller's stack is used as the
barrier rather than performing the store itself using xchg; this
avoids the need to read the cache line on which the store is being
performed. mfence is used on x86_64 where it's always available, and
could be used on i386 with the appropriate cpu model checks if it's
shown to perform better.
conceptually, and on other archs, these functions take a pointer to
int, but in the i386, x86_64, and x32 versions of atomic.h, they took
a pointer to void instead.
i386, x86_64, x32, and powerpc all use TLS for stack protector canary
values in the default stack protector ABI, but the location only
matched the ABI on i386 and x86_64. on x32, the expected location for
the canary contained the tid, thus producing spurious mismatches
(resulting in process termination) upon fork. on powerpc, the expected
location contained the stdio_locks list head, so returning from a
function after calling flockfile produced spurious mismatches. in both
cases, the random canary was not present, and a predictable value was
used instead, making the stack protector hardening much less effective
than it should be.
in the current fix, the thread structure has been expanded to have
canary fields at all three possible locations, and archs that use a
non-default location must define a macro in pthread_arch.h to choose
which location is used. for most archs (which lack TLS canary ABI) the
choice does not matter.
the lifetime of compound literals is the block in which they appear.
the temporary struct __timespec_kernel objects created as compound
literals no longer existed at the time their addresses were passed to
the kernel.
the jmp instruction requires a 64-bit register, so cast the desired PC
address up to uint64_t, going through uintptr_t to ensure that it's
zero-extended rather than possibly sign-extended.
in a few places, non-hidden symbols were referenced from asm in ways
that assumed ld-time binding. while these is no semantic reason these
symbols need to be hidden, fixing the references without making them
hidden was going to be ugly, and hidden reduces some bloat anyway.
in the asm files, .global/.hidden directives have been moved to the
top to unclutter the actual code.
this overhaul further reduces the amount of arch-specific code needed
by the dynamic linker and removes a number of assumptions, including:
- that symbolic function references inside libc are bound at link time
via the linker option -Bsymbolic-functions.
- that libc functions used by the dynamic linker do not require
access to data symbols.
- that static/internal function calls and data accesses can be made
without performing any relocations, or that arch-specific startup
code handled any such relocations needed.
removing these assumptions paves the way for allowing libc.so itself
to be built with stack protector (among other things), and is achieved
by a three-stage bootstrap process:
1. relative relocations are processed with a flat function.
2. symbolic relocations are processed with no external calls/data.
3. main program and dependency libs are processed with a
fully-functional libc/ldso.
reduction in arch-specific code is achived through the following:
- crt_arch.h, used for generating crt1.o, now provides the entry point
for the dynamic linker too.
- asm is no longer responsible for skipping the beginning of argv[]
when ldso is invoked as a command.
- the functionality previously provided by __reloc_self for heavily
GOT-dependent RISC archs is now the arch-agnostic stage-1.
- arch-specific relocation type codes are mapped directly as macros
rather than via an inline translation function/switch statement.
while it's the same for all presently supported archs, it differs at
least on sparc, and conceptually it's no less arch-specific than the
other O_* macros. O_SEARCH and O_EXEC are still defined in terms of
O_PATH in the main fcntl.h.