the bug fixed in commit b82cd6c78d was
mostly masked on arm because __hwcap was zero at the point of the call
from the dynamic linker to __set_thread_area, causing the access to
libc.auxv to be skipped and kuser_helper versions of TLS access and
atomics to be used instead of the armv6 or v7 versions. however, on
kernels with kuser_helper removed for hardening it would crash.
since __set_thread_area potentially uses __hwcap, it must be
initialized before the function is called. move the AT_HWCAP lookup
from stage 3 to stage 2b.
This enables alternative compilers, which may not define __GNUC__,
to implement alloca, which is still fairly widely used.
This is similar to how stdarg.h already works in musl; compilers must
implement __builtin_va_arg, there is no fallback definition.
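A minimal sketch of what an unconditional definition in terms of the builtin looks like. The names sketch_alloca and sum_squares are hypothetical, chosen so the example does not clash with a real alloca.h; it assumes only that the compiler provides __builtin_alloca.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the header definition: no __GNUC__ check,
 * the compiler is simply required to provide __builtin_alloca. */
#define sketch_alloca __builtin_alloca

/* Sums 0*0 + 1*1 + ... + (n-1)*(n-1) using a stack-allocated buffer. */
long sum_squares(size_t n)
{
	long *buf = sketch_alloca(n * sizeof *buf);
	long total = 0;
	for (size_t i = 0; i < n; i++) buf[i] = (long)(i * i);
	for (size_t i = 0; i < n; i++) total += buf[i];
	return total;
}
```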
this change was discussed on the mailing list thread for the linux
uapi v5.3 patches, and submitted as a v2 patch, but overlooked when I
applied the patches much later.
revert commit f291c09ec9 and apply the
v2 as submitted; the net change is just padding.
notes by Szabolcs Nagy follow:
compared to the linux uapi (and glibc), padding is used instead of the
aligned attribute for keeping the layout the same across targets; this
means the alignment of the struct may be different on some targets
(e.g. m68k where uint64_t is 2 byte aligned) but that should not affect
syscalls and this way the abi does not depend on nonstandard extensions.
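a minimal sketch of the padding approach, using a hypothetical reduced record (struct sketch_info and its member names are illustrative, not the real uapi struct). explicit padding pins the member offsets without any attribute; only the alignment of the struct as a whole is left to the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a 1-byte member followed by a 32-bit and a
 * 64-bit member. The explicit pad keeps the offsets identical even on
 * an ABI where uint32_t/uint64_t are only 2-byte aligned, without
 * relying on __attribute__((aligned)). */
struct sketch_info {
	uint8_t op;
	uint8_t pad[3];               /* explicit: arch is at offset 4 everywhere */
	uint32_t arch;
	uint64_t instruction_pointer; /* at offset 8 everywhere */
};

_Static_assert(offsetof(struct sketch_info, arch) == 4, "arch offset");
_Static_assert(offsetof(struct sketch_info, instruction_pointer) == 8,
               "instruction_pointer offset");
```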
at least gcc 9 broke execution of DT_INIT/DT_FINI for fdpic archs
(presently only sh) by recognizing that the stores to the
compound-literal function descriptor constructed to call them were
dead stores. there's no way to make a "may_alias function", so instead
launder the descriptor through an asm-statement barrier. in practice
just making the compound literal volatile seemed to have worked too,
but this should be less of a hack and more accurately convey the
semantics of what transformations are not valid.
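a conservative sketch of laundering a pointer through an empty asm statement, assuming gcc/clang extended asm. struct sketch_fdesc, sketch_launder, and sketch_desc_addr are hypothetical names, and the constraints used here are one possible choice, not necessarily the ones in the actual fix.

```c
#include <assert.h>

/* Hypothetical two-word function descriptor, as used on fdpic targets. */
struct sketch_fdesc { void *addr; void *got; };

/* Pass a pointer through an empty asm statement. "+r" tells the
 * compiler the value may have been changed by the asm, and the
 * "memory" clobber tells it memory may have been read or written, so
 * earlier stores to the pointed-to object cannot be treated as dead. */
void *sketch_launder(void *p)
{
	__asm__ __volatile__("" : "+r"(p) : : "memory");
	return p;
}

/* Build a descriptor as a compound literal and read it back only
 * through the laundered pointer. */
void *sketch_desc_addr(void *addr, void *got)
{
	struct sketch_fdesc *d =
		sketch_launder(&(struct sketch_fdesc){ addr, got });
	return d->addr;
}
```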
commit 1c84c99913 moved the call to
__init_tp above the initialization of libc.auxv, inadvertently
breaking archs where __set_thread_area examines auxv for the sake of
determining the TLS/atomic model needed at runtime. this broke armv6
and sh2.
the syscall numbers were reserved in v5.3 but not wired up on mips, see
linux commit 0671c5b84e9e0a6d42d22da9b5d093787ac1c5f3
MIPS: Wire up clone3 syscall
mips application specific isa extensions were previously not exported
in hwcaps so userspace could not apply optimized code at runtime.
linux commit 38dffe1e4dde1d3174fdce09d67370412843ebb5
MIPS: elf_hwcap: Export userspace ASEs
allows waiting on a pidfd; in the future it might allow retrieving the
exit status by a non-parent process, see
linux commit 3695eae5fee0605f316fbaad0b9e3de791d7dfaf
pidfd: add P_PIDFD to waitid()
tcpi_rcv_ooopack for tracking connection quality:
linux commit f9af2dbbfe01def62765a58af7fbc488351893c3
tcp: Add TCP_INFO counter for packets received out-of-order
tcpi_snd_wnd peer window size for diagnosing tcp performance problems:
linux commit 8f7baad7f03543451af27f5380fc816b008aa1f2
tcp: Add snd_wnd to TCP_INFO
per thread prctl commands to relax the syscall abi such that top bits
of user pointers are ignored in the kernel. this allows the use of
those bits by hwasan or by mte to color pointers and memory on aarch64:
linux commit 63f0c60379650d82250f22e4cf4137ef3dc4f43d
arm64: Introduce prctl() options to control the tagged user addresses ABI
These were mainly introduced so android can optimize the memory usage
of unused apps.
MADV_COLD hints that the memory range is currently not needed (unlike
with MADV_FREE the content is not garbage, it needs to be swapped):
linux commit 9c276cc65a58faf98be8e56962745ec99ab87636
mm: introduce MADV_COLD
MADV_PAGEOUT hints that the memory range is not needed for a long time
so it can be reclaimed immediately independently of memory pressure
(unlike with MADV_DONTNEED the content is not garbage):
linux commit 1a4e58cce84ee88129d5d49c064bd2852b481357
mm: introduce MADV_PAGEOUT
the syscall number is reserved on all targets, but it is not wired up
on all targets, see
linux commit 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
Merge tag 'clone3-v5.3' of ... brauner/linux
linux commit 8f3220a806545442f6f26195bc491520f5276e7c
arch: wire-up clone3() syscall
linux commit 7f192e3cd316ba58c88dfa26796cf77789dd9872
fork: add clone3
see
linux commit 7615d9e1780e26e0178c93c55b73309a5dc093d7
arch: wire-up pidfd_open()
linux commit 32fcb426ec001cb6d5a4a195091a8486ea77e2df
pid: add pidfd_open()
ptrace API to get details of the syscall the tracee is blocked in, see
linux commit 201766a20e30f982ccfe36bebfad9602c3ff574a
ptrace: add PTRACE_GET_SYSCALL_INFO request
the align attribute was used to keep the layout the same across targets;
e.g. on m68k uint32_t is 2 byte aligned. this helps with compat ptrace.
adding this condition makes the entire convert_ioctl_struct function
and compat_map table statically unreachable, and thereby optimized out
by dead code elimination, on archs where they are not needed.
VIDIOC_OMAP3ISP_STAT_REQ is a device-specific command for the omap3isp
video device. the command number is in a device-private range and
therefore could theoretically be used by other devices too in the
future, but problematic clashes should not be able to arise without
intentional misuse.
This ensures that the musl definition of 'struct iphdr' does not conflict
with the Linux kernel UAPI definition of it.
Some software, e.g. net-tools, will not compile against 5.4 kernel headers
without this patch and the corresponding Linux kernel patch.
since time64 switchover has changed the size and layout of the struct
anyway, take the opportunity to fix it up so that it can be shared
between 32- and 64-bit ABIs on the same system as long as byte order
matches.
the ut_type member is explicitly padded to make up for m68k having
only 2-byte alignment; explicit padding has no effect on other archs.
ut_session is changed from long to int, with endian-matched padding.
this affects 64-bit archs as well, but brings the type into alignment
with glibc's x86_64 struct, so it should not break software, and does
not break on-disk format. the semantic type is int (pid-like) anyway.
the padding produces correct alignment for the ut_tv member on 32-bit
archs that don't naturally align it, so that ABI matches 64-bit.
this type is presently not used anywhere in the ABI between libc and
libc consumers; it's only used between pairs of consumers if a
third-party utmp library using the system utmpx.h is in use.
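a sketch of the layout technique described above, on a reduced hypothetical record (struct sketch_utmpx and its member names are illustrative and much smaller than the real struct). it assumes the gcc/clang predefined __BYTE_ORDER__ macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical record. */
struct sketch_utmpx {
	short type;
	short pad1;             /* explicit: needed where int is 2-byte aligned */
	int32_t pid;
	char line[32];
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	int32_t pad2, session;  /* value occupies the low-order half ... */
#else
	int32_t session, pad2;  /* ... of what used to be a 64-bit long */
#endif
	struct { int64_t sec, usec; } tv; /* lands on an 8-byte boundary */
};
```

with the 4-byte session plus 4-byte pad, tv sits at offset 48 whether or not the ABI naturally aligns 64-bit types to 8 bytes.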
the elf_prstatus structure is used in core dumps, and the timeval
structures in it are longs matching the elf class, *not* the kernel
"old timeval" for the arch. this means using timeval here for x32 was
always wrong, despite kernel uapi headers and glibc also exposing it
this way, and of course it's wrong for any arch with 64-bit time_t.
rather than just changing the type on affected archs, use a tagless
struct containing long tv_sec and tv_usec members in place of the
timevals. this intentionally breaks use of them as timevals (e.g.
assignment, passing address, etc.) on 64-bit archs as well so that any
usage unsafe for 32-bit archs is caught even in software that only
gets tested on 64-bit archs. from what I could gather, there is not
any software using these members anyway. the only reason they need to
be fixed to begin with is that the only members which are commonly
used, the saved registers, follow the time members and have the wrong
offset if the time members are sized incorrectly.
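a sketch of the tagless-struct approach on a hypothetical, heavily reduced record (struct sketch_prstatus and its member names are illustrative). each tagless struct is its own distinct type, so assigning one of these members to a struct timeval does not compile.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced record: two time members followed by an array
 * standing in for the saved registers. */
struct sketch_prstatus {
	int pid;
	struct { long tv_sec, tv_usec; } utime; /* longs of the ELF class */
	struct { long tv_sec, tv_usec; } stime; /* not struct timeval */
	unsigned long regs[4];
};
```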
commit ae388becb5 accidentally
introduced #define SYSCALL_NO_TLS 1 in mmap.c, which was probably a
stale change left around from unrelated syscall timing measurements.
revert it.
this commit covers all remaining ioctls I'm aware of that use
time_t-derived types in their interfaces. it may still be incomplete,
and has undergone only minimal testing for a few commands used in
audio playback.
the SNDRV_PCM_IOCTL_SYNC_PTR command is special-cased because, rather
than the whole structure expanding, it has two substructures each
padded to 64 bytes that expand within their own 64-byte reserved zone.
as long as it's the only one of its type, it doesn't really make sense
to make a general framework for it, but the existing table framework
is still used for the substructures in the special-case. one of the
substructures, snd_pcm_mmap_status, has a snd_pcm_uframes_t member
which is not a timestamp but is expanded just like one, to match the
64-bit-arch version of the structure. this is handled just like a
timestamp at offset 8, and is the motivation for the conversions table
holding offsets of individual values to be expanded rather than
timespec/timeval type pairs.
for some of the types, the size to which they expand is dependent on
whether the arch's ABI aligns 8-byte types on 8-byte boundaries.
new_req entries in the table need to reflect this size to get the
right ioctl request number that will match what callers pass, but we
don't have access to the actual structure type definitions here and
duplicating them would be cumbersome. instead, the new_misaligned
macro introduced here constructs an artificial object whose size is
the result of expanding a misaligned timespec/timeval to 64-bit and
imposing the arch's alignment on the result, which can be passed to
the _IO{R,W,WR} macros.
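a sketch of how such an artificial object can be built. sketch_new_misaligned is a hypothetical name, int64_t stands in for the 64-bit time_t, and the record is assumed to have a 32-bit timespec/timeval at offset 4.

```c
#include <assert.h>
#include <stdint.h>

/* For a record of original size n with a 32-bit timespec/timeval at
 * offset 4: i is the leading 4-byte member, t is the first widened
 * slot (the compiler inserts padding before it if the ABI aligns
 * 64-bit types to 8 bytes), and c covers the second widened slot
 * (8 bytes) plus the n-12 bytes that followed the time members.
 * Only sizeof the result is meaningful. */
#define sketch_new_misaligned(n) struct { int i; int64_t t; char c[(n)-4]; }
```

for n = 12 (an int followed by a 32-bit timeval), the result is 24 bytes where 64-bit types are 8-byte aligned and 20 bytes where they are 4-byte aligned.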
record offsets of individual slots that expand from 32- to 64-bit,
rather than timespec/timeval pairs. this flexibility will be needed
for some ioctls. reduce size of types in table. adjust representation
of offsets to include a count rather than needing -1 padding so that
the table is less ugly and doesn't need large diffs if we increase max
number of slots.
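a sketch of storing a count alongside the offsets using a variadic counting macro. all names here (struct sketch_map, SKETCH_OFFS, and so on) are hypothetical, and the limit of 8 slots is an assumption of the example.

```c
#include <assert.h>

/* Hypothetical table entry: a count plus the byte offsets of the
 * 32-bit slots that must be widened to 64-bit. */
struct sketch_map {
	unsigned char noffs;
	unsigned char offsets[8];
};

/* Select the ninth argument. With the list 8..0 appended, that is the
 * number of arguments originally passed (1 to 8). */
#define SKETCH_NINTH(a,b,c,d,e,f,g,h,i,...) i
#define SKETCH_COUNT(...) SKETCH_NINTH(__VA_ARGS__,8,7,6,5,4,3,2,1,0)
/* Expands to "count, { offsets... }" for use in an initializer. */
#define SKETCH_OFFS(...) SKETCH_COUNT(__VA_ARGS__), { __VA_ARGS__ }

const struct sketch_map sketch_entry = { SKETCH_OFFS(0, 4, 16) };
```

unused trailing slots are zero-initialized, so no -1 padding is needed and raising the maximum only changes the macro and the array size.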
with the current set of supported ioctls, this conversion is hardly an
improvement, but it sets the stage for being able to do alsa, v4l2,
ppp, and other ioctls with timespec/timeval-derived types. without
this capability, a lot of functionality users depend on would stop
working with the time64 switchover.
commit b60fdf133c broke the
SIOCGSTAMP[NS] ioctl fallbacks introduced in commit
2e554617e5, as well as use of these
ioctls, by creating a situation where bits/ioctl.h could be included
without __LONG_MAX being visible.
always try the time64 syscall first since we can use its success to
conclude that no conversion is needed (any setsockopt for the
timestamp options would have succeeded without need for fallbacks).
otherwise, we have to remember the original controllen for each
msghdr, requiring O(vlen) space, so vlen must be bounded. linux clamps
it to IOV_MAX for sendmmsg only (not recvmmsg), but doing the same for
recvmmsg is not unreasonable, especially since the limitation will
only apply to old kernels.
we could optimize to avoid trying SYS_recvmmsg_time64 first if all
msghdrs have controllen zero, or support unlimited vlen by looping and
emulating the timeout logic, but I'm not inclined to do complex and
error-prone optimizations on a function that has so many underlying
problems it should really never be used.
the definitions of SO_TIMESTAMP* changed on 32-bit archs in commit
3814333964 to the new versions that
provide 64-bit versions of timeval/timespec structure in control
message payload. socket options, being state attached to the socket
rather than function calls, are not trivial to implement as fallbacks
on ENOSYS, and support for them was initially omitted on the
assumption that the ioctl-based polling alternatives (SIOCGSTAMP*)
could be used instead by applications if setsockopt fails.
unfortunately, it turns out that SO_TIMESTAMP is sufficiently old and
widely supported that a number of applications assume it's available
and treat errors as fatal.
this patch introduces emulation of SO_TIMESTAMP[NS] on pre-time64
kernels by falling back to setting the "_OLD" (time32) versions of the
options if the time64 ones are not recognized, and performing
translation of the SCM_TIMESTAMP[NS] control messages in recvmsg.
since recvmsg does not know whether its caller is legacy time32 code
or time64, it performs translation for any SCM_TIMESTAMP[NS]_OLD
control messages it sees, leaving the original time32 timestamp as-is
(it can't be rewritten in-place anyway, and memmove would be mildly
expensive) and appending the converted time64 control message at the
end of the buffer. legacy time32 callers will see the converted one as
a spurious control message of unknown type; time64 callers running on
pre-time64 kernels will see the original one as a spurious control
message of unknown type. a time64 caller running on a kernel with
native time64 support will only see the time64 version of the control
message.
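a sketch of the option fallback only (not the control message translation). the backend is passed in as a function pointer so the logic can be exercised without a socket; sketch_setopt_fn, sketch_set_timestamp, and sketch_old_kernel are hypothetical, the option numbers 63 and 29 are illustrative, and reporting an unrecognized option as ENOPROTOOPT is an assumption of the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical backend: returns 0 on success or a negative errno
 * value, like a raw syscall wrapper. */
typedef int (*sketch_setopt_fn)(int fd, int optname);

/* Try the time64 option first; if it is not recognized, fall back to
 * the time32 ("_OLD") option. */
int sketch_set_timestamp(sketch_setopt_fn setopt, int fd,
                         int opt_new, int opt_old)
{
	int r = setopt(fd, opt_new);
	if (r == -ENOPROTOOPT && opt_new != opt_old)
		r = setopt(fd, opt_old);
	return r;
}

/* Stand-in for a pre-time64 kernel that only knows option 29. */
int sketch_old_kernel(int fd, int optname)
{
	(void)fd;
	return optname == 29 ? 0 : -ENOPROTOOPT;
}
```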
emulation of SO_TIMESTAMPING is not included at this time since (1)
applications which use it seem to be prepared for the possibility that
it's not present or working, and (2) it can also be used in sendmsg
control messages, in a manner that looks complex to emulate
completely, and costly even when running on a time64-supporting
kernel.
corresponding changes in recvmmsg are not made at this time; they will
be done separately.
linux/input.h and perhaps others use this macro to determine whether
the userspace time_t is 64-bit when potentially defining types in
terms of time_t and derived structures. the name __USE_TIME_BITS64 is
unfortunate; it really should have been in the __UAPI namespace. but
this is what was chosen back in v4.16 when first preparing input.h for
time64 userspace, presumably based on expectations about what the
glibc-internal features.h macro for time64 would be, and changing it
now would just put a new minimum version requirement on kernel
headers.
the __USE_TIME_BITS64 macro is not intended as a public interface. it
is purely an internal contract between libc and Linux uapi headers.
this interface permits a null pointer for where to store the old
itimerval being replaced. an early version of the time32 compat shim
code had corresponding bugs for lots of functions; apparently
setitimer was overlooked when fixing them.
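a sketch of a time32 shim that tolerates a null old-value pointer. the 32-bit types and the function name sketch_setitimer32 are hypothetical, and the sketch assumes new32 is non-null.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical 32-bit layouts used by legacy callers. */
struct sketch_timeval32 { int32_t tv_sec, tv_usec; };
struct sketch_itimerval32 { struct sketch_timeval32 it_interval, it_value; };

int sketch_setitimer32(int which, const struct sketch_itimerval32 *new32,
                       struct sketch_itimerval32 *old32)
{
	struct itimerval old;
	struct itimerval val = {
		.it_interval = { .tv_sec = new32->it_interval.tv_sec,
		                 .tv_usec = new32->it_interval.tv_usec },
		.it_value = { .tv_sec = new32->it_value.tv_sec,
		              .tv_usec = new32->it_value.tv_usec },
	};
	int r = setitimer(which, &val, &old);
	if (r) return r;
	/* The caller may pass a null pointer when it does not want the
	 * previous value; only store it when a destination exists. */
	if (old32) {
		old32->it_interval.tv_sec = (int32_t)old.it_interval.tv_sec;
		old32->it_interval.tv_usec = (int32_t)old.it_interval.tv_usec;
		old32->it_value.tv_sec = (int32_t)old.it_value.tv_sec;
		old32->it_value.tv_usec = (int32_t)old.it_value.tv_usec;
	}
	return 0;
}
```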
commit 4d3a162d00 overlooked that the
mips64 reloc.h depended on endian.h not only for setting the ABI ldso
name to match the byte order, but also for use of the byte swapping
macros. they are needed to override R_TYPE, R_SYM, and R_INFO, to
compensate for a mips "quirk" of always using big endian order for
symbol references in relocations.
part of that commit cannot be reverted because the original code was
wrong: it's invalid to define _GNU_SOURCE or any feature test macro
in reloc.h, or anywhere except at the top of a source file. however,
thanks to commit 316730cdc7, the feature
test macro is no longer needed to access the endian-swapping macros,
so simply bringing back the #include directive suffices.
commit de90f38e3b omitted $(srcdir) from
the makefile include pathname it added. since the include directive
was prefixed with - to make it optional (for archs that don't use it),
the failure to find arch/$(ARCH)/arch.mak was silent.
in commit 22daaea39f, the
__dlsym_redir_time64 function providing the backend for __dlsym_time64
was defined only in the dynamic linker, and thus was undefined when
static linking a program referencing dlsym. use the same stub_dlsym
definition that provides __dlsym (the non-redirecting backend) for
static linked programs to provide it, conditional on _REDIR_TIME64.
now that all 32-bit archs have 64-bit time_t (and suseconds_t), the
arch-provided _Int64 macro (long or long long, as appropriate) can be
used to define them, and arch-specific definitions are no longer
needed.
now that all 32-bit archs have 64-bit time types, the values for the
time-related ioctls can be shared. the mechanism for this is an
arch/generic version of the bits header. archs which don't use the
generic header still need to duplicate the definitions.
x32, which does not use the new time64 values of the macros, already
has its own overrides, so this commit does not affect it.
now that all 32-bit archs have 64-bit time types, the values for the
time-related socket option macros can be treated as universal for
32-bit archs. the sys/socket.h mechanism for this predates
arch/generic and is instead in the top-level header.
x32, which does not use the new time64 values of the macros, already
has its own overrides, so this commit does not affect it.
this commit preserves ABI fully for existing interface boundaries
between libc and libc consumers (applications or libraries), by
retaining existing symbol names for the legacy 32-bit interfaces and
redirecting sources compiled against the new headers to alternate
symbol names. this does not necessarily, however, preserve the
pairwise ABI of libc consumers with one another; where they use
time_t-derived types in their interfaces with one another, it may be
necessary to synchronize updates with each other.
the intent is that ABI resulting from this commit already be stable
and permanent, but it will not be officially so until a release is
made. some header-defined types that do not play any role in the ABI
between libc and its consumers may still be subject to change.
mechanically, the changes made by this commit for each 32-bit arch are
as follows:
- _REDIR_TIME64 is defined to activate the symbol redirections in
public headers
- COMPAT_SRC_DIRS is defined in arch.mak to activate build of ABI
compat shims to serve as definitions for the original symbol names
- time_t and suseconds_t definitions are changed to long long (64-bit)
- IPC_STAT definition is changed to add the IPC_TIME64 bit (0x100),
triggering conversion of semid_ds, shmid_ds, and msqid_ds split
low/high time bits into new time_t members
- structs semid_ds, shmid_ds, msqid_ds, and stat are modified to add
new 64-bit time_t/timespec members at the end, maintaining existing
layout of other members.
- socket options (SO_*) and ioctl (sockios) command macros are
redefined to use the kernel's "_NEW" values.
in addition, on archs where vdso clock_gettime is used, the
VDSO_CGT_SYM macro definition in syscall_arch.h is changed to use a
new time64 vdso function if available, and a new VDSO_CGT32_SYM macro
is added for use as fallback on kernels lacking time64.
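a sketch of the IPC_TIME64 item in the list above: the flag bit on the command triggers joining the split low/high halves into a 64-bit member appended to the struct. the SKETCH_ macros, struct sketch_semid_ds, and its member names are hypothetical and heavily reduced.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IPC_TIME64 0x100
#define SKETCH_IPC_STAT (2 | SKETCH_IPC_TIME64)

/* Hypothetical reduced record: the existing split halves keep their
 * place, and the new 64-bit member is added at the end. */
struct sketch_semid_ds {
	uint32_t otime_lo;
	uint32_t otime_hi;
	int64_t otime;
};

/* After the kernel has filled in the split halves, combine them into
 * the 64-bit member, but only when the command carries the flag. */
void sketch_join_times(int cmd, struct sketch_semid_ds *buf)
{
	if (cmd & SKETCH_IPC_TIME64)
		buf->otime = (int64_t)(buf->otime_lo |
		                       (uint64_t)buf->otime_hi << 32);
}
```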
these definitions are copied from generic bits/ioctl.h, so that x32
keeps the "_OLD" versions (which are already time64 on x32) when
32-bit archs switch to 64-bit time_t.
these definitions are merely copied from the top-level sys/socket.h,
so there is no functional change at this time. however, the top-level
definitions will change to use the time64 "_NEW" versions on 32-bit
archs when time_t is switched over to 64-bit. this commit ensures that
change will be suppressed on x32.