despite documentation that makes it sound a lot different, the only
ABI-constraint difference between TLS variants II and I seems to be
that variant II stores the initial TLS segment immediately below the
thread pointer (i.e. the thread pointer points to the end of it) and
variant I stores the initial TLS segment above the thread pointer,
requiring the thread descriptor to be stored below. the actual value
stored in the thread pointer register also tends to have per-arch
random offsets applied to it for silly micro-optimization purposes.
with these changes applied, TLS should be basically working on all
supported archs except microblaze. I'm still working on getting the
necessary information and a working toolchain that can build TLS
binaries for microblaze, but in theory, static-linked programs with
TLS and dynamic-linked programs where only the main executable uses
TLS should already work on microblaze.
alignment constraints have not yet been heavily tested, so it's
possible that this code does not always align TLS segments correctly
on archs that need TLS variant I.
currently, only i386 is tested. x86_64 and arm should probably work.
the necessary relocation types for mips and microblaze have not been
added because I don't understand how they're supposed to work, and I'm
not even sure if it's defined yet on microblaze. I may be able to
reverse engineer the requirements out of gcc/binutils output.
if same register is used for input/output, the compiler must be told.
otherwise is generates random junk code that clobbers the result. in
pure syscall-wrapper functions, nothing went wrong, but in more
complex functions where register allocation is non-trivial, things
broke badly.
I'm not 100% sure that Linux's O_PATH meets the POSIX requirements for
O_SEARCH, but it seems very close if not perfect. and old kernels
ignore it, so O_SEARCH will still work as desired as long as the
caller has read permissions to the directory.
by using the "ir" constraint (immediate or register) and the carefully
constructed instruction addu $2,$0,%2 which can take either an
immediate or a register for %2, the new inline asm admits maximal
optimization with no register spillage to the stack when the compiler
successfully performs constant propagration, but still works by
allocating a register when the syscall number cannot be recognized as
a constant. in the case of syscalls with 0-3 arguments it barely
matters, but for 4-argument syscalls, using an immediate for the
syscall number avoids creating a stack frame for the syscall wrapper
function.
all past and current kernel versions have done so, but there seems to
be no reason it's necessary and the sentiment from everyone I've asked
has been that we should not rely on it. instead, use r7 (an argument
register) which will necessarily be preserved upon syscall restart.
however this only works for 0-3 argument syscalls, and we have to
resort to the function call for 4-argument syscalls.
this drastically reduces the size of some functions which are purely
syscall wrappers.
disabled for clang due to known bugs satisfying register constraints.
now public syscall.h only exposes __NR_* and SYS_* constants and the
variadic syscall function. no macros or inline functions, no
__syscall_ret or other internal details, no 16-/32-bit legacy syscall
renaming, etc. this logic has all been moved to src/internal/syscall.h
with the arch-specific parts in arch/$(ARCH)/syscall_arch.h, and the
amount of arch-specific stuff has been reduced to a minimum.
changes still need to be reviewed/double-checked. minimal testing on
i386 and mips has already been performed.
clang does not presently support the "v" constraint we want to use to
get the result from $3, and trying to use register...__asm__("$3") to
do the same invokes serious compiler bugs. so for now, i'm working
around the issue with an extra temp register and putting $3 in the
clobber list instead of using it as output. when the bugs in clang are
fixed, this issue should be revisited to generate smaller/faster code
like what gcc gets.
while musl itself requires a c99 compiler, some applications insist on
being compiled with c89 compilers, and use of "inline" in the headers
was breaking them. much of this had been avoided already by just
skipping the inline keyword in pre-c99 compilers or modes, but this
new unified solution is cleaner and may/should result in better code
generation in the default gcc configuration.
not heavily tested, but the basics are working. the basic concept is
that the dynamic linker entry point code invokes a pure-PIC (no global
accesses) C function in reloc.h to perform the early GOT relocations
needed to make the dynamic linker itself functional, then invokes
__dynlink like on other archs. since mips uses some ugly arch-specific
hacks to optimize relocating the GOT (rather than just using the
normal DT_REL[A] tables like on other archs), the dynamic linker has
been modified slightly to support calling arch-specific relocation
code in reloc.h.
most of the actual mips-specific behavior was developed by reading the
output of readelf on libc.so and simple executable files. i could not
find good reference information on which relocation types need to be
supported or their semantics, so it's possible that some legitimate
usage cases will not work yet.
also fix the alignment of jmp_buf to meet the abi. linux always
emulates fpu on mips if it's not present, so enabling this code
unconditionally is "safe" but may be slow. in the long term it may be
preferable to find a way to disable it on soft float builds.
the fields in the mcontext_t are long long (for no good reason) even
on 32-bit mips, so the offset of the instruction pointer (as a word)
varies depending on endianness.
the kernel wrongly expects the cmsg length field to be size_t instead
of socklen_t. in order to work around the issue, we have to impose a
length limit and copy to a local buffer. the length limit should be
more than sufficient for any real-world use; these headers are only
used for passing file descriptors and permissions between processes
over unix sockets.
basically, this version of the code was obtained by starting with
rdp's work from his ellcc source tree, adapting it to musl's build
system and coding style, auditing the bits headers for discrepencies
with kernel definitions or glibc/LSB ABI or large file issues, fixing
up incompatibility with the old binutils from aboriginal linux, and
adding some new special cases to deal with the oddities of sigaction
and pipe syscall interfaces on mips.
at present, minimal test programs work, but some interfaces are broken
or missing. threaded programs probably will not link.