multiple opens of the same named semaphore must return the same
pointer, and only the last close can unmap it. thus the ugly global
state keeping track of mappings. the maximum number of distinct named
semaphores that can be opened is limited sufficiently small that the
linear searches take trivial time, especially compared to the syscall
overhead of these functions.
we can avoid blocking signals by simply using a flag to mark that the
thread has exited and prevent it from getting counted in the rsyscall
signal-pingpong. this restores the original pthread create/join
throughput from before the sigprocmask call was added.
1. any padding in the siginfo struct was not necessarily zero-filled,
so it might have contained private data off the caller's stack.
2. the uid and pid must be filled in from userspace. the previous
rsyscall fix broke rsyscalls because the values were always incorrect.
these functions are specified inconsistent in whether they're
specified to return an error value, or return -1 and set errno.
hopefully now they all match what POSIX requires.
the set_tid_address returns the tid (which is also the pid when called
from the initial thread) so there is no need to make a separate
syscall to get pid/tid.
a signal handler could fork after the pid/tid were read, causing the
wrong process to be signalled. i'm not sure if this is supposed to
have UB or not, but raise is async-signal-safe, so it probably is
allowed. the current solution is slightly expensive so this
implementation is likely to be changed in the future.
problem 1: mutex type from the attribute was being ignored by
pthread_mutex_init, so recursive/errorchecking mutexes were never
being used at all.
problem 2: ownership of recursive mutexes was not being enforced at
unlock time.
this should avoid warnings about unused libs when not linking, and
might fix some other obscure issues too. i might replace this approach
with a completely different one soon though.
this code was written independently of musl, with support for a the
backwards, nonstandard "31-bit unicode" some libraries/apps might
want. unfortunately the extra code (inside #ifdef) makes the source
harder to read and makes code that should be simple look complex, so
i'm removing it. anyone who wants to use the old code can find it in
the history or from elsewhere.
also, change the visibility of the __fsmu8 state machine table to
hidden, if supported. this should improve performance slightly in
shared-library builds.
prefer using visibility=hidden for __libc internal data, rather than
an accessor function, if the compiler has visibility.
optimize with -O3 for PIC targets (shared library). without heavy
inlining, reloading the GOT register in small functions kills
performance. 20-30% size increase for a single libc.so is not a big
deal, compared to comparaible size increase in every static binaries.
use -Bsymbolic-functions, not -Bsymbolic. global variables are subject
to COPY relocations, and thus binding their addresses in the library
at link time will cause library functions to read the wrong (original)
copies instead of the copies made in the main program's bss section.
add entry point, _start, for dynamic linker.