Commit Graph

45 Commits

Author SHA1 Message Date
Rich Felker 3c5c5e6f92 optimize posix_spawn to avoid spurious sigaction syscalls
the trick here is that sigaction can track for us which signals have
ever had a signal handler set for them, and only those signals need to
be considered for reset. this tracking mask may have false positives,
since it is impossible to remove bits from it without race conditions.
false negatives are not possible since the mask is updated with atomic
operations prior to making the sigaction syscall.

implementation-internal signals are set to SIG_IGN rather than SIG_DFL
so that a signal raised in the parent (e.g. calling pthread_cancel on
the thread executing pthread_spawn) does not have any chance make it
to the child, where it would cause spurious termination by signal.

this change reduces the minimum/typical number of syscalls in the
child from around 70 to 4 (including execve). this should greatly
improve the performance of posix_spawn and other interfaces which use
it (popen and system).

to facilitate these changes, sigismember is also changed to return 0
rather than -1 for invalid signals, and to return the actual status of
implementation-internal signals. POSIX allows but does not require an
error on invalid signal numbers, and in fact returning an error tends
to confuse applications which wrongly assume the return value of
sigismember is boolean.
2013-08-09 21:03:47 -04:00
Rich Felker 65d7aa4dfd fix missing errno from exec failure in posix_spawn
failures prior to the exec attempt were reported correctly, but on
exec failure, the return value contained junk.
2013-08-09 20:04:05 -04:00
Rich Felker d4d6d6f322 block signals during fork
there are several reasons for this. some of them are related to race
conditions that arise since fork is required to be async-signal-safe:
if fork or pthread_create is called from a signal handler after the
fork syscall has returned but before the subsequent userspace code has
finished, inconsistent state could result. also, there seem to be
kernel and/or strace bugs related to arrival of signals during fork,
at least on some versions, and simply blocking signals eliminates the
possibility of such bugs.
2013-08-08 23:17:05 -04:00
Rich Felker c8c0844f7f debloat code that depends on /proc/self/fd/%d with shared function
I intend to add more Linux workarounds that depend on using these
pathnames, and some of them will be in "syscall" functions that, from
an anti-bloat standpoint, should not depend on the whole snprintf
framework.
2013-08-02 12:59:45 -04:00
Rich Felker b06dc66639 make posix_spawn (and functions that use it) use CLONE_VFORK flag
this is both a minor scheduling optimization and a workaround for a
difficult-to-fix bug in qemu app-level emulation.

from the scheduling standpoint, it makes no sense to schedule the
parent thread again until the child has exec'd or exited, since the
parent will immediately block again waiting for it.

on the qemu side, as regular application code running on an underlying
libc, qemu cannot make arbitrary clone syscalls itself without
confusing the underlying implementation. instead, it breaks them down
into either fork-like or pthread_create-like cases. it was treating
the code in posix_spawn as pthread_create-like, due to CLONE_VM, which
caused horribly wrong behavior: CLONE_FILES broke the synchronization
mechanism, CLONE_SIGHAND broke the parent's signals, and CLONE_THREAD
caused the child's exec to end the parent -- if it hadn't already
crashed. however, qemu special-cases CLONE_VFORK and emulates that
with fork, even when CLONE_VM is also specified. this also gives
incorrect semantics for code that really needs the memory sharing, but
posix_spawn does not make use of the vm sharing except to avoid
momentary double commit charge.

programs using posix_spawn (including via popen) should now work
correctly under qemu app-level emulation.
2013-07-17 13:54:41 -04:00
Rich Felker a0473a0c82 remove explicit locking to prevent __synccall setuid during posix_spawn
for the duration of the vm-sharing clone used by posix_spawn, all
signals are blocked in the parent process, including
implementation-internal signals. since __synccall cannot do anything
until successfully signaling all threads, the fact that signals are
blocked automatically yields the necessary safety.

aside from debloating and general simplification, part of the
motivation for removing the explicit lock is to simplify the
synchronization logic of __synccall in hopes that it can be made
async-signal-safe, which is needed to make setuid and setgid, which
depend on __synccall, conform to the standard. whether this will be
possible remains to be seen.
2013-04-26 15:09:49 -04:00
Rich Felker 7914ce9204 remove cruft from pre-posix_spawn version of the system function 2013-03-24 22:40:54 -04:00
Rich Felker 23ccb80fcb consistently use the internal name __environ for environ
patch by Jens Gustedt.
previously, the intended policy was to use __environ in code that must
conform to the ISO C namespace requirements, and environ elsewhere.
this policy was not followed in practice anyway, making things
confusing. on top of that, Jens reported that certain combinations of
link-time optimization options were breaking with the inconsistent
references; this seems to be a compiler or linker bug, but having it
go away is a nice side effect of the changes made here.
2013-02-17 14:24:39 -05:00
Rich Felker a8799356d5 base system() on posix_spawn
this avoids duplicating the fragile logic for executing an external
program without fork.
2013-02-03 18:21:04 -05:00
Rich Felker 4862864fc1 fix unsigned comparison bug in posix_spawn
read should never return anything but 0 or sizeof ec here, but if it
does, we want to treat any other return as "success". then the caller
will get back the pid and is responsible for waiting on it when it
immediately exits.
2013-02-03 17:09:47 -05:00
Rich Felker fb6b159d9e overhaul posix_spawn to use CLONE_VM instead of vfork
the proposed change was described in detail in detail previously on
the mailing list. in short, vfork is unsafe because:

1. the compiler could make optimizations that cause the child to
clobber the parent's local vars.

2. strace is buggy and allows the vforking parent to run before the
child execs when run under strace.

the new design uses a close-on-exec pipe instead of vfork semantics to
synchronize the parent and child so that the parent does not return
before the child has finished using its arguments (and now, also its
stack). this also allows reporting exec failures to the caller instead
of giving the caller a child that mysteriously exits with status 127
on exec error.

basic testing has been performed on both the success and failure code
paths. further testing should be done.
2013-02-03 16:42:40 -05:00
Rich Felker 96fbcf7d80 fix up minor misplacement of restrict keyword in spawnattr sched stubs 2013-02-01 15:57:28 -05:00
Rich Felker 1e21e78bf7 add support for thread scheduling (POSIX TPS option)
linux's sched_* syscalls actually implement the TPS (thread
scheduling) functionality, not the PS (process scheduling)
functionality which the sched_* functions are supposed to have.
omitting support for the PS option (and having the sched_* interfaces
fail with ENOSYS rather than omitting them, since some broken software
assumes they exist) seems to be the only conforming way to do this on
linux.
2012-11-11 15:38:04 -05:00
Rich Felker efd4d87aa4 clean up sloppy nested inclusion from pthread_impl.h
this mirrors the stdio_impl.h cleanup. one header which is not
strictly needed, errno.h, is left in pthread_impl.h, because since
pthread functions return their error codes rather than using errno,
nearly every single pthread function needs the errno constants.

in a few places, rather than bringing in string.h to use memset, the
memset was replaced by direct assignment. this seems to generate much
better code anyway, and makes many functions which were previously
non-leaf functions into leaf functions (possibly eliminating a great
deal of bloat on some platforms where non-leaf functions require ugly
prologue and/or epilogue).
2012-11-08 17:04:20 -05:00
Rich Felker 76f28cfce5 system is a cancellation point
ideally, system would also be cancellable while running the external
command, but I cannot find any way to make that work without either
leaking zombie processes or introducing behavior that is far outside
what the standard specifies. glibc handles cancellation by killing the
child process with SIGKILL, but this could be unsafe in that it could
leave the data being manipulated by the command in an inconsistent
state.
2012-10-28 21:17:45 -04:00
Rich Felker 599f973603 fix usage of locks with vfork
__release_ptc() is only valid in the parent; if it's performed in the
child, the lock will be unlocked early then double-unlocked later,
corrupting the lock state.
2012-10-19 15:02:37 -04:00
Rich Felker 97c8bdd88a fix parent-memory-clobber in posix_spawn (environ) 2012-10-18 16:41:27 -04:00
Rich Felker 44eb4d8b9b overhaul system() and popen() to use vfork; fix various related bugs
since we target systems without overcommit, special care should be
taken that system() and popen(), like posix_spawn(), do not fail in
processes whose commit charges are too high to allow ordinary forking.

this in turn requires special precautions to ensure that the parent
process's signal handlers do not end up running in the shared-memory
child, where they could corrupt the state of the parent process.

popen has also been updated to use pipe2, so it does not have a
fd-leak race in multi-threaded programs. since pipe2 is missing on
older kernels, (non-atomic) emulation has been added.

some silly bugs in the old code should be gone too.
2012-10-18 15:58:23 -04:00
Rich Felker d5304147b9 block uid/gid changes during posix_spawn
usage of vfork creates a situation where a process of lower privilege
may momentarily have write access to the memory of a process of higher
privilege.

consider the case of a multi-threaded suid program which is calling
posix_spawn in one thread while another thread drops the elevated
privileges then runs untrusted (relative to the elevated privilege)
code as the original invoking user. this untrusted code can then
potentially modify the data the child process will use before calling
exec, for example changing the pathname or arguments that will be
passed to exec.

note that if vfork is implemented as fork, the lock will not be held
until the child execs, but since memory is not shared it does not
matter.
2012-10-15 11:42:46 -04:00
Rich Felker d62f4e9888 use vfork if possible in posix_spawn
vfork is implemented as the fork syscall (with no atfork handlers run)
on archs where it is not available, so this change does not introduce
any change in behavior or regression for such archs.
2012-09-14 15:32:51 -04:00
Rich Felker 400c5e5c83 use restrict everywhere it's required by c99 and/or posix 2008
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
2012-09-06 22:44:55 -04:00
Rich Felker 4cf667c9c9 x86_64 vfork implementation
untested; should work.
2012-02-06 18:23:11 -05:00
Rich Felker 2bb75db611 support vfork on i386 2011-10-14 23:56:31 -04:00
Rich Felker 768f39a535 make available a namespace-safe vfork, if supported
this may be useful to posix_spawn..?
2011-10-14 23:34:12 -04:00
Rich Felker 4b87736998 fix various bugs in path and error handling in execvp/fexecve 2011-09-29 00:48:04 -04:00
Rich Felker 13cd969552 fix various errors in function signatures/prototypes found by nsz 2011-09-13 21:09:35 -04:00
Rich Felker 0f1ef81462 add missing posix_spawnattr_init/destroy functions 2011-09-13 14:45:59 -04:00
Rich Felker 98acf04fc0 use weak aliases rather than function pointers to simplify some code 2011-08-06 20:09:51 -04:00
Rich Felker 4ec07e1f60 ensure in fork that child gets its own new robust mutex list 2011-07-16 23:17:17 -04:00
Rich Felker f48832ee15 fix backwards posix_spawn file action order 2011-05-29 12:58:02 -04:00
Rich Felker dd45edb5ff add accidentally-omitted file needed for posix_spawn file actions 2011-05-28 23:31:11 -04:00
Rich Felker a0ae0b0936 add file actions support to posix_spawn 2011-05-28 23:30:47 -04:00
Rich Felker d6c0c97846 posix_spawn: honor POSIX_SPAWN_SETSIGDEF flag 2011-05-28 18:39:43 -04:00
Rich Felker c97f0d998c initial implementation of posix_spawn
file actions are not yet implemented, but everything else should be
mostly complete and roughly correct.
2011-05-28 18:36:30 -04:00
Rich Felker e6bac87d0e correct variadic prototypes for execl* family
the old versions worked, but conflicted with programs which declared
their own prototypes and generated warnings with some versions of gcc.
2011-04-27 16:06:33 -04:00
Rich Felker 870cc67977 fix minor bugs due to incorrect threaded-predicate semantics
some functions that should have been testing whether pthread_self()
had been called and initialized the thread pointer were instead
testing whether pthread_create() had been called and actually made the
program "threaded". while it's unlikely any mismatch would occur in
real-world problems, this could have introduced subtle bugs. now, we
store the address of the main thread's thread descriptor in the libc
structure and use its presence as a flag that the thread register is
initialized. note that after fork, the calling thread (not necessarily
the original main thread) is the new main thread.
2011-04-20 21:41:45 -04:00
Rich Felker 9080cc153c clean up handling of thread/nothread mode, locking 2011-04-17 16:53:54 -04:00
Rich Felker feee98903c overhaul pthread cancellation
this patch improves the correctness, simplicity, and size of
cancellation-related code. modulo any small errors, it should now be
completely conformant, safe, and resource-leak free.

the notion of entering and exiting cancellation-point context has been
completely eliminated and replaced with alternative syscall assembly
code for cancellable syscalls. the assembly is responsible for setting
up execution context information (stack pointer and address of the
syscall instruction) which the cancellation signal handler can use to
determine whether the interrupted code was in a cancellable state.

these changes eliminate race conditions in the previous generation of
cancellation handling code (whereby a cancellation request received
just prior to the syscall would not be processed, leaving the syscall
to block, potentially indefinitely), and remedy an issue where
non-cancellable syscalls made from signal handlers became cancellable
if the signal handler interrupted a cancellation point.

x86_64 asm is untested and may need a second try to get it right.
2011-04-17 11:43:03 -04:00
Rich Felker e2915eeeea speed up threaded fork
after fork, we have a new process and the pid is equal to the tid of
the new main thread. there is no need to make two separate syscalls to
obtain the same number.
2011-04-12 17:52:14 -04:00
Rich Felker b470030f83 overhaul cancellation to fix resource leaks and dangerous behavior with signals
this commit addresses two issues:

1. a race condition, whereby a cancellation request occurring after a
syscall returned from kernelspace but before the subsequent
CANCELPT_END would cause cancellable resource-allocating syscalls
(like open) to leak resources.

2. signal handlers invoked while the thread was blocked at a
cancellation point behaved as if asynchronous cancellation mode wer in
effect, resulting in potentially dangerous state corruption if a
cancellation request occurs.

the glibc/nptl implementation of threads shares both of these issues.

with this commit, both are fixed. however, cancellation points
encountered in a signal handler will not be acted upon if the signal
was received while the thread was already at a cancellation point.
they will of course be acted upon after the signal handler returns, so
in real-world usage where signal handlers quickly return, it should
not be a problem. it's possible to solve this problem too by having
sigaction() wrap all signal handlers with a function that uses a
pthread_cleanup handler to catch cancellation, patch up the saved
context, and return into the cancellable function that will catch and
act upon the cancellation. however that would be a lot of complexity
for minimal if any benefit...
2011-03-24 14:18:00 -04:00
Rich Felker aa398f56fa global cleanup to use the new syscall interface 2011-03-20 00:16:43 -04:00
Rich Felker 3f5420bcda make fork properly initialize the main thread in the child process 2011-03-09 20:23:44 -05:00
Rich Felker f2374ed852 implement fexecve 2011-02-27 02:59:23 -05:00
Rich Felker e9417fffb3 add pthread_atfork interface
note that this presently does not handle consistency of the libc's own
global state during forking. as per POSIX 2008, if the parent process
was threaded, the child process may only call async-signal-safe
functions until one of the exec-family functions is called, so the
current behavior is believed to be conformant even if non-ideal. it
may be improved at some later time.
2011-02-18 19:52:42 -05:00
Rich Felker 0b44a0315b initial check-in, version 0.5.0 2011-02-12 00:22:29 -05:00