I intend to add more Linux workarounds that depend on using these
pathnames, and some of them will be in "syscall" functions that, from
an anti-bloat standpoint, should not depend on the whole snprintf
framework.
previously, the AT_SYMLINK_NOFOLLOW flag was ignored, giving
dangerously incorrect behavior -- the target of the symlink had its
modes changed to the modes (usually 0777) intended for the symlink).
this issue was amplified by the fact that musl provides lchmod, as a
wrapper for fchmodat, which some archival programs take as a sign that
symlink modes are supported and thus attempt to use.
emulating AT_SYMLINK_NOFOLLOW was a difficult problem, and I
originally believed it could not be solved, at least not without
depending on kernels newer than 3.5.x or so where O_PATH works halfway
well. however, it turns out that accessing O_PATH file descriptors via
their pseudo-symlink entries in /proc/self/fd works much better than
trying to use the fd directly, and works even on older kernels.
moreover, the kernel has permanently pegged these references to the
inode obtained by the O_PATH open, so there should not be race
conditions with the file being moved, deleted, replaced, etc.
this is the modern way, and the only way that makes any sense. glibc
has this complicated mechanism with RPATH and RUNPATH that controls
whether RPATH is processed before or after LD_LIBRARY_PATH, presumably
to support legacy binaries, but there is no compelling reason to
support this, and better behavior is obtained by just fixing the
search order.
previously, errno could be meaningless when the caller wrote it to the
dlerror string or stderr. try to make it meaningful. also, fix
incorrect check for over-long program headers and instead actually
support them by allocating memory if needed.
the access function cannot be used to check for existence, because it
operates using real uid/gid rather than effective to determine
accessibility; this matters for the non-final path components.
instead, use stat. failure of stat is success if only the final
component is missing (ENOENT) and otherwise is failure.
the concept of both versions is the same; they differ only in details.
for long runs, they use "rep movsl" or "rep movsq", and for small
runs, they use a trick, writing from both ends towards the middle,
that reduces the number of branches needed. in addition, if memset is
called multiple times with the same length, all branches will be
predicted; there are no loops.
for larger runs, there are likely faster approaches than "rep", at
least on some cpu models. for 32-bit, it's unlikely that there is any
faster approach that does not require non-baseline instructions; doing
anything fancier would require inspecting cpu capabilities. for
64-bit, there may very well be faster versions that work on all
models; further optimization could be explored in the future.
with these changes, memset is anywhere between 50% faster and 6 times
faster, depending on the cpu model and the length and alignment of the
destination buffer.
the original motivation for this patch was that qemu (and possibly
other syscall emulators) nop out madvise, resulting in an infinite
loop. however, there is another benefit to this change: madvise may
actually undo an explicit madvise the application intended for its
stack, whereas the mremap operation is a true nop. the logic here is
that mremap must fail if it cannot resize the mapping in-place, and
the caller knows that it cannot resize in-place because it knows the
next page of virtual memory is already occupied.
one of the arguments to memcmp may be shorter than the length l-3, and
memcmp is under no obligation not to access past the first byte that
differs. instead use strncmp which conveys the correct semantics. the
performance difference is negligible here and since the code is only
use for shared libc, both functions are already linked anyway.
the dev/inode for the main app and the dynamic linker ("interpreter")
are not available, so the subsequent checks don't work. in general we
don't want to make exact string matches to existing libraries prevent
loading new ones, since this breaks loading upgraded modules in
module-loading systems. so instead, special-case it.
the motivation for this fix is that calling dlopen on the names
returned by dl_iterate_phdr or walking the link map (obtained by
dlinfo) seem to be the only methods available to an application to
actually get a list of open dso handles.
reject elf files which are not ET_EXEC/ET_DYN type as bad exec format,
and reject ET_EXEC files when they cannot be loaded at the correct
address, since they are not relocatable at runtime. the main practical
benefit of this is to make dlopen of the main program fail rather than
producing an unsafe-to-use handle.
it's not clear to me why the linker even outputs these headers if they
are null, but apparently it does so. with the default startfiles, they
will never be null anyway, but this patch allows eliminating crti,
crtn, crtbegin, and crtend (leaving only crt1) if the toolchain is
using init_array/fini_array (or for a C-only, no-ctor environment).
in signal() it is needed since __sigaction uses restrict in parameters
and sharing the buffer is technically an aliasing error. do the same
for the syscall, as at least qemu-user does not handle it properly.
LC_GLOBAL_LOCALE refers to the global locale, controlled by setlocale,
not the thread-local locale in effect which these functions should be
using. neither LC_GLOBAL_LOCALE nor 0 has an argument to the *_l
functions has behavior defined by the standard, but 0 is a more
logical choice for requesting the callee to lookup the current locale.
in the future I may move the current locale lookup the the caller (the
non-_l-suffixed wrapper).
at this point, all of the locale logic is dummied out, so no harm was
done, but it should at least avoid misleading usage.
also add a warning to the existing sys/poll.h. the warning is absent
from sys/dir.h because it is actually providing a slightly different
API to the program, and thus just replacing the #include directive is
not a valid fix to programs using this one.
entry point was wrong for PIE. e_entry was being treated as an
absolute value, whereas it's actually relative to the load address
(which is zero for non-PIE).
phdr pointer was wrong for non-PIE. e_phoff was being treated as
load-address-relative, whereas it's actually a file offset in the ELF
file. in any case, map_library was already computing it correctly, and
the incorrect code in __dynlink was overwriting it with junk.
the only immediate effect of this commit is enabling PIE support on
some archs that did not previously have any Scrt1.s, since the
existing asm files for crt1 override this C code. so some of the
crt_arch.h files committed are only there for the sake of documenting
what their archs "would do" if they used the new C-based crt1.
the expectation is that new archs should use this new system rather
than using heavy asm for crt1. aside from being easier and less
error-prone, it also ensures that PIE support is available immediately
(since Scrt1.o is generated from the same C source, using -fPIC)
rather than having to be added as an afterthought in the porting
process.
based on a patch by orc. POSIX actually fails to specify the format of
the ntop conversion; presumably, any output that will correctly
round-trip back via the (well-specified) pton operation is acceptable.
the new behavior is much more convenient than the old, however.
this patch also affects getnameinfo, which is implemented in terms of
inet_ntop and which is the preferred interface for performing this
conversion.
I've also removed some inexplicable cruft (filling the buffer with 'x'
before doing anything) whose origin I was unable to track down.
it's not clear that -O3 helps them, and gcc seems to have floating
point optimization bugs that introduce additional failures when -O3 is
used on some of these files.
apparently the original kernel commit's i386 version of siginfo.h
defined this field as unsigned int, but the asm-generic file always
had void *. unsigned int is obviously not a suitable type for an
address, in a non-arch-specific file, and glibc also has void * here,
so I think void * is the right type for it.
also fix redundant type specifiers.
linux commit 8d36eb01da5d371feffa280e501377b5c450f5a5 (2013-05-29)
added PF_IB for InfiniBand
linux commit d021c344051af91f42c5ba9fdedc176740cbd238 (2013-02-06)
added PF_VSOCK for VMware sockets
linux commit a0727e8ce513fe6890416da960181ceb10fbfae6 (2012-04-12)
added siginfo fields for SIGSYS (seccomp uses it)
linux commit ad5fa913991e9e0f122b021e882b0d50051fbdbc (2009-09-16)
added siginfo field and si_code values for SIGBUS (hwpoison signal)