Commit Graph

494 Commits

Author SHA1 Message Date
Szabolcs Nagy 789ff6a9f8 add MCL_ONFAULT and MLOCK_ONFAULT mlockall and mlock2 flags
they lock faulted pages into memory (useful when a small part of a
large mapped file needs efficient access), new in linux v4.4, commit
b0f205c2a3082dd9081f9a94e50658c5fa906ff1

MLOCK_* is not in the POSIX reserved namespace for sys/mman.h
2016-01-26 18:31:05 -05:00
Szabolcs Nagy 51d5f139ca add mlock2 syscall number from linux v4.4
this is mlock with a flags argument, new in linux commit
a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e

as usual microblaze and sh don't have allocated syscall number yet.
2016-01-26 18:30:50 -05:00
Szabolcs Nagy 09001a8f97 add new membarrier, userfaultfd and switch_endian syscalls
new in linux v4.3 added for aarch64, arm, i386, mips, or1k, powerpc,
x32 and x86_64.

membarrier is a system wide memory barrier, moves most of the
synchronization cost to one side, new in kernel commit
5b25b13ab08f616efd566347d809b4ece54570d1

userfaultfd is useful for qemu and is new in kernel commit
8d2afd96c20316d112e04d935d9e09150e988397

switch_endian is powerpc only for switching endianness, new in commit
529d235a0e190ded1d21ccc80a73e625ebcad09b
2016-01-26 18:28:20 -05:00
Szabolcs Nagy 37bfb68f68 add new i386 socket syscall numbers
new in linux v4.3 commit 9dea5dc921b5f4045a18c63eb92e84dc274d17eb
direct calls instead of socketcall allow better seccomp filtering.

musl continues to use socketcalls internally on i386. (older kernels
would need a fallback mechanism if the direct calls were used.)
2016-01-26 18:28:04 -05:00
Szabolcs Nagy a5e133bf29 change the internal socketcall selection logic
only use SYS_socketcall if SYSCALL_USE_SOCKETCALL is defined
internally, otherwise use direct syscalls.

this commit does not change the current behaviour, it is
preparation for adding direct syscall numbers for i386.
2016-01-26 18:27:44 -05:00
Rich Felker e7a1118984 fix arm a_crash for big endian
contrary to commit 89e149d275, big
endian arm does need the instruction bytes in big endian order. rather
than trying to use a special encoding that works as arm or thumb,
simply encode the simplest/canonical undefined instructions dependent
on whether __thumb__ is defined.
2016-01-25 21:59:55 +00:00
Rich Felker 89e149d275 add native a_crash primitive for arm
the .byte directive encodes a guaranteed-undefined instruction, the
same one Linux fills the kuser helper page with when it's disabled.
the udf mnemonic and and .insn directives are not supported by old
binutils versions, and larger-than-byte integer directives would
produce the wrong output on big-endian.
2016-01-25 02:44:56 +00:00
Szabolcs Nagy bc443c3fe3 clean powerpc syscall.h
remove ifdefs for powerpc64.
2016-01-24 19:08:57 -05:00
Szabolcs Nagy f9c3a2e048 add missing powerpc specific PROT_SAO memory protection flag
this flag for strong access ordering was added in linux v2.6.27
commit aba46c5027cb59d98052231b36efcbbde9c77a1d
2016-01-24 19:08:40 -05:00
Szabolcs Nagy 2f6f3dccb4 fix powerpc MCL_* mlockall flags in bits/mman.h
the definitions didn't match the linux uapi headers.
2016-01-24 19:08:19 -05:00
Szabolcs Nagy 2d14fa39b0 fix aarch64 atomics to load/store 32bit only
a_ll/a_sc inline asm used 64bit register operands (%0) instead of 32bit
ones (%w0), this at least broke a_and_64 (which always cleared the top
32bit, leaking memory in malloc).
2016-01-24 19:07:35 -05:00
Rich Felker b17fbd3520 improve aarch64 atomics
aarch64 provides ll/sc variants with acquire/release memory order,
freeing us from the need to have full barriers both before and after
the ll/sc operation. previously they were not used because the a_cas
can fail without performing a_sc, in which case half of the barrier
would be omitted. instead, define a custom version of a_cas for
aarch64 which uses a_barrier explicitly when aborting the cas
operation. aside from cas, other operations built on top of ll/sc are
not affected since they never abort but rather loop until they
succeed.

a split ll/sc version of the pointer-sized a_cas_p is also introduced
using the same technique.

patch by Szabolcs Nagy.
2016-01-23 14:03:40 -05:00
Rich Felker 4de1bc1164 remove sh port's __fpscr_values source file
commit f3ddd17380, the dynamic linker
bootstrap overhaul, silently disabled the definition of __fpscr_values
in this file since libc.so's copy of __fpscr_values now comes from
crt_arch.h, the same place the public definition in the main program's
crt1.o ultimately comes from. remove this file which is no longer in
use.
2016-01-22 03:50:58 +00:00
Rich Felker 007907a93c move sh port's __shcall internal function from arch/sh/src to src tree 2016-01-22 03:50:08 +00:00
Rich Felker 230bfe1a7d move sh __unmapself code from arch/sh/src to main src tree 2016-01-22 03:46:00 +00:00
Rich Felker 66215afc2e move x32 sysinfo impl and syscall fixup code out of arch/x32/src
all such arch-specific translation units are being moved to
appropriate arch dirs under the main src tree.
2016-01-22 03:39:07 +00:00
Rich Felker 513c043694 overhaul powerpc atomics for new atomics framework
previously powerpc had a_cas defined in terms of its native ll/sc
style operations, but all other atomics were defined in terms of
a_cas. instead define a_ll and a_sc so the compiler can generate
optimized versions of all the atomic ops and perform better inlining
of a_cas.

extracting the result of the sc (stwcx.) instruction is rather awkward
because it's natively stored in a condition flag, which is not
representable in inline asm. but even with this limitation the new
code still seems significantly better.
2016-01-22 02:58:32 +00:00
Rich Felker 16b55298dc clean up x86_64 (and x32) atomics for new atomics framework
this commit mostly makes consistent things like spacing, function
ordering in atomic_arch.h, argument names, use of volatile, etc.
a_ctz_l was also removed from x86_64 since atomic.h provides it
automatically using a_ctz_64.
2016-01-22 00:53:09 +00:00
Rich Felker e24984efd5 clean up i386 atomics for new atomics framework
this commit mostly makes consistent things like spacing, function
ordering in atomic_arch.h, argument names, use of volatile, etc. the
fake 64-bit and/or atomics are also removed because the shared
atomic.h does a better job of implementing them; it avoids making two
atomic memory accesses when only one 32-bit half needs to be touched.

no major overhaul is needed or possible because x86 actually has
native versions of all the usual atomic operations, rather than using
ll/sc or needing cas loops.
2016-01-22 00:16:53 +00:00
Rich Felker 369b22f9c4 overhaul mips atomics for new atomics framework 2016-01-22 00:10:40 +00:00
Rich Felker e617b9eea9 move arm-specific translation units out of arch/arm/src, to src/*/arm
this is possible with the new build system that allows src/*/$(ARCH)/*
files which do not shadow a file in the parent directory, and yields a
more logical organization. eventually it will be possible to remove
arch/*/src from the build system.
2016-01-22 00:02:21 +00:00
Rich Felker 397f0a6a7d overhaul arm atomics for new atomics framework
switch to ll/sc model so that new atomic.h can provide optimized
versions of all the atomic primitives without needing an ll/sc loop
written in asm for each one.

all isa levels which use ldrex/strex now use the inline ll/sc model
even if the type of barrier to use is not known until runtime (v6).
the cas model is only used for arm v5 and earlier, and it has been
optimized to make the call via inline asm with custom constraints
rather than as a C function call.
2016-01-21 23:30:30 +00:00
Rich Felker aa0db4b5d0 overhaul aarch64 atomics for new atomics framework 2016-01-21 19:50:55 +00:00
Rich Felker 61b1e75f7d overhaul sh atomics for new atomics framework, add j-core cas.l backend
sh needs runtime-selected atomic backends since there are a number of
supported models that use non-forwards-compatible (non-smp-compatible)
atomic mechanisms. previously, the code paths for this were highly
inefficient since they involved C function calls with multiple
branches in the callee and heavy spills in the caller. the new code
performs calls the runtime-selected asm fragment from inline asm with
extremely minimal clobbers, rather than using a function call.

for the sh4a case where the atomic mechanism is known and there is no
forward-compatibility issue, the movli.l and movco.l instructions are
provided as a_ll and a_sc, allowing the new shared atomic.h to
generate efficient inline versions of all the basic atomic operations
without needing a cas loop.
2016-01-21 19:43:04 +00:00
Rich Felker 1315596b51 refactor internal atomic.h
rather than having each arch provide its own atomic.h, there is a new
shared atomic.h in src/internal which pulls arch-specific definitions
from arc/$(ARCH)/atomic_arch.h. the latter can be extremely minimal,
defining only a_cas or new ll/sc type primitives which the shared
atomic.h will use to construct everything else.

this commit avoids making heavy changes to the individual archs'
atomic implementations. definitions which are identical or
near-identical to what the new shared atomic.h would produce have been
removed, but otherwise the changes made are just hooking up the
arch-specific files to the new infrastructure. major changes to take
advantage of the new system will come in subsequent commits.
2016-01-21 19:08:54 +00:00
Rich Felker b6363bb70a fix build regression for arm pre-v7 from out-of-tree build patch
commit 2f853dd6b9 failed to replicate
the old makefile logic that caused arch/arm/src/arm/atomics.s to be
built. since this was the only .s file under arch/*/src, rather than
trying to reproduce the old logic, I'm just moving it up a level and
adjusting the glob pattern in the makefile to catch it. eventually
arch/*/src will probably be removed in favor of moving all these files
to appropriate src/*/$(ARCH) locations.
2016-01-20 02:31:06 +00:00
Rich Felker 56764601af fix dynamic linker path file selection for arm vs armhf
the __SOFTFP__ macro which was wrongly being used does not reflect the
ABI (arm vs armhf) but just the availability of floating point
instructions/registers, so -mfloat-abi=softfp was wrongly being
treated as armhf. __ARM_PCS_VFP is the correct predefined macro to
check for the armhf EABI variant. this macro usage was corrected for
the build process in commit 4918c2bb20
but reloc.h was apparently overlooked at the time.
2016-01-20 01:16:09 +00:00
Rich Felker 5e396fb996 adjust mips crt_arch entry point asm to avoid assembler bugs
apparently the .gpword directive does not work reliably with local
text labels; values produced were offset by 64k from the correct
value, resulting in incorrect computation of the got pointer at
runtime. instead, use an external label so that the assembler does not
munge the relocation; the linker will then get it right.

commit 6fef8cafbd exposed this issue by
removing the old, non-PIE-compatible handwritten crt1.s, which was not
affected. presumably mips PIE executables (using Scrt1.o produced from
crt_arch.h) were already affected at the time.
2015-12-29 13:01:29 -05:00
Rich Felker 71991a803c adjust i386 max_align_t definition to work around some broken compilers
at least gcc 4.7 claims c++11 support but does not accept the alignas
keyword, causing breakage when stddef.h is included in c++11 mode.
instead, prefer using __attribute__((__aligned__)) on any compiler
with GNU extensions, and only use the alignas keyword as a fallback
for other C++ compilers.

C code should not be affected by this patch.
2015-12-29 12:46:15 -05:00
Rich Felker 0d58bf2d60 remove visibility suppression by SHARED macro in mips and x32 arch files
commit 8a8fdf6398 was intended to remove
all such usage, but these arch-specific files were overlooked, leading
to inconsistent declarations and definitions.
2015-12-15 23:18:38 -05:00
Rich Felker 9439ebd766 fix dynamic loader library mapping for nommu systems
on linux/nommu, non-writable private mappings of files may actually
use memory shared with other processes or the fs cache. the old nommu
loader code (used when mmap with MAP_FIXED fails) simply wrote over
top of the original file mapping, possibly clobbering this shared
memory. no such breakage was observed in practice, but it should have
been possible.

the new code starts by mapping anonymous writable memory on archs that
might support nommu, then maps load segments over top of it, falling
back to read if MAP_FIXED fails. we use an anonymous map rather than a
writable file map to avoid reading more data from disk than needed.
since pages cannot be loaded lazily on fault, in case of large
data/bss, mapping the full file may read a lot of data that will
subsequently be thrown away when processing additional LOAD segments.
as a result, we cannot skip the first LOAD segment when operating in
this mode.

these changes affect only non-FDPIC nommu support.
2015-11-11 17:40:27 -05:00
Rich Felker 4e73d12117 explicitly assemble all arm asm sources as UAL
these files are all accepted as legacy arm syntax when producing arm
code, but legacy syntax cannot be used for producing thumb2 with
access to the full ISA. even after switching to UAL, some asm source
files contain instructions which are not valid in thumb mode, so these
will need to be addressed separately.
2015-11-10 00:01:55 -05:00
Rich Felker 9f290a49bf remove non-working pre-armv4t support from arm asm
the idea of the three-instruction sequence being removed was to be
able to return to thumb code when used on armv4t+ from a thumb caller,
but also to be able to run on armv4 without the bx instruction
available (in which case the low bit of lr would always be 0).
however, without compiler support for generating such a sequence from
C code, which does not exist and which there is unlikely to be
interest in implementing, there is little point in having it in the
asm, and it would likely be easier to add pre-armv4t support via
enhanced linker handling of R_ARM_V4BX than at the compiler level.

removing this code simplifies adding support for building libc in
thumb2-only form (for cortex-m).
2015-11-09 22:36:38 -05:00
Rich Felker 4fcb48275a generalize sh entry point asm not to assume call dests fit in 12 bits
this assumption is borderline-unsafe to begin with, and fails badly
with -ffunction-sections since the linker can move the callee
arbitrarily far away when it lies in a different section.
2015-11-02 18:11:36 -05:00
Rich Felker cb1bf2f321 properly access mcontext_t program counter in cancellation handler
using the actual mcontext_t definition rather than an overlaid pointer
array both improves correctness/readability and eliminates some ugly
hacks for archs with 64-bit registers bit 32-bit program counter.

also fix UB due to comparison of pointers not in a common array
object.
2015-11-02 12:41:49 -05:00
Rich Felker 92637bb0d8 prevent reordering of or1k and powerpc thread pointer loads
other archs use asm for the thread pointer load, so making that asm
volatile is sufficient to inform the compiler that it has a "side
effect" (crashing or giving the wrong result if the thread pointer was
not yet initialized) that prevents reordering. however, powerpc and
or1k have dedicated general purpose registers for the thread pointer
and did not need to use any asm to access it; instead, "local register
variables with a specified register" were used. however, there is no
specification for ordering constraints on this type of usage, and
presumably use of the thread pointer could be reordered across its
initialization.

to impose an ordering, I have added empty volatile asm blocks that
produce the "local register variable with a specified register" as
an output constraint.
2015-10-15 12:08:51 -04:00
Rich Felker 74483c5955 mark arm thread-pointer-loading inline asm as volatile
this builds on commits a603a75a72 and
0ba35d69c0 to ensure that a compiler
cannot conclude that it's valid to reorder the asm to a point before
the thread pointer is set up, or to treat the inline function as if it
were declared with attribute((const)).

other archs already use volatile asm for thread pointer loading.
2015-10-15 12:04:48 -04:00
Rich Felker 11da520c7a add comment documenting hard-coded opcode for reading mips thread pointer 2015-10-15 00:55:41 -04:00
Rich Felker 0ba35d69c0 remove attribute((const)) from arm __pthread_self inline function
commit a603a75a72 did this for the
public pthread_self function but not the internal inline one.
2015-10-15 00:20:50 -04:00
Rich Felker b61df2294f fix signal return for sh/fdpic
the restorer function pointer provided in the kernel sigaction
structure is interpreted by the kernel as a raw code address, not a
function descriptor.

this commit moves the declarations of the __restore and __restore_rt
symbols to ksigaction.h so that arch versions of the file can override
them, and introduces a version for sh which declares them as objects
rather than functions.

an alternate solution would have been defining SA_RESTORER to 0 so
that the functions are not used, but this both requires executable
stack (since the sh kernel does not have a vdso page with permanent
restorer functions) and crashes on qemu user-level emulation.
2015-09-23 18:33:49 +00:00
Rich Felker e9e770dfd6 have sh/fdpic entry point set fdpic personality if needed
the entry point code supports being loaded by a loader which is not
fdpic-aware (in practice, either kernel with mmu or qemu without fdpic
support). this mostly just works, but signal handling will wrongly use
a function descriptor address as a code address if the personality is
not adjusted to fdpic.

ideally this code could be placed with sigaction so that it's not
needed except if/when a signal handler is installed. however,
personality is incorrectly maintained per-thread by the kernel, rather
than per-process, so it's necessary to correct the personality before
any threads are started. also, in order to skip the personality
syscall when an fdpic-aware loader is used, we need to be able to
detect how the program was loaded, and this information is only
readily available at the entry point.
2015-09-22 20:51:59 +00:00
Rich Felker eaf7ab6e24 add real fdpic loading of shared libraries
previously, the normal ELF library loading code was used even for
fdpic, so only the kernel-loaded dynamic linker and main app could
benefit from separate placement of segments and shared text.
2015-09-22 19:12:48 +00:00
Rich Felker 7f9086df95 size-optimize sh/fdpic dynamic entry point
the __fdpic_fixup code is not needed for ET_DYN executables, which
instead use reloctions, so we can omit it from the dynamic linker and
static-pie entry point and save some code size.
2015-09-22 04:14:07 +00:00
Rich Felker cab2b1f9d7 work around breakage in sh/fdpic __unmapself function
the C implementation of __unmapself used for potentially-nommu sh
assumed CRTJMP takes a function descriptor rather than a code address;
however, the actual dynamic linker needs a code address, and so commit
7a9669e977 changed the definition of the
macro in reloc.h. this commit puts the old macro back in a place where
it only affects __unmapself.

this is an ugly workaround and should be cleaned up at some point, but
at least it's well isolated.
2015-09-22 04:10:42 +00:00
Rich Felker 7a9669e977 add general fdpic support in dynamic linker and arch support for sh
at this point not all functionality is complete. the dynamic linker
itself, and main app if it is also loaded by the kernel, take
advantage of fdpic and do not need constant displacement between
segments, but additional libraries loaded by the dynamic linker follow
normal ELF semantics for mapping still. this fully works, but does not
admit shared text on nommu.

in terms of actual functional correctness, dlsym's results are
presently incorrect for function symbols, RTLD_NEXT fails to identify
the caller correctly, and dladdr fails almost entirely.

with the dynamic linker entry point working, support for static pie is
automatically included, but linking the main application as ET_DYN
(pie) probably does not make sense for fdpic anyway. ET_EXEC is
equally relocatable but more efficient at representing relocations.
2015-09-22 03:54:42 +00:00
Rich Felker 12b0b7d8ea new dlstart stage-2 chaining for x86_64 and x32 2015-09-17 07:28:44 +00:00
Rich Felker c16182680c new dlstart stage-2 chaining for powerpc 2015-09-17 07:20:58 +00:00
Rich Felker 4761e63bc4 new dlstart stage-2 chaining for or1k 2015-09-17 07:20:51 +00:00
Rich Felker cd7159e7be new dlstart stage-2 chaining for mips 2015-09-17 07:20:43 +00:00
Rich Felker 57e2dce7e4 new dlstart stage-2 chaining for microblaze 2015-09-17 07:20:36 +00:00
Rich Felker 2907afb8db introduce new symbol-lookup-free rcrt1/dlstart stage chaining
previously, the call into stage 2 was made by looking up the symbol
name "__dls2" (which was chosen short to be easy to look up) from the
dynamic symbol table. this was no problem for the dynamic linker,
since it always exports all its symbols. in the case of the static pie
entry point, however, the dynamic symbol table does not contain the
necessary symbol unless -rdynamic/-E was used when linking. this
linking requirement is a major obstacle both to practical use of
static-pie as a nommu binary format (since it greatly enlarges the
file) and to upstream toolchain support for static-pie (adding -E to
default linking specs is not reasonable).

this patch replaces the runtime symbolic lookup with a link-time
lookup via an inline asm fragment, which reloc.h is responsible for
providing. in this initial commit, the asm is provided only for i386,
and the old lookup code is left in place as a fallback for archs that
have not yet transitioned.

modifying crt_arch.h to pass the stage-2 function pointer as an
argument was considered as an alternative, but such an approach would
not be compatible with fdpic, where it's impossible to compute
function pointers without already having performed relocations. it was
also deemed desirable to keep crt_arch.h as simple/minimal as
possible.

in principle, archs with pc-relative or got-relative addressing of
static variables could instead load the stage-2 function pointer from
a static volatile object. that does not work for fdpic, and is not
safe against reordering on mips-like archs that use got slots even for
static functions, but it's a valid on i386 and many others, and could
provide a reasonable default implementation in the future.
2015-09-17 06:30:55 +00:00
Felix Janda 64b6684ddd reindent powerpc's bits/termios.h to be consistent with other archs 2015-09-15 14:30:08 -04:00
Felix Janda b291e7ca9b fix namespace violations in aarch64/bits/termios.h
in analogy with commit a627eb3586
2015-09-15 14:28:07 -04:00
Rich Felker d4c82d05b8 add sh fdpic subarch variants
with this commit it should be possible to produce a working
static-linked fdpic libc and application binaries for sh.

the changes in reloc.h are largely unused at this point since dynamic
linking is not supported, but the CRTJMP macro is used one place
outside of dynamic linking, in __unmapself.
2015-09-12 03:23:49 +00:00
Rich Felker 4ccc1a01e0 add fdpic version of entry point code for sh
this version of the entry point is only suitable for static linking in
ET_EXEC form. neither dynamic linking nor pie is supported yet. at
some point in the future the fdpic and non-fdpic versions of this code
may be unified but for now it's easiest to work with them separately.
2015-09-12 03:18:08 +00:00
Rich Felker 234c58467c make sh clone asm fdpic-compatible
clone calls back to a function pointer provided by the caller, which
will actually be a pointer to a function descriptor on fdpic. the
obvious solution is to have a separate version of clone for fdpic, but
I have taken a simpler approach to go around the problem. instead of
calling the pointed-to function from asm, a direct call is made to an
internal C function which then calls the pointed-to function. this
lets the C compiler generate the appropriate calling convention for an
indirect call with no need for ABI-specific assembly.
2015-09-12 02:55:28 +00:00
Rich Felker 878887c50c fix missing earlyclobber flag in i386 a_ctz_64 asm
this error was only found by reading the code, but it seems to have
been causing gcc to produce wrong code in malloc: the same register
was used for the output and the high word of the input. in principle
this could have caused an infinite loop searching for an available
bin, but in practice most x86 models seem to implement the "undefined"
result of the bsf instruction as "unchanged".
2015-09-09 07:18:28 +00:00
Timo Teräs d8be1bc019 implement arm eabi mem* functions
these functions are part of the ARM EABI, meaning compilers may
generate references to them. known versions of gcc do not use them,
but llvm does. they are not provided by libgcc, and the de facto
standard seems to be that libc provides them.
2015-08-31 06:35:01 +00:00
Rich Felker 5a9c8c05a5 mitigate performance regression in libc-internal locks on x86_64
commit 3c43c0761e fixed missing
synchronization in the atomic store operation for i386 and x86_64, but
opted to use mfence for the barrier on x86_64 where it's always
available. however, in practice mfence is significantly slower than
the barrier approach used on i386 (a nop-like lock orl operation).
this commit changes x86_64 (and x32) to use the faster barrier.
2015-08-16 18:15:18 +00:00
Szabolcs Nagy e5b086e1d5 aarch64: fix 64-bit syscall argument passing
On 32bit systems long long arguments are passed in a special way
to some syscalls; this accidentally got copied to the AArch64 port.

The following interfaces were broken: fallocate, fanotify, ftruncate,
posix_fadvise, posix_fallocate, pread, pwrite, readahead,
sync_file_range, truncate.
2015-08-11 23:11:57 +00:00
Rich Felker 3c43c0761e fix missing synchronization in atomic store on i386 and x86_64
despite being strongly ordered, the x86 memory model does not preclude
reordering of loads across earlier stores. while a plain store
suffices as a release barrier, we actually need a full barrier, since
users of a_store subsequently load a waiter count to determine whether
to issue a futex wait, and using a stale count will result in soft
(fail-to-wake) deadlocks. these deadlocks were observed in malloc and
possible with stdio locks and other libc-internal locking.

on i386, an atomic operation on the caller's stack is used as the
barrier rather than performing the store itself using xchg; this
avoids the need to read the cache line on which the store is being
performed. mfence is used on x86_64 where it's always available, and
could be used on i386 with the appropriate cpu model checks if it's
shown to perform better.
2015-07-28 18:40:18 +00:00
Roman Yeryomin 3975577922 socket.h: cleanup/reorder mips and powerpc bits/socket.h
....to be somewhat consistent and easily comparable with asm/socket.h

Signed-off-by: Roman Yeryomin <roman@ubnt.com>
2015-07-21 19:14:58 -04:00
Roman Yeryomin 29ec7677a7 socket.h: fix SO_* for mips
Signed-off-by: Roman Yeryomin <roman@ubnt.com>
2015-07-21 19:14:26 -04:00
Felix Fietkau 3fffa7a658 mips: fix mcontext_t register array field name
glibc and uclibc use gregs instead of regs

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
2015-07-21 19:02:31 -04:00
Rich Felker 6ba5517a46 fix local-dynamic model TLS on mips and powerpc
the TLS ABI spec for mips, powerpc, and some other (presently
unsupported) RISC archs has the return value of __tls_get_addr offset
by +0x8000 and the result of DTPOFF relocations offset by -0x8000. I
had previously assumed this part of the ABI was actually just an
implementation detail, since the adjustments cancel out. however, when
the local dynamic model is used for accessing TLS that's known to be
in the same DSO, either of the following may happen:

1. the -0x8000 offset may already be applied to the argument structure
passed to __tls_get_addr at ld time, without any opportunity for
runtime relocations.

2. __tls_get_addr may be used with a zero offset argument to obtain a
base address for the module's TLS, to which the caller then applies
immediate offsets for individual objects accessed using the local
dynamic model. since the immediate offsets have the -0x8000 adjustment
applied to them, the base address they use needs to include the
+0x8000 offset.

it would be possible, but more complex, to store the pointers in the
dtv[] array with the +0x8000 offset pre-applied, to avoid the runtime
cost of adding 0x8000 on each call to __tls_get_addr. this change
could be made later if measurements show that it would help.
2015-06-25 22:22:00 +00:00
Rich Felker 10d0268ccf switch to using trap number 31 for syscalls on sh
nominally the low bits of the trap number on sh are the number of
syscall arguments, but they have never been used by the kernel, and
some code making syscalls does not even know the number of arguments
and needs to pass an arbitrary high number anyway.

sh3/sh4 traditionally used the trap range 16-31 for syscalls, but part
of this range overlapped with hardware exceptions/interrupts on sh2
hardware, so an incompatible range 32-47 was chosen for sh2.

using trap number 31 everywhere, since it's in the existing sh3/sh4
range and does not conflict with sh2 hardware, is a proposed
unification of the kernel syscall convention that will allow binaries
to be shared between sh2 and sh3/sh4. if this is not accepted into the
kernel, we can refit the sh2 target with runtime selection mechanisms
for the trap number, but doing so would be invasive and would entail
non-trivial overhead.
2015-06-16 15:25:02 +00:00
Rich Felker 3366a99b17 switch sh port's __unmapself to generic version when running on sh2/nommu
due to the way the interrupt and syscall trap mechanism works,
userspace on sh2 must never set the stack pointer to an invalid value.
thus, the approach used on most archs, where __unmapself executes with
no stack for the interval between SYS_munmap and SYS_exit, is not
viable on sh2.

in order not to pessimize sh3/sh4, the sh asm version of __unmapself
is not removed. instead it's renamed and redirected through code that
calls either the generic (safe) __unmapself or the sh3/sh4 asm,
depending on compile-time and run-time conditions.
2015-06-16 14:55:06 +00:00
Rich Felker f9d84554ba add support for sh2 interrupt-masking-based atomics to sh port
the sh2 target is being considered an ISA subset of sh3/sh4, in the
sense that binaries built for sh2 are intended to be usable on later
cpu models/kernels with mmu support. so rather than hard-coding
sh2-specific atomics, the runtime atomic selection mechanisms that was
already in place has been extended to add sh2 atomics.

at this time, the sh2 atomics are not SMP-compatible; since the ISA
lacks actual atomic operations, the new code instead masks interrupts
for the duration of the atomic operation, producing an atomic result
on single-core. this is only possible because the kernel/hardware does
not impose protections against userspace doing so. additional changes
will be needed to support future SMP systems.

care has been taken to avoid producing significant additional code
size in the case where it's known at compile-time that the target is
not sh2 and does not need sh2-specific code.
2015-06-16 14:38:41 +00:00
Szabolcs Nagy ee59c296d5 arm: add vdso support
vdso will be available on arm in linux v4.2, the user-space code
for it is in kernel commit 8512287a8165592466cb9cb347ba94892e9c56a5
2015-06-14 04:23:20 +00:00
Rich Felker 9f26ebded1 fix stack alignment code in mips crt_arch.h
the instruction used to align the stack, "and $sp, $sp, -8", does not
actually exist; it's expanded to 2 instructions using the 'at'
(assembler temporary) register, and thus cannot be used in a branch
delay slot. since alignment mod 16 commutes with subtracting 8, simply
swapping these two operations fixes the problem.

crt1.o was not affected because it's still being generated from a
dedicated asm source file. dlstart.lo was not affected because the
stack pointer it receives is already aligned by the kernel. but
Scrt1.o was affected in cases where the dynamic linker gave it a
misaligned stack pointer.
2015-05-24 23:03:47 -04:00
Rich Felker 63caf1d207 add .text section directive to all crt_arch.h files missing it
i386 and x86_64 versions already had the .text directive; other archs
did not. normally, top-level (file scope) __asm__ starts in the .text
section anyway, but problems were reported with some versions of
clang, and it seems preferable to set it explicitly anyway, at least
for the sake of consistency between archs.
2015-05-22 01:50:05 -04:00
Rich Felker c648cefb27 fix inconsistency in a_and and a_or argument types on x86[_64]
conceptually, and on other archs, these functions take a pointer to
int, but in the i386, x86_64, and x32 versions of atomic.h, they took
a pointer to void instead.
2015-05-20 00:17:35 -04:00
Bobby Bingham 390f93ef69 inline llsc atomics when building for sh4a
If we're building for sh4a, the compiler is already free to use
instructions only available on sh4a, so we can do the same and inline the
llsc atomics. If we're building for an older processor, we still do the
same runtime atomics selection as before.
2015-05-19 00:42:07 -04:00
Rich Felker c0f10cf067 make arm reloc.h CRTJMP macro compatible with thumb
compilers targeting armv7 may be configured to produce thumb2 code
instead of arm code by default, and in the future we may wish to
support targets where only the thumb instruction set is available.

the instructions this patch omits in thumb mode are needed only for
non-thumb versions of armv4 or earlier, which are not supported by any
current compilers/toolchains and thus rather pointless to have. at
some point these compatibility return sequences may be removed from
all asm source files, and in that case it would make sense to remove
them here too and remove the ifdef.
2015-05-14 18:51:27 -04:00
Rich Felker 83340c7a58 make arm crt_arch.h compatible with thumb code generation
compilers targeting armv7 may be configured to produce thumb2 code
instead of arm code by default, and in the future we may wish to
support targets where only the thumb instruction set is available.

the changes made here avoid operating directly on the sp register,
which is not possible in thumb code, and address an issue with the way
the address of _DYNAMIC is computed.

previously, the relative address of _DYNAMIC was stored with an
additional offset of -8 versus the pc-relative add instruction, since
on arm the pc register evaluates to ".+8". in thumb code, it instead
evaluates to ".+4". both are two (normal-size) instructions beyond "."
in the current execution mode, so the numbered label 2 used in the
relative address expression is simply moved two instructions ahead to
be compatible with both instruction sets.
2015-05-14 18:26:16 -04:00
Rich Felker 484194dbf4 fix stack protector crashes on x32 & powerpc due to misplaced TLS canary
i386, x86_64, x32, and powerpc all use TLS for stack protector canary
values in the default stack protector ABI, but the location only
matched the ABI on i386 and x86_64. on x32, the expected location for
the canary contained the tid, thus producing spurious mismatches
(resulting in process termination) upon fork. on powerpc, the expected
location contained the stdio_locks list head, so returning from a
function after calling flockfile produced spurious mismatches. in both
cases, the random canary was not present, and a predictable value was
used instead, making the stack protector hardening much less effective
than it should be.

in the current fix, the thread structure has been expanded to have
canary fields at all three possible locations, and archs that use a
non-default location must define a macro in pthread_arch.h to choose
which location is used. for most archs (which lack TLS canary ABI) the
choice does not matter.
2015-05-06 18:37:19 -04:00
Rich Felker 7fe273b2c1 fix broken cancellation on x32 due to incorrect saved-PC offset 2015-05-02 12:16:57 -04:00
Rich Felker 4f69594689 fix dangling pointers in x32 syscall timespec fixup code
the lifetime of compound literals is the block in which they appear.
the temporary struct __timespec_kernel objects created as compound
literals no longer existed at the time their addresses were passed to
the kernel.
2015-05-01 21:22:27 -04:00
Szabolcs Nagy 18f75b80fd fix __syscall declaration with wrong visibility in syscall_arch.h
remove __syscall declaration where it is not needed (aarch64, arm,
microblaze, or1k) and add the hidden attribute where it is (mips).
2015-04-30 16:22:57 -04:00
Szabolcs Nagy 4e50b2e4b5 aarch64: fix CRTJMP in reloc.h
commit f3ddd17380 broke the build by
using "bx" instead of "br".
2015-04-30 16:21:51 -04:00
Rich Felker 85d12e0285 fix sh jmp_buf size to match ABI
while the sh port is still experimental and subject to ABI
instability, this is not actually an application/libc boundary ABI
change. it only affects third-party APIs where jmp_buf is used in a
shared structure at the ABI boundary, because nothing anywhere near
the end of the jmp_buf object (which includes the oversized sigset_t)
is accessed by libc.

both glibc and uclibc have 15-slot jmp_buf for sh. presumably the
smaller version was used in musl because the slots for fpu status
register and thread pointer register (gbr) were incorrect and must not
be restored by longjmp, but the size should have been preserved, as
it's generally treated as a libc-agnostic ABI property for the arch,
and having extra slots free in case we ever need them for something is
useful anyway.
2015-04-27 20:03:28 -04:00
Rich Felker 1fb0878ebc fix ldso name for sh-nofpu subarch
previously it was using the same name as the default ABI with hard
float (floating point args and return value in registers).

the test __SH_FPU_ANY__ || __SH4__ matches what's used in the
configure script already, and seems correct under casual review
against gcc's config/sh.h, but may need tweaks. the logic for
predefined macros for sh, and what they all mean, is very complex.
eventually this should be documented in comments here.

configure already rejects "half-hard" configurations on sh where
double=float since these do not conform to Annex F and are not
suitable for musl, so these do not need to be considered here.
2015-04-24 13:05:21 -04:00
Rich Felker 7faee5fa0d fix failure of sh reloc.h to properly detect endianness for ldso name
versions of reloc.h that rely on endian macros much include endian.h
to ensure they are available.
2015-04-24 11:06:11 -04:00
Rich Felker 4bf10ebf66 fix breakage in x32 dynamic linker due to mismatching register size
the jmp instruction requires a 64-bit register, so cast the desired PC
address up to uint64_t, going through uintptr_t to ensure that it's
zero-extended rather than possibly sign-extended.
2015-04-20 18:17:48 -04:00
Szabolcs Nagy 87c62d06e4 add execveat syscall number to microblaze
syscall number was reserved in linux v4.0, kernel commit
add4b1b02da7e7ec35c34dd04d351ac53f3f0dd8
2015-04-17 22:31:20 -04:00
Rich Felker 19bcdeeb1e fix missing quotation mark in mips crt_arch.h that broke build 2015-04-17 22:21:15 -04:00
Rich Felker cbc02ba23c consistently use hidden visibility for cancellable syscall internals
in a few places, non-hidden symbols were referenced from asm in ways
that assumed ld-time binding. while these is no semantic reason these
symbols need to be hidden, fixing the references without making them
hidden was going to be ugly, and hidden reduces some bloat anyway.

in the asm files, .global/.hidden directives have been moved to the
top to unclutter the actual code.
2015-04-14 11:18:59 -04:00
Rich Felker da7ccf822c use hidden visibility for i386 asm-internal __vsyscall symbol
otherwise the call instruction in the inline syscall asm results in
textrels without ld-time binding.
2015-04-14 10:22:12 -04:00
Rich Felker f3ddd17380 dynamic linker bootstrap overhaul
this overhaul further reduces the amount of arch-specific code needed
by the dynamic linker and removes a number of assumptions, including:

- that symbolic function references inside libc are bound at link time
  via the linker option -Bsymbolic-functions.

- that libc functions used by the dynamic linker do not require
  access to data symbols.

- that static/internal function calls and data accesses can be made
  without performing any relocations, or that arch-specific startup
  code handled any such relocations needed.

removing these assumptions paves the way for allowing libc.so itself
to be built with stack protector (among other things), and is achieved
by a three-stage bootstrap process:

1. relative relocations are processed with a flat function.
2. symbolic relocations are processed with no external calls/data.
3. main program and dependency libs are processed with a
   fully-functional libc/ldso.

reduction in arch-specific code is achived through the following:

- crt_arch.h, used for generating crt1.o, now provides the entry point
  for the dynamic linker too.

- asm is no longer responsible for skipping the beginning of argv[]
  when ldso is invoked as a command.

- the functionality previously provided by __reloc_self for heavily
  GOT-dependent RISC archs is now the arch-agnostic stage-1.

- arch-specific relocation type codes are mapped directly as macros
  rather than via an inline translation function/switch statement.
2015-04-13 03:04:42 -04:00
Rich Felker 25748db301 fix possible clobbering of syscall return values on mips
depending on the compiler's interpretation of __asm__ register names
for register class objects, it may be possible for the return value in
r2 to be clobbered by the function call to __stat_fix. I have not
observed any such breakage in normal builds and suspect it only
happens with -O0 or other unusual build options, but since there's an
ambiguity as to the semantics of this feature, it's best to use an
explicit temporary to avoid the issue.

based on reporting and patch by Eugene.
2015-04-07 12:47:19 -04:00
Rich Felker fd427c4eae move O_PATH definition back to arch bits
while it's the same for all presently supported archs, it differs at
least on sparc, and conceptually it's no less arch-specific than the
other O_* macros. O_SEARCH and O_EXEC are still defined in terms of
O_PATH in the main fcntl.h.
2015-04-01 19:31:06 -04:00
Rich Felker abfe1f6541 aarch64: remove duplicate macro definitions in bits/fcntl.h 2015-04-01 19:25:32 -04:00
Rich Felker dfc1a37c44 aarch64: fix definition of sem_nsems in semid_ds structure
POSIX requires the sem_nsems member to have type unsigned short. we
have to work around the incorrect kernel type using matching
endian-specific padding.
2015-04-01 19:12:18 -04:00
Szabolcs Nagy b24d813d24 aarch64: fix namespace pollution in bits/shm.h
The shm_info struct is a gnu extension and some of its members do
not have shm* prefix. This is worked around in sys/shm.h by macros,
but aarch64 didn't use those.
2015-04-01 19:05:12 -04:00
Rich Felker e626deeec8 fix missing max_align_t definition on aarch64 2015-03-20 01:21:37 -04:00
Rich Felker d5a5045382 fix MINSIGSTKSZ values for archs with large signal contexts
the previous values (2k min and 8k default) were too small for some
archs. aarch64 reserves 4k in the signal context for future extensions
and requires about 4.5k total, and powerpc reportedly uses over 2k.
the new minimums are chosen to fit the saved context and also allow a
minimal signal handler to run.

since the default (SIGSTKSZ) has always been 6k larger than the
minimum, it is also increased to maintain the 6k usable by the signal
handler. this happens to be able to store one pathname buffer and
should be sufficient for calling any function in libc that doesn't
involve conversion between floating point and decimal representations.

x86 (both 32-bit and 64-bit variants) may also need a larger minimum
(around 2.5k) in the future to support avx-512, but the values on
these archs are left alone for now pending further analysis.

the value for PTHREAD_STACK_MIN is not increased to match MINSIGSTKSZ
at this time. this is so as not to preclude applications from using
extremely small thread stacks when they know they will not be handling
signals. unfortunately cancellation and multi-threaded set*id() use
signals as an implementation detail and therefore require a stack
large enough for a signal context, so applications which use extremely
small thread stacks may still need to avoid using these features.
2015-03-18 00:31:37 -04:00
Szabolcs Nagy 962cbfbf86 aarch64: fix typo in bits/ioctl.h 2015-03-14 15:49:08 -04:00
Szabolcs Nagy 38bf2d7cc3 aarch64: add struct _aarch64_ctx to signal.h
The unwind code in libgcc uses this type for unwinding across signal
handlers. On aarch64 the kernel may place a sequence of structs on the
signal stack on top of the ucontext to provide additional information.
The unwinder only needs the header, but added all the types the kernel
currently defines for this mechanism because they are part of the uapi.
2015-03-14 13:55:24 -04:00
Rich Felker 673cab5c56 align x32 pthread type sizes to be common with 32-bit archs
previously, commit e7b9887e8b aligned
the sizes with the glibc ABI. subsequent discussion during the merge
of the aarch64 port reached a conclusion that we should reject larger
arch-specific sizes, which have significant cost and no benefit, and
stick with the existing common 32-bit sizes for all 32-bit/ILP32 archs
and the x86_64 sizes for 64-bit archs.

one peculiarity of this change is that x32 pthread_attr_t is now
larger in musl than in the glibc x32 ABI, making it unsafe to call
pthread_attr_init from x32 code that was compiled against glibc. with
all the ABI issues of x32, it's not clear that ABI compatibility will
ever work, but if it's needed, pthread_attr_init and related functions
could be modified not to write to the last slot of the object.

this is not a regression versus previous releases, since on previous
releases the x32 pthread type sizes were all severely oversized
already (due to incorrectly using the x86_64 LP64 definitions).
moreover, x32 is still considered experimental and not ABI-stable.
2015-03-12 14:43:36 -04:00
Szabolcs Nagy 01ef3dd9c5 add aarch64 port
This adds complete aarch64 target support including bigendian subarch.

Some of the long double math functions are known to be broken otherwise
interfaces should be fully functional, but at this point consider this
port experimental.

Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
2015-03-11 20:12:35 -04:00