Commit Graph

13 Commits

Author SHA1 Message Date
Rich Felker 4a241f14a6 overhaul ARM atomics/tls for performance and compatibility
previously, builds for pre-armv6 targets hard-coded use of the "kuser
helper" system for atomics and thread-pointer access, resulting in
binaries that fail to run (crash) on systems where this functionality
has been disabled (as a security/hardening measure) in the kernel.
additionally, builds for armv6 hard-coded an outdated/deprecated
memory barrier instruction which may require emulation (extremely
slow) on future models.

this overhaul replaces the behavior for all pre-armv7 builds (both of
the above cases) to perform runtime detection of the appropriate
mechanisms for barrier, atomic compare-and-swap, and thread pointer
access. detection is based on information provided by the kernel in
auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture
version encoded in AT_PLATFORM. direct use of the instructions is
preferred when possible, since probing for the existence of the kuser
helper page would be difficult and would incur runtime cost.

for builds targeting armv7 or later, the runtime detection code is not
compiled at all, and much more efficient versions of the non-cas
atomic operations are provided by using ldrex/strex directly rather
than wrapping cas.
2014-11-19 01:02:01 -05:00
Rich Felker 867b1822f3 add explicit barrier operation to internal atomic.h API 2014-10-10 18:17:09 -04:00
Rich Felker 8b3d7d0d35 fix build error on arm due to new a_spin code
this was broken by commit ea818ea834.
2014-08-25 16:37:13 -04:00
Rich Felker ea818ea834 add working a_spin() atomic for non-x86 targets
conceptually, a_spin needs to be at least a compiler barrier, so the
compiler will not optimize out loops (and the load on each iteration)
while spinning. it should also be a memory barrier, or the spinning
thread might keep spinning without noticing stores from other threads,
thus delaying for longer than it should.

ideally, an optimal a_spin implementation that avoids unnecessary
cache/memory contention should be chosen for each arch, but for now,
the easiest thing is to perform a useless a_cas on the calling
thread's stack.
2014-08-25 15:43:40 -04:00
Rich Felker 90e51e45f5 clean up unused and inconsistent atomics in arch dirs
the a_cas_l, a_swap_l, a_swap_p, and a_store_l operations were
probably used a long time ago when only i386 and x86_64 were
supported. as other archs were added, support for them was
inconsistent, and they are obviously not in use at present. having
them around potentially confuses readers working on new ports, and the
type-punning hacks and inconsistent use of types in their definitions
is not a style I wish to perpetuate in the source tree, so removing
them seems appropriate.
2014-07-27 21:50:24 -04:00
Rich Felker e783efa6ef fix arm thread-pointer/atomic asm when compiling to thumb code
armv7/thumb2 provides a way to do atomics in thumb mode, but for armv6
we need a call to arm mode.

this commit is based on a patch by Stephen Thomas which fixed the
armv7 cases but not the armv6 ones.

all of this should be revisited if/when runtime selection of thread
pointer access and atomics are added.
2014-04-30 15:32:11 -04:00
Rich Felker 3933fdd500 use dmb barrier instruction for atomics on arm v7
aside from potentially offering better performance, this change is
needed since the old coprocessor-based approach to barriers is
deprecated in arm v7, and some compilers/assemblers issue errors when
using the deprecated instruction for v7 targets.
2014-04-14 23:41:49 -04:00
Rich Felker efe07b0f89 fix arm atomic asm register constraint
the "m" constraint could give a memory reference with an offset that's
not compatible with ldrex/strex, so the arm-specific "Q" constraint is
needed instead.
2014-04-07 04:28:12 -04:00
Rich Felker 1974bffa2d use inline atomics and thread pointer on arm models supporting them
this is perhaps not the optimal implementation; a_cas still compiles
to nested loops due to the different interface contracts of the kuser
helper cas function (whose contract this patch implements) and the
a_cas function (whose contract mimics the x86 cmpxchg). fixing this
may be possible, but it's more complicated and thus deferred until a
later time.

aside from improving performance and code size, this patch also
provides a means of producing binaries which can run on hardened
kernels where the kuser helpers have been disabled. however, at
present this requires producing binaries for armv6k or later, which
will not run on older cpus. a real solution to the problem of kernels
that omit the kuser helpers would be runtime detection, so that
universal binaries which run on all arm cpu models can also be
compatible with all kernel hardening profiles. robust detection
however is a much harder problem, and will be addressed at a later
time.
2014-04-07 04:03:18 -04:00
Rich Felker 35a6801c6c fix arm atomic store and generate simpler/less-bloated/faster code
atomic store was lacking a barrier, which was fine for legacy arm with
no real smp and kernel-emulated cas, but unsuitable for more modern
systems. the kernel provides another "kuser" function, at 0xffff0fa0,
which could be used for the barrier, but using that would drop support
for kernels 2.6.12 through 2.6.14 unless an extra conditional were
added to check for barrier availability. just using the barrier in the
kernel cas is easier, and, based on my reading of the assembly code in
the kernel, does not appear to be significantly slower.

at the same time, other atomic operations are adapted to call the
kernel cas function directly rather than using a_cas; due to small
differences in their interface contracts, this makes the generated
code much simpler.
2013-09-22 03:06:17 -04:00
Rich Felker 7568ee4cbf add missing a_or_l to atomic.h for non-x86 archs
this is needed for recently committed sigaction code
2013-08-11 03:43:25 -04:00
Rich Felker a3bdcd9376 remove little-endian assumption from arm atomic.h
this hidden endian dependency had left big endian arm badly broken.
2012-07-08 00:05:08 -04:00
Rich Felker d960d4f2cb initial commit of the arm port
this port assumes eabi calling conventions, eabi linux syscall
convention, and presence of the kernel helpers at 0xffff0f?0 needed
for threads support. otherwise it makes very few assumptions, and the
code should work even on armv4 without thumb support, as well as on
systems with thumb interworking. the bits headers declare this a
little endian system, but as far as i can tell the code should work
equally well on big endian.

some small details are probably broken; so far, testing has been
limited to qemu/aboriginal linux.
2011-09-18 16:44:54 -04:00