RepoMirrors/musl - musl

Commit Graph

Author	SHA1	Message	Date
Rich Felker	ea71a9004e	deduplicate TP_ADJ logic out of each arch, replace with TP_OFFSET the only part of TP_ADJ that was not uniquely determined by TLS_ABOVE_TP was the 0x7000 adjustment used mainly on mips and powerpc variants.	2020-08-24 22:04:52 -04:00
Rich Felker	a4a3e4dbc0	make thread-pointer-loading asm non-volatile this will allow the compiler to cache and reuse the result, meaning we no longer have to take care not to load it more than once for the sake of archs where the load may be expensive. depends on commit `1c84c99913` for correctness, since otherwise the compiler could hoist loads during stage 3 of dynamic linking before the initial thread-pointer setup.	2018-10-16 14:11:46 -04:00
Rich Felker	9b95fd0944	define and use internal macros for hidden visibility, weak refs this cleans up what had become widespread direct inline use of "GNU C" style attributes directly in the source, and lowers the barrier to increased use of hidden visibility, which will be useful to recovering some of the efficiency lost when the protected visibility hack was dropped in commit `dc2f368e56`, especially on archs where the PLT ABI is costly.	2018-09-05 14:05:14 -04:00
Szabolcs Nagy	610c5a8524	fix TLS layout of TLS variant I when there is a gap above TP In TLS variant I the TLS is above TP (or above a fixed offset from TP) but on some targets there is a reserved gap above TP before TLS starts. This matters for the local-exec tls access model when the offsets of TLS variables from the TP are hard coded by the linker into the executable, so the libc must compute these offsets the same way as the linker. The tls offset of the main module has to be alignup(GAP_ABOVE_TP, main_tls_align). If there is no TLS in the main module then the gap can be ignored since musl does not use it and the tls access models of shared libraries are not affected. The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0 (i.e. TLS did not require large alignment) because the gap was treated as a fixed offset from TP. Now the TP points at the end of the pthread struct (which is aligned) and there is a gap above it (which may also need alignment). The fix required changing TP_ADJ and __pthread_self on affected targets (aarch64, arm and sh) and in the tlsdesc asm the offset to access the dtv changed too.	2018-06-02 19:38:44 -04:00
Andre McCurdy	749a06b4c5	arm: respect both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros __ARM_ARCH_6ZK__ is a gcc specific historical typo which may not be defined by other compilers. https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02237.html To avoid unexpected results when building for ARMv6KZ with clang, the correct form of the macro (ie 6KZ) needs to be tested. The incorrect form of the macro (ie 6ZK) still needs to be tested for compatibility with pre-2015 versions of gcc.	2018-04-19 12:35:49 -04:00
Rich Felker	29237f7f5c	rework arm atomic/tp backends to be thumb-compatible and fdpic-ready three problems are addressed: - use of pc arithmetic, which was difficult if not impossible to make correct in thumb mode on all models, so that relative rather than absolute pointers to the backends could be used. this was designed back when there was no coherent model for the early stages of the dynamic linker before relocations, and is no longer necessary. - assumption that data (the relative pointers to the backends) can be accessed at a constant displacement from the code. this will not be possible on future fdpic subarchs (for cortex-m), so move responsibility for loading the backend code address to the caller. - hard-coded arm opcodes using the .word directive. instead, use the .arch directive to work around the assembler's refusal to assemble instructions not available (or in some cases, available but just considered deprecated) in the target isa level. the obscure v6t2 arch is used for v6 code so as to (1) allow generation of thumb2 output if -mthumb is active, and (2) avoid warnings/errors for mcr barriers that clang would produce if we just set arch to v7-a. in addition, the __aeabi_read_tp function is moved out of the inner workings and implemented as an asm wrapper around a C function, so that asm code does not need to read global data. the asm wrapper serves to satisfy the ABI calling convention requirements for this function.	2016-12-19 21:21:08 -05:00
Rich Felker	cb1bf2f321	properly access mcontext_t program counter in cancellation handler using the actual mcontext_t definition rather than an overlaid pointer array both improves correctness/readability and eliminates some ugly hacks for archs with 64-bit registers bit 32-bit program counter. also fix UB due to comparison of pointers not in a common array object.	2015-11-02 12:41:49 -05:00
Rich Felker	74483c5955	mark arm thread-pointer-loading inline asm as volatile this builds on commits `a603a75a72` and `0ba35d69c0` to ensure that a compiler cannot conclude that it's valid to reorder the asm to a point before the thread pointer is set up, or to treat the inline function as if it were declared with attribute((const)). other archs already use volatile asm for thread pointer loading.	2015-10-15 12:04:48 -04:00
Rich Felker	0ba35d69c0	remove attribute((const)) from arm __pthread_self inline function commit `a603a75a72` did this for the public pthread_self function but not the internal inline one.	2015-10-15 00:20:50 -04:00
Rich Felker	4a241f14a6	overhaul ARM atomics/tls for performance and compatibility previously, builds for pre-armv6 targets hard-coded use of the "kuser helper" system for atomics and thread-pointer access, resulting in binaries that fail to run (crash) on systems where this functionality has been disabled (as a security/hardening measure) in the kernel. additionally, builds for armv6 hard-coded an outdated/deprecated memory barrier instruction which may require emulation (extremely slow) on future models. this overhaul replaces the behavior for all pre-armv7 builds (both of the above cases) to perform runtime detection of the appropriate mechanisms for barrier, atomic compare-and-swap, and thread pointer access. detection is based on information provided by the kernel in auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture version encoded in AT_PLATFORM. direct use of the instructions is preferred when possible, since probing for the existence of the kuser helper page would be difficult and would incur runtime cost. for builds targeting armv7 or later, the runtime detection code is not compiled at all, and much more efficient versions of the non-cas atomic operations are provided by using ldrex/strex directly rather than wrapping cas.	2014-11-19 01:02:01 -05:00
Rich Felker	e783efa6ef	fix arm thread-pointer/atomic asm when compiling to thumb code armv7/thumb2 provides a way to do atomics in thumb mode, but for armv6 we need a call to arm mode. this commit is based on a patch by Stephen Thomas which fixed the armv7 cases but not the armv6 ones. all of this should be revisited if/when runtime selection of thread pointer access and atomics are added.	2014-04-30 15:32:11 -04:00
Rich Felker	1974bffa2d	use inline atomics and thread pointer on arm models supporting them this is perhaps not the optimal implementation; a_cas still compiles to nested loops due to the different interface contracts of the kuser helper cas function (whose contract this patch implements) and the a_cas function (whose contract mimics the x86 cmpxchg). fixing this may be possible, but it's more complicated and thus deferred until a later time. aside from improving performance and code size, this patch also provides a means of producing binaries which can run on hardened kernels where the kuser helpers have been disabled. however, at present this requires producing binaries for armv6k or later, which will not run on older cpus. a real solution to the problem of kernels that omit the kuser helpers would be runtime detection, so that universal binaries which run on all arm cpu models can also be compatible with all kernel hardening profiles. robust detection however is a much harder problem, and will be addressed at a later time.	2014-04-07 04:03:18 -04:00
Rich Felker	9ec4283b28	add support for TLS variant I, presently needed for arm and mips despite documentation that makes it sound a lot different, the only ABI-constraint difference between TLS variants II and I seems to be that variant II stores the initial TLS segment immediately below the thread pointer (i.e. the thread pointer points to the end of it) and variant I stores the initial TLS segment above the thread pointer, requiring the thread descriptor to be stored below. the actual value stored in the thread pointer register also tends to have per-arch random offsets applied to it for silly micro-optimization purposes. with these changes applied, TLS should be basically working on all supported archs except microblaze. I'm still working on getting the necessary information and a working toolchain that can build TLS binaries for microblaze, but in theory, static-linked programs with TLS and dynamic-linked programs where only the main executable uses TLS should already work on microblaze. alignment constraints have not yet been heavily tested, so it's possible that this code does not always align TLS segments correctly on archs that need TLS variant I.	2012-10-15 18:51:53 -04:00
Rich Felker	834255a3ff	use __attribute__((const)) on arm __pthread_self function	2012-02-25 02:52:18 -05:00
Rich Felker	d5bde7babb	"optimize" arm __pthread_self actually this is just to avoid gcc being stupid and refusing to inline the function version, even when the size cost is essentially identical whether it's inlined or not.	2011-09-22 22:56:06 -04:00
Rich Felker	d960d4f2cb	initial commit of the arm port this port assumes eabi calling conventions, eabi linux syscall convention, and presence of the kernel helpers at 0xffff0f?0 needed for threads support. otherwise it makes very few assumptions, and the code should work even on armv4 without thumb support, as well as on systems with thumb interworking. the bits headers declare this a little endian system, but as far as i can tell the code should work equally well on big endian. some small details are probably broken; so far, testing has been limited to qemu/aboriginal linux.	2011-09-18 16:44:54 -04:00

16 Commits