fix excessively slow TLS performance on some mips models

commit 6d99ad91e8 introduced this
regression as part of a larger change, based on an incorrect
assumption that rdhwr being part of the mips r2 ISA level meant that
the TLS register, known in the mips documentation as UserLocal, was
unconditionally present on chips providing this ISA level and would
not need trap-and-emulate. this turns out to be false.

based on research by Stanislav Kljuhhin and Abilio Marques, who
reported the problem as a performance regression on certain routers
using OpenWRT vs older uclibc-based versions, it turns out the mips
manuals document the UserLocal register as a feature that might or
might not be implemented or enabled, reflected by a cpu capability bit
in the CONFIG3 register, and that Linux checks for this and has to
explicitly enable it on models that have it.
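
as a rough illustration of the check described above, the sketch below
tests the capability bit and sets the matching enable bit. it assumes
the bit positions given in the mips32 privileged architecture manual
(Config3.ULRI at bit 13, HWREna bit 29 for hardware register 29). the
macro and function names are hypothetical, not the kernel's, and real
code would read and write the coprocessor 0 registers instead of
taking their values as arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (MIPS32 PRA): Config3.ULRI and HWREna mask bit
 * for hardware register 29 (UserLocal). Names are hypothetical. */
#define CONFIG3_ULRI   (UINT32_C(1) << 13)
#define HWRENA_ULR     (UINT32_C(1) << 29)

/* True if the Config3 value reports that UserLocal is implemented. */
static bool has_userlocal(uint32_t config3)
{
	return (config3 & CONFIG3_ULRI) != 0;
}

/* Return the HWREna value to program: the UserLocal enable bit is added
 * only when the cpu implements the register; otherwise hwrena is
 * returned unchanged and user-space rdhwr of $29 keeps trapping. */
static uint32_t hwrena_enable_userlocal(uint32_t hwrena, uint32_t config3)
{
	if (has_userlocal(config3))
		hwrena |= HWRENA_ULR;
	return hwrena;
}
```

when has_userlocal() is false, every user-space read of the thread
pointer traps into the kernel, which is the slow case this commit is
concerned with.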

thus, it's indeed possible that r2+ chips can lack the feature,
bringing us back to the situation where Linux only has a fast
trap-and-emulate path for the case where the destination register is
$3. so, always read the thread pointer through $3. this may incur a
gratuitous move to the desired final register on chips where it's not
needed, but it really doesn't matter.
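
for reference, the raw opcode used on pre-r2 builds is already an
rdhwr with $3 as the destination, which is why pinning tp to $3 makes
both branches hit the same trap path. a minimal sketch that rebuilds
the instruction word from its fields, assuming the standard SPECIAL3
encoding of rdhwr with the sel field left at zero; the helper names
are hypothetical:

```c
#include <stdint.h>

/* rdhwr rt, rd is encoded as:
 *   SPECIAL3 (0x1f) << 26 | rt << 16 | rd << 11 | RDHWR funct (0x3b)
 * with the rs field and bits 10..6 left as zero. */
static uint32_t encode_rdhwr(unsigned rt, unsigned rd)
{
	return (UINT32_C(0x1f) << 26)
	     | ((uint32_t)(rt & 0x1f) << 16)
	     | ((uint32_t)(rd & 0x1f) << 11)
	     | UINT32_C(0x3b);
}

/* Destination general-purpose register of an rdhwr instruction word. */
static unsigned rdhwr_rt(uint32_t insn)
{
	return (insn >> 16) & 0x1f;
}

/* Source hardware register number of an rdhwr instruction word. */
static unsigned rdhwr_rd(uint32_t insn)
{
	return (insn >> 11) & 0x1f;
}
```

encode_rdhwr(3, 29) yields 0x7c03e83b, the word emitted in the
__mips_isa_rev < 2 branch: destination $3, hardware register 29.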
Rich Felker 2021-08-12 18:07:44 -04:00
parent 3eed6a6f0a
commit b713b8b2e4
1 changed files with 1 additions and 2 deletions

@@ -1,10 +1,9 @@
 static inline uintptr_t __get_tp()
 {
-#if __mips_isa_rev < 2
 	register uintptr_t tp __asm__("$3");
+#if __mips_isa_rev < 2
 	__asm__ (".word 0x7c03e83b" : "=r" (tp) );
 #else
-	uintptr_t tp;
 	__asm__ ("rdhwr %0, $29" : "=r" (tp) );
 #endif
 	return tp;