In TLS variant I the TLS is above TP (or above a fixed offset from TP)
but on some targets there is a reserved gap above TP before TLS starts.
This matters for the local-exec tls access model when the offsets of
TLS variables from the TP are hard coded by the linker into the
executable, so the libc must compute these offsets the same way as the
linker. The tls offset of the main module has to be
alignup(GAP_ABOVE_TP, main_tls_align).
If there is no TLS in the main module then the gap can be ignored
since musl does not use it and the tls access models of shared
libraries are not affected.
The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0
(i.e. TLS did not require large alignment) because the gap was
treated as a fixed offset from TP. Now the TP points at the end
of the pthread struct (which is aligned) and there is a gap above
it (which may also need alignment).
The fix required changing TP_ADJ and __pthread_self on affected
targets (aarch64, arm and sh) and in the tlsdesc asm the offset to
access the dtv changed too.
commit 618b18c78e removed the previous
detection and hardening since it was incorrect. commit
72141795d4 already handled all that
remained for hardening the static-linked case. in the dynamic-linked
case, have the dynamic linker check whether malloc was replaced and
make that information available.
with these changes, the properties documented in commit
c9f415d7ea are restored: if calloc is
not provided, it will behave as malloc+memset, and any of the
memalign-family functions not provided will fail with ENOMEM.
the existing laddr function for fdpic cannot translate ELF virtual
addresses outside of the LOAD segments to runtime addresses because
the fdpic loadmap only covers the logically-mapped part. however the
whole point of reclaim_gaps is to recover the slack space up to the
page boundaries, so it needs to work with such addresses.
add a new laddr_pg function that accepts any address in the page range
for the LOAD segment by expanding the loadmap records out to page
boundaries. only use the new version for reclaim_gaps, so as not to
impact performance of other address lookups.
also, only use laddr_pg for the start address of a gap; the end
address lies one byte beyond the end, potentially in a different page
where it would get mapped differently. instead of mapping end, apply
the length (end-start) to the mapped value of start.
in theory non-absolute origins can only arise when either the main
program is invoked by running ldso as a command (inherently non-suid)
or when dlopen was called with a relative pathname containing at least
one slash. such usage would be inherently insecure in an suid program
anyway, so the old behavior here does not seem to have been insecure.
harden against it anyway.
the rpath fixup code assumed any module's name field would contain at
least one slash, an invariant which is usually met but not in the case
of a main executable loaded from the current working directory by
running ldd or ldso as a command. it would be possible to make this
invariant always hold, but it has a higher runtime allocation cost and
does not seem useful elsewhere, so just patch things up in fixup_rpath
instead.
the Linux and FreeBSD man pages for dladdr document dli_fbase as the
"base address" of the library/module found. normally (e.g. AT_BASE)
the term "base" is used to denote the base address relative to which
p_vaddr addresses are interpreted; however in the case of dladdr's
Dl_info structure, existing implementations define it as the lowest
address of the mapping, which makes sense in the context of
determining which module's memory range the input address falls
within.
since this is a nonstandard interface provided to mimic one provided
by other implementations, adjust it to match their behavior.
commit c49d3c8ada added logic to detect
attempts to load libc.so via another name and instead redirect to the
existing libc, rather than loading two and producing dangerously
inconsistent state. however, the check for and unmapping of the
duplicate libc happened after reclaim_gaps was already called,
donating the slack space around the writable segment to malloc.
subsequent unmapping of the library then invalidated malloc's free
lists.
fix the issue by moving the call to reclaim_gaps out of map_library
into load_library, after the duplicate libc check but before the first
call to calloc, so that the gaps can still be used to satisfy the
allocation of struct dso. this change also eliminates the need for an
ugly hack (temporarily setting runtime=1) to avoid reclaim_gaps when
loading the main program via map_library, which happens when ldso is
invoked as a command.
only programs/libraries erroneously containing a DT_NEEDED reference
to libc.so via an absolute pathname or symlink were affected by this
issue.
previously, the pathname used to load the program was always used as
argv[0]. the default remains the same, but a new --argv0 option can be
used to provide a different value.
a null pointer for a library's deps list was ambiguous: it could
indicate either no dependencies or that the dependency list had not
yet been populated. inability to distinguish could lead to spurious
work when dlopen is called multiple times on a library with no deps,
and due to related bugs, could actually cause other libraries to
falsely appear as dependencies, translating into false positives for
dlsym.
avoid the problem by always initializing the deps pointer, pointing to
an empty list if there are no deps. rather than wasting memory and
introducing another failure path by allocating an empty list per
library, simply share a global dummy list.
further fixes will be needed for related bugs, and much of this code
may end up being replaced.
commit 4ff234f6cb erroneously changed
the condition for running certain code at dlopen time to check whether
the library was already relocated rather than whether it already had
its deps[] table filled. this was out of concern over whether the code
under the conditional would be idempotent/safe to call on an
already-loaded libraries. however, I missed a consideration in the
opposite direction: if a library was loaded at program startup rather
than dlopen, its deps[] table was not yet allocated/filled, and
load_deps needs to be called at dlopen time in order for dlsym to be
able to perform dependency-order symbol lookups.
in order to avoid wasteful allocation of lazy-binding relocation
tables for libraries which were already loaded and relocated at
startup, the check for !p->relocated is not deleted entirely, but
moved to apply only to allocation of these dables.
this change was suggested based on testing done by Timo Teräs almost
two years ago; the branch (and probably call prep overhead) in the
inner loop was found to contribute noticably to total symbol lookup
time. this change will make lookup slightly slower if libraries were
built with only the traditional "sysv" ELF hash table, but based on
how much slower lookup tends to be without the gnu hash table, it
seems reasonable to assume that (1) users building without gnu hash
don't care about dynamic linking performance, and (2) the extra time
spent computing the gnu hash is likely to be dominated by the slowness
of the sysv hash table lookup anyway.
such loading is unsafe, and can happen when programs use their own
logic to locate a .so file then pass the absolute pathname to dlopen,
or if an absolute pathname ends up in DT_NEEDED headers. multiple
loads with only the base name were already precluded, provided libc
was named appropriately, by special-casing standard library names.
one function symbol (in the reserved namespace, but public, since it's
part of the crt1 entry point ABI) and one data symbol are checked.
this way we avoid likely false positives, particularly from libraries
interposing and wrapping functions. there is no hard requirement to
avoid breaking such usage, since trying to run a hook before libc is
even initialized is not a supported usage case, but it's friendlier
not to break things.
traditional lazy relocation with call-time plt resolver is
intentionally not implemented, as it is a huge bug surface and demands
significant amounts of arch-specific code and requires ongoing
maintenance to ensure compatibility with applications which make use
of new additions to the arch's register file in passing function
arguments.
some applications, however, depend on the ability to dlopen modules
which have unsatisfied symbol references at the time they are loaded,
either avoiding use of the affected interfaces or manually loading
another module to provide the missing definition via their own module
dependency tracking outside the ELF data structures. while such usage
is non-conforming, failure to support it has been a significant
obstacle for users/distributions trying to support affected software,
particularly the X.org server.
instead of resolving lazy relocations at call time, this patch saves
unresolved GOT/PLT relocations for deferral and retries them after
each subsequent dlopen until they are resolved. since dlopen is the
only time at which the effective global symbol table can change, this
behavior is not observably different from traditional lazy binding,
and the required code is minimal.
when loading libraries with dlopen, the caller can request that the
library's symbols become part of the global symbol table, or that they
only be used for resolving relocations in the loaded library and its
dependencies. in the latter case, a subsequent dlopen of the same
library can upgrade it to global status.
previously, if a library was upgraded from local to global mode, its
symbols entered the symbol lookup search order at the point where the
library was originally loaded. this means that a new call to dlopen
could change the value of a symbol that already had a visible
definition, an inconsistency which applications could observe.
POSIX is unclear whether this should happen or whether it's permitted
to happen, but the resolution of Austin Group issue #982 made it
formally unspecified.
with this patch, a library whose mode is upgraded from local to global
enters the symbol lookup order at the point where it was made global,
so that symbol resolution before and after the upgrade are consistent.
in order to implement this change, the per-dso global flag is replaced
with a separate set of linked-list pointers for participation in the
global symbol table. this permits the order of dso objects for symbol
resolution to differ from the order used for iteration of all loaded
libraries. it also improves performance of find_sym, by avoiding a
branch per iteration and skipping, and especially in the case where
many non-global libraries have been loaded, by allowing the loop to
skip over them entirely. logic for temporarily adding non-global
libraries to the symbol table for relocation purposes is also mildly
simplified.
A weak symbol definition is not special during dynamic linking, so
don't let a strong definition in a later module override it.
(glibc dynamic linker allows overriding weak definitions if
LD_DYNAMIC_WEAK is set, musl does not.)
STB_GNU_UNIQUE means that the symbol is global, even if it is in a
module that's loaded with RTLD_LOCAL, and all references resolve to
the same definition. This semantics is only relevant for c++ plugin
systems and even there it's often not what the user wants (so it can
be turned off in g++ by -fno-gnu-unique when the c++ shared lib is
compiled). In musl just treat it like STB_GLOBAL.
commit d56460c939 introduced this
regression as part of splitting the tls module list out of the dso
list. the new code added to dlopen's failure path to undo the changes
adding the partially-loaded libraries reset the tls_tail pointer
correctly, but did not clear its link to the next list entry. thus, at
least until the next successful dlopen, the list was not terminated
but ended with an invalid next pointer, which __copy_tls attempted to
follow when a new thread was created.
patch by Mikael Vidstedt.
On s390x, the kernel provides AT_SYSINFO_EHDR, but sets it to zero, if the
program being run does not have a program interpreter. This causes
problems when running the dynamic linker directly.
alpha and s390x gratuitously use 64-bit entries (wasting 2x space and
cache utilization) despite the values always being 32-bit.
based on patch by Bobby Bingham, with changes suggested by Alexander
Monakov to use the public Elf_Symndx type from link.h (and make it
properly variable by arch) rather than adding new internal
infrastructure for handling the type.
If a DT_NEEDED entry was the prefix of a reserved library name
(up to the first dot) then it was incorrectly treated as a libc
reserved name.
e.g. libp.so dependency was not loaded as it matched libpthread
reserved name.
this change is made in preparation for adding the mips64 port, which
needs a 64-bit (and mips64-specific) form of the R_INFO macro, but
it's a better abstraction anyway.
based on part of the mips64 port patch by Mahesh Bodapati and Jaydeep
Patil of Imagination Technologies.
commit d56460c939 introduced this bug by
setting up the tls module chain incorrectly when the main app has tls.
the singly-linked list head pointer was setup correctly, but the tail
pointer was not, so the first attempt to append to the list (for a
shared library with tls) would treat the list as empty and effectively
removed the main app from the list. this left all tls module id
numbers off-by-one.
this bug did not appear in any released versions.
this eliminates the last need for the SHARED macro to control how
files in the src tree are compiled. the same code is used for both
libc.a and libc.so, with additional code for the dynamic linker (from
the new ldso tree) being added to libc.so but not libc.a. separate .o
and .lo object files still exist for the src tree, but the only
difference is that the .lo files are built as PIC.
in the future, if/when we add dlopen support for static-linked
programs, much of the code in dynlink.c may be moved back into the src
tree, but properly factored into separate source files. in that case,
the code in the ldso tree will be reduced to just the dynamic linker
entry point, self-relocation, and loading of libraries needed by the
main application.