commit 37bb3cce45 suppressed the
declaration for C++, where it is wrongly interpreted as declaring the
function as taking no arguments. with C23 removing non-prototype
declarations, that problem is now also relevant to C.
the non-prototype declaration for basename originates with commit
06aec8d715, where it was designed to
avoid conflicts with programs which declare basename with the GNU
signature taking const char *. that change was probably misguided, as
it represents not only misaligned expectations with the caller, but
also undefined behavior (calling a function that's been declared with
the wrong type).
we could opt to fix the declaration, but since glibc, with the
gratuitously incompatible GNU-basename function, seems to be the only
implementation that declares it in string.h, it seems better to just
remove the declaration. this provides some warning if applications are
being built expecting the GNU behavior but not getting it. if we
declared it here, it would only produce a warning if the caller also
declares it themselves (rare) or if the caller attempts to pass a
const-qualified pointer.
memmem has been adopted for the next issue of POSIX (outcome of
tracker item 1061). since mem* is in the reserved namespace for
string.h it's already fully conforming to expose it by default, so
just do so.
maintainer's note: past sentiment was that, despite being imperfect
and unable to force clearing of all possible copies of sensitive data
(e.g. in registers, register spills, signal contexts left on the
stack, etc.) this function would be added if major implementations
agreed on it, which has happened -- several BSDs and glibc all include
it.
unfortunately this eliminates the ability of the compiler to diagnose
some dangerous/incorrect usage, but POSIX requires (as an extension to
the C language, i.e. CX shaded) that NULL have type void *. plain C
allows it to be defined as any null pointer constant.
the definition 0L is preserved for C++ rather than reverting to plain
0 to avoid dangerous behavior in non-conforming programs which use
NULL as a variadic sentinel. (it's impossible to use (void *)0 for C++
since C++ lacks the proper implicit pointer conversions, and other
popular alternatives like the GCC __null extension seem non-conforming
to the standard's requirements.)
the historical mess of having different definitions for C and C++
comes from the historical C definition as (void *)0 and the fact that
(void *)0 can't be used in C++ because it does not convert to other
pointer types implicitly. however, using plain 0 in C++ exposed bugs
in C++ programs that call variadic functions with NULL as an argument
and (wrongly; this is UB) expect it to arrive as a null pointer. on
64-bit machines, the high bits end up containing junk. glibc dodges
the issue by using a GCC extension __null to define NULL; this is
observably non-conforming because a conforming application could
observe the definition of NULL via stringizing and see that it is
neither an integer constant expression with value zero nor such an
expression cast to void.
switching to 0L eliminates the issue and provides compatibility with
broken applications, since on all musl targets, long and pointers have
the same size, representation, and argument-passing convention. we
could maintain separate C and C++ definitions of NULL (i.e. just use
0L on C++ and use (void *)0 on C) but after careful analysis, it seems
extremely difficult for a C program to even determine whether NULL has
integer or pointer type, much less depend in subtle, unintentional
ways, on whether it does. C89 seems to have no way to make the
distinction. on C99, the fact that (int)(void *)0 is not an integer
constant expression, along with subtle VLA/sizeof semantics, can be
used to make the distinction, but many compilers are non-conforming
and give the wrong result to this test anyway. on C11, _Generic can
trivially make the distinction, but it seems unlikely that code
targetting C11 would be so backwards in caring which definition of
NULL an implementation uses.
as such, the simplest path of using the same definition for NULL in
both C and C++ was chosen. the #undef directive was also removed so
that the compiler can catch and give a warning or error on
redefinition if buggy programs have defined their own versions of
NULL prior to inclusion of standard headers.
previously, a few BSD features were enabled only by _BSD_SOURCE, not
by _GNU_SOURCE. since _BSD_SOURCE is default in the absence of other
feature test macros, this made adding _GNU_SOURCE to a project not a
purely additive feature test macro; it actually caused some features
to be suppressed.
most of the changes made by this patch actually bring musl in closer
alignment with the glibc behavior for _GNU_SOURCE. the only exceptions
are the added visibility of functions like strlcpy which were BSD-only
due to being disliked/rejected by glibc maintainers. here, I feel the
consistency of having _GNU_SOURCE mean "everything", and especially
the property of it being purely additive, are more valuable than
hiding functions which glibc does not have.
the old behavior of exposing nothing except plain ISO C can be
obtained by defining __STRICT_ANSI__ or using a compiler option (such
as -std=c99) that predefines it. the new default featureset is POSIX
with XSI plus _BSD_SOURCE. any explicit feature test macros will
inhibit the default.
installation docs have also been updated to reflect this change.
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
the non-prototype declaration of basename in string.h is an ugly
compromise to avoid breaking 2 types of broken software:
1. programs which assume basename is declared in string.h and thus
would suffer from dangerous pointer-truncation if an implicit
declaration were used.
2. programs which include string.h with _GNU_SOURCE defined but then
declare their own prototype for basename using the incorrect GNU
signature for the function (which would clash with a correct
prototype).
however, since C++ does not have non-prototype declarations and
interprets them as prototypes for a function with no arguments, we
must omit it when compiling C++ code. thankfully, all known broken
apps that suffer from the above issues are written in C, not C++.
GNU programs may expect the GNU version of basename, which has a
different prototype (argument is const-qualified) and prototype it
themselves too. of course if they're expecting the GNU behavior for
the function, they'll still run into problems, but at least this
eliminates some compile-time failures.
note that it still will have the standards-conformant behavior, not
the GNU behavior. but at least this prevents broken code from ending
up with truncated pointers due to implicit declarations...
programs that use this tend to horribly botch international text
support, so it's questionable whether we want to support it even in
the long term... for now, it's just a dummy that calls strcmp.