failure to do so was causing crashes on x86_64 when ctors used SSE,
which was first observed when ctors called variadic functions due to
the SSE prologue code inserted into every variadic function.
this is mainly in hopes of supporting c++ (not yet possible for other
reasons) but will also help applications/libraries which use (and more
often, abuse) the gcc __attribute__((__constructor__)) feature in "C"
code.
x86_64 and arm versions of the new startup asm are untested and may
have minor problems.