This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: TLSDESC clobber ABI stability/futureproofness?
On Thu, Oct 11, 2018 at 12:53:04AM -0300, Alexandre Oliva wrote:
> On Oct 10, 2018, Rich Felker <dalias@libc.org> wrote:
>
> > It's recently come up in musl libc development that the tlsdesc asm
> > functions, at least for some archs, are potentially not future-proof,
> > in that, for a given fixed version of the asm in the dynamic linker,
> > it seems possible for a future ISA level and compiler supporting that
> > ISA level to produce code, in the C functions called in the dynamic
> > fallback case, instructions which clobber registers which are normally
> > call-clobbered, but which are non-clobbered in the tlsdesc ABI. This
> > does not risk breakage when an existing valid build of libc/ldso is
> > used on new hardware and new applications that provide new registers,
> > but it does risk breakage if an existing source version of libc/ldso
> > is built with a compiler supporting new extensions, which is difficult
> > to preclude and not something we want to try to preclude.
>
> I understand the concern. I considered it back when I designed TLSDesc,
> and my reasoning was that if the implementation of the fallback dynamic
> TLS descriptor allocator could possibly use some register, the library
> implementation should know about it as well, and work to preserve it. I
> realize this might not be the case for an old library built by a new
> compiler for a newer version of the library's target. Other pieces of
> the library may fail as well, if registers unknown to it are available
> and used by the compiler (setjmp/longjmp, *context, dynamic PLT
> resolution come to mind), besides the usual difficulties building old
> code with newer tools, so I figured it wasn't worth sacrificing the
> performance of the normal TLSDesc case to make this aspect of register
> set extensions easier.
>
> There might be another particularly risky case, namely, that the memory
> allocator used by TLS descriptors be overridden by code that uses more
> registers than the library knows to preserve. Memory allocation within
> the dynamic loader, including lazy TLS Descriptor relocation resolution,
> is a context in which we should probably use internal, non-overridable
> memory allocators, if we don't already. This would reduce the present
> risky case to the one in the paragraph above.

This is indeed the big risk for glibc right now (with lazy,
non-fail-safe allocation of dynamic TLS), but not for musl, where the
only code that runs in the fallback path uses signal masking, atomics,
and memcpy/memset to acquire and install the already-allocated TLS. We
have precise control of what code is executing at the source level,
but without strong assumptions about the compiler and the ability to
restrict what parts of the ISA it uses, which we don't want to make,
we only have partial control at the binary level.
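As a rough illustration of the kind of fallback path described above (signal masking, atomics, and memcpy/memset over memory that was reserved earlier), here is a minimal C sketch. It is not musl's actual code: `struct tls_module`, `install_tls`, and the `slot` parameter are hypothetical names, and `sigprocmask` stands in for whatever internal signal-blocking primitive a libc would really use.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-module TLS description; not musl's real structures. */
struct tls_module {
    const void *image;  /* initialized data (.tdata) */
    size_t len;         /* bytes of initialized data */
    size_t size;        /* total size, including zero-filled .tbss */
};

/* Install a module's TLS into memory reserved earlier (for example at
 * dlopen time) and publish it in the thread's slot. Nothing is allocated
 * here, so no overridable allocator can run on this path. */
static void *install_tls(_Atomic(void *) *slot, void *reserved,
                         const struct tls_module *m)
{
    sigset_t all, old;
    void *p;

    /* Keep signal handlers from observing a half-installed block.
     * (Assumption: a real libc would use its own internal primitive.) */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);

    p = atomic_load_explicit(slot, memory_order_acquire);
    if (!p) {
        memcpy(reserved, m->image, m->len);
        memset((char *)reserved + m->len, 0, m->size - m->len);
        atomic_store_explicit(slot, reserved, memory_order_release);
        p = reserved;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return p;
}
```

Even in a path this small, the compiler is free to expand memcpy/memset or the atomics using whatever registers the target ISA level allows, which is the binary-level control problem in question.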
I had considered just pregenerating (via gcc -S at minimum ISA level)
and committing flattened asm for this code path, but that does not
work well for ARM where we have to support runtime-switchable
implementations of the atomic primitives, and the implementation is
provided by the kernel in some cases, thereby outside our control.

> > For aarch64 at least, according to discussions I had with Szabolcs
> > Nagy, there is an intent that any new extensions to the aarch64
> > register file be treated as clobbered by tlsdesc functions, rather
> > than preserved.
>
> That's unfortunate. I'm not sure I understand the reasoning behind this
> intent. Maybe we should discuss it further?

Aside from what Szabolcs Nagy already wrote in the reply, the
explanation I'd heard for why it's not a significant burden/cost is
that it's unlikely for vector-heavy code to be using TLS where the TLS
address load can't be hoisted out of the blocks where the
call-clobbered vector regs are in use. Generally, if such hoisting is
performed, the main/only advantage of avoiding clobbers is for
registers which may contain incoming arguments.
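To make the hoisting argument concrete, here is a small C sketch written by hand in the shape a compiler would be expected to produce. The names `tls_total` and `add_to_total` are made up for illustration, and whether a TLSDESC call is emitted at all depends on the TLS model and dialect in use (for instance `-mtls-dialect=gnu2` on x86 with a dynamic TLS model).

```c
#include <stddef.h>

/* Hypothetical thread-local accumulator. */
static _Thread_local float tls_total;

float add_to_total(const float *v, size_t n)
{
    /* The TLS address is computed once, before the loop. This is the
     * only point where a TLSDESC call could occur, and no vector
     * registers hold live values yet. */
    float *total = &tls_total;
    float acc = 0.0f;

    /* This loop may be vectorized; it does not touch TLS at all. */
    for (size_t i = 0; i < n; i++)
        acc += v[i];

    *total += acc;
    return *total;
}
```

With the address load placed there, the registers still live across the call are mostly those carrying the incoming arguments (`v` and `n` here), which matches the point about argument registers being the main beneficiaries of avoiding clobbers.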
> > In the x86 spec, the closest I can find are the phrasing:
>
> > "being able to assume no registers are clobbered by the call"
>
> > and the comment in the pseudo-C:
>
> > /* Preserve any call-clobbered registers not preserved because of
> > the above across the call below. */
>
> > Source: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
>
> > What is the policy for i386 and x86_64?
>
> I don't know that my proposal ever became authoritative policy, but even
> if it is, I guess I have to agree it is underspecified and the reasoning
> above could be added.
>
> > Are normally call-clobbered registers from new register file
> > extensions intended to be preserved by the tlsdesc functions, or
> > clobberable by them?
>
> My thinking has always been that they should be preserved, which doesn't
> necessarily mean they have to be saved and restored. Only if the
> implementation of tlsdesc could possibly modify them should it arrange
> for their entry-point values to be restored before returning. This
> implies not calling overridable functions in the internal
> implementation, and compiling at least the bits used by the tlsdesc
> implementation so as to use only the register set known and supported by
> the library.
>
> Anyway, thanks for bringing this up. I'm amending the x86 TLSDesc
> proposal to cover this with the following footnote:
>
> (*) Preserving a register does not necessarily imply saving and
> restoring it. If the system library implementation does not use or
> even know about a certain extended register set, it need not save it,
> because it will presumably not modify it. This assumes the TLS
> Descriptor implementation is self-contained within the system library,
> with no overridable callbacks. A consequence is that, even if
> other parts of the system library are compiled so as to use an
> extended register set, those used by the implementation of TLS
> Descriptors, including lazy relocations, should be limited to using
> the register set that the interfaces are known to preserve.
>
> after:
>
> [...] This penalizes
> the case that requires dynamic TLS, since it must preserve (*) all
> call-clobbered registers [...]
>
> Please let me know your thoughts about this change, e.g., whether it's
> enough to address your concerns or if you envision a need for more than
> that. Thanks,

It does address my concerns, just not the way I'd hoped. I think it
confirms that we need to use either a signal handler to install TLS
(so the fallback path is just 'raise' in asm) or abandon late
installation of dynamic TLS (instead doing it synchronously at dlopen
time), unless there is some future-proof approach to
save-all/restore-all that works on all archs with TLSDESC -- for
example, on x86 using fsave/fxsave on pre-xsave ISA levels and xsave
as the future-proof fallback probably works.
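A sketch of the first step such an xsave-based fallback would need on x86: asking the CPU at runtime how large the save area must be, so that state components unknown at build time are still covered. The function name `xsave_area_size` is hypothetical; `__get_cpuid` and `__get_cpuid_count` are the helpers from the GCC/Clang `<cpuid.h>` header. The save and restore themselves would have to be asm in the tlsdesc entry point (xsave into a 64-byte-aligned buffer with an all-ones mask) and are not shown.

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns the number of bytes XSAVE needs for the state components the
 * OS has currently enabled, or 0 if XSAVE is not usable (in which case
 * the caller would fall back to fxsave/fsave). */
size_t xsave_area_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;

    /* CPUID leaf 1, ECX: bit 26 = XSAVE supported by the CPU,
     * bit 27 = OSXSAVE, i.e. the OS has enabled it. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 26)) || !(ecx & (1u << 27)))
        return 0;

    /* CPUID leaf 0xD, subleaf 0: EBX is the save-area size required
     * for the features currently enabled in XCR0. */
    if (!__get_cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return ebx;
#else
    return 0;  /* not x86: this approach does not apply */
#endif
}
```

Because the size comes from the hardware and OS rather than from the headers libc was built against, the same binary would keep saving everything when run on a CPU with a larger register file.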
Rich