I have hit an issue with thread-local storage variables on Cygwin/AMD64; I do not see it with Cygwin/i686. I am having linking issues when using the `thread_local` keyword in Cygwin with its GCC 4.8.3 and GCC 4.9.2. This is derived from log4cplus. The test case is split into three files:

File def.hxx:

~~~~
#include <string>

namespace N
{

struct S
{
    std::string str;
};

// extern declaration in a header
extern thread_local S * ptd;

// accessing the extern declared ptd here
inline S * get_ptd ()
{
    if (! ptd)
        ptd = new S;

    return ptd;
}

} // namespace N
~~~~

File def.cxx:

~~~~
#include "def.hxx"

namespace N
{

// definition of ptd
thread_local S * ptd = nullptr;

} // namespace N
~~~~

File use.cxx:

~~~~
#include "def.hxx"

namespace N
{

__declspec(dllexport) void * foo ()
{
    // invoking inline get_ptd() function to get the value in ptd
    return get_ptd ();
}

}
~~~~

Now, when I compile each .cxx with `g++ -std=gnu++11 -fvisibility=hidden -c use.cxx def.cxx` and then try to link with `g++ -shared -o cygtest.dll use.o def.o`, I get the following error from the linker:

~~~~
use.o:use.cxx:(.text$_ZTWN1N3ptdE[_ZTWN1N3ptdE]+0x15): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for N::ptd'
collect2: error: ld returned 1 exit status
~~~~

The `nm -C ./def.o` output confirms this:

~~~~
`--> nm -C ./def.o
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 r .rdata
0000000000000000 r .rdata$zzz
0000000000000000 t .text
0000000000000008 r __emutls_t._ZN1N3ptdE
0000000000000000 D __emutls_v._ZN1N3ptdE
0000000000000000 r std::piecewise_construct
~~~~

As you can see, the initialization function for the `ptd` thread-local variable is not defined anywhere.
use.o references this initialization function (see the bottom of the listing):

~~~~
`--> nm -C ./use.o
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 i .drectve
0000000000000000 p .pdata
0000000000000000 p .pdata$_ZN1N1SC1Ev
0000000000000000 p .pdata$_ZN1N7get_ptdEv
0000000000000000 p .pdata$_ZTWN1N3ptdE
0000000000000000 r .rdata
0000000000000000 r .rdata$.refptr.__emutls_v._ZN1N3ptdE
0000000000000000 r .rdata$.refptr._ZTHN1N3ptdE
0000000000000000 r .rdata$zzz
0000000000000000 R .refptr.__emutls_v._ZN1N3ptdE
0000000000000000 R .refptr._ZTHN1N3ptdE
0000000000000000 t .text
0000000000000000 t .text$_ZN1N1SC1Ev
0000000000000000 t .text$_ZN1N7get_ptdEv
0000000000000000 t .text$_ZTWN1N3ptdE
0000000000000000 A .weak._ZTHN1N3ptdE._ZN1N1SC1Ev
0000000000000000 r .xdata
0000000000000000 r .xdata$_ZN1N1SC1Ev
0000000000000000 r .xdata$_ZN1N7get_ptdEv
0000000000000000 r .xdata$_ZTWN1N3ptdE
                 U __emutls_get_address
                 U __emutls_v._ZN1N3ptdE
                 U __gxx_personality_seh0
                 U __real__ZdlPv
                 U __real__Znwm
                 U _Unwind_Resume
                 U operator delete(void*)
0000000000000000 T N::S::S()
0000000000000000 T N::foo()
0000000000000000 T N::get_ptd()
                 U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string()
                 U operator new(unsigned long)
0000000000000000 r std::piecewise_construct
                 w TLS init function for N::ptd
0000000000000000 T TLS wrapper function for N::ptd
~~~~

Now, this code seems to work well on Linux with both GCC and Clang. Is this a GCC problem on Cygwin? Am I using `extern thread_local` wrong? My experiments show that dropping the `extern` keyword seems to fix the issue, but I am not sure whether that introduces two separate `ptd` thread-local variables, one in each TU. See also http://stackoverflow.com/q/28023728/341065
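The two nm listings show the mechanism behind the error: use.o defines the TLS wrapper function (`T`) and carries only a weak undefined reference (`w`) to the TLS init function, which def.o never defines because `ptd` is constant-initialized. As far as I understand, GCC emits the weak reference because use.cxx cannot see whether the definition in the other TU needs dynamic initialization. The single-file model below is a simplified sketch of the wrapper's control flow, based on the `use.s` listing posted later in this thread; `model::tls_init` and `model::tls_wrapper` are hypothetical stand-ins for `_ZTHN1N3ptdE` and `_ZTWN1N3ptdE`, and native TLS replaces Cygwin's emutls.

```cpp
#include <cassert>

namespace model {

struct S {};

// Stand-in for N::ptd. Native thread_local is used here; on Cygwin GCC lowers
// thread_local to emutls and reaches the storage via __emutls_get_address.
thread_local S* ptd = nullptr;

// Stand-in for the weak reference to the "TLS init function" (_ZTHN1N3ptdE).
// def.o defines no such function, so the weak reference resolves to address 0;
// a null function pointer models that.
void (*tls_init)() = nullptr;

// Model of the "TLS wrapper function" (_ZTWN1N3ptdE): call the init function
// only if one was linked in, then return this thread's instance.
S** tls_wrapper() {
    if (tls_init != nullptr) {
        tls_init();
    }
    return &ptd;
}

}  // namespace model
```

In the real object file the null check and the guarded `call` are machine code, and it is the PC-relative `call` to the (absent) init function that the linker must still relocate, even though it never runs.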
Created attachment 34503: def.hxx
Created attachment 34504: def.cxx
Created attachment 34505: use.cxx
The problem is still present in GCC 5.3.0. It also appears when the thread_local variable is a static class member.
(In reply to Václav Zeman from comment #0)
> use.o:use.cxx:(.text$_ZTWN1N3ptdE[_ZTWN1N3ptdE]+0x15): relocation
> truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init
> function for N::ptd'
> collect2: error: ld returned 1 exit status

Cygwin doesn't use R_X86_64_PC32. Please show us the output of

# g++ -v -shared -o cygtest.dll use.o def.o

and

# ld -V
Created attachment 37630: logs requested by #5 comment

Here is the `g++ -v` link output and the `ld -V` output.
Created attachment 37631: logs after complete recompilation
Your compiler doesn't have proper LTO support. Please turn it off with -fno-lto.
(In reply to H.J. Lu from comment #8)
> Your compiler doesn't have proper LTO support. Please turn it
> off with -fno-lto.

How/why is it improper?
(In reply to Václav Zeman from comment #9)
> (In reply to H.J. Lu from comment #8)
> > Your compiler doesn't have proper LTO support. Please turn it
> > off with -fno-lto.
>
> How/why is it improper?

Your LTO generates binary files targeting Linux from LTO IR.
Created attachment 37638: logs of compilation with -fno-lto

(In reply to H.J. Lu from comment #8)
> Your compiler doesn't have proper LTO support. Please turn it
> off with -fno-lto.
Please provide the output of "objdump -r use.o".
Created attachment 37643: objdump -r use.o log

(In reply to H.J. Lu from comment #12)
> Please provide the output of "objdump -r use.o".
Created attachment 37644: objdump -Ttr def.o log

Here is the `objdump -Ttr def.o` output in advance, just in case it is relevant.
I didn't realize the Windows linker uses ELF relocation names. I don't know what is wrong.
And this still fails for me with GCC 5.3:

~~~~
`--> ./build.sh
+ g++ -std=gnu++11 -fvisibility=hidden -c use.cxx def.cxx
+ g++ -shared -o cygtest.dll use.o def.o
use.o:use.cxx:(.text$_ZTWN1N3ptdE[_ZTWN1N3ptdE]+0x15): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for N::ptd'
collect2: error: ld returned 1 exit status
.-(~/log4cplus-git/tls-test-case)
`--> g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/5.3.0/lto-wrapper.exe
Target: x86_64-pc-cygwin
Configured with: /cygdrive/i/szsz/tmpp/gcc/gcc-5.3.0-3.x86_64/src/gcc-5.3.0/configure --srcdir=/cygdrive/i/szsz/tmpp/gcc/gcc-5.3.0-3.x86_64/src/gcc-5.3.0 --prefix=/usr --exec-prefix=/usr --localstatedir=/var --sysconfdir=/etc --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin --target=x86_64-pc-cygwin --without-libiconv-prefix --without-libintl-prefix --libexecdir=/usr/lib --enable-shared --enable-shared-libgcc --enable-static --enable-version-specific-runtime-libs --enable-bootstrap --enable-__cxa_atexit --with-dwarf2 --with-tune=generic --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite --enable-threads=posix --enable-libatomic --enable-libcilkrts --enable-libgomp --enable-libitm --enable-libquadmath --enable-libquadmath-support --enable-libssp --enable-libada --enable-libgcj-sublibs --disable-java-awt --disable-symvers --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix --without-libintl-prefix --with-system-zlib --enable-linker-build-id --with-default-libstdcxx-abi=gcc4-compatible
Thread model: posix
gcc version 5.3.0 (GCC)
~~~~
This is still an issue in 2017 with GCC 5.4.0.
And I have just verified it is still the same with GCC 6.3.0.
There appears to be some sort of interaction with the `inline` specifier of the `get_ptd()` function. If the `get_ptd()` function is just declared `extern` in `def.hxx` and defined in `def.cxx`, the link error goes away.
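A single-file sketch of the layout described in the previous comment follows, with the three files collapsed into one translation unit so that it compiles on its own; the comments mark where each part would live. `__declspec(dllexport)` is dropped so it builds outside Windows. That this avoids the weak TLS-init reference is reasoning, not something verified on Cygwin: use.cxx no longer names `ptd`, so GCC should have no reason to emit a TLS wrapper there, and def.cxx sees the definition directly.

```cpp
#include <cassert>
#include <string>

// ---- def.hxx (sketch): declarations only, no inline accessor ----
namespace N {
struct S { std::string str; };
extern thread_local S* ptd;  // may stay; use.cxx no longer names it
S* get_ptd();                // plain declaration, defined in def.cxx
}  // namespace N

// ---- def.cxx (sketch): the TU that defines ptd also defines get_ptd ----
namespace N {
thread_local S* ptd = nullptr;

S* get_ptd() {
    // Lazily allocate one S per thread, as the original inline version did.
    if (!ptd) ptd = new S;
    return ptd;
}
}  // namespace N

// ---- use.cxx (sketch): only calls the out-of-line accessor ----
namespace N {
void* foo() { return get_ptd(); }
}  // namespace N
```

The cost of this layout is an ordinary out-of-line call per access instead of an inlined one.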
Still an issue in 2018 with GCC 7.3.0.
I looked into this a bit, as gdb 9.0 now uses thread_local in a way which trips over this. I came up with a slightly simpler reproduction:

~~~~
$ cat def.h
extern thread_local int tlv;
$ cat def.cc
#include "def.h"
thread_local int tlv;
$ cat use.cc
#include "def.h"
int main()
{
  tlv = 1;
}
$ x86_64-pc-cygwin-gcc def.cc use.cc --save-temps
/tmp/ccMAKHhL.o:use.cc:(.text$_ZTW3tlv[_ZTW3tlv]+0x15): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for tlv'
collect2: error: ld returned 1 exit status
~~~~

This compiles without error with x86_64-w64-mingw32-gcc.

Looking at use.s:

~~~~
	.file	"use.cc"
	.text
	.section	.text$_ZTW3tlv,"x"
	.linkonce discard
	.globl	_ZTW3tlv
	.def	_ZTW3tlv;	.scl	2;	.type	32;	.endef
	.seh_proc	_ZTW3tlv
_ZTW3tlv:
.LFB1:
	pushq	%rbp
	.seh_pushreg	%rbp
	movq	%rsp, %rbp
	.seh_setframe	%rbp, 0
	subq	$32, %rsp
	.seh_stackalloc	32
	.seh_endprologue
	movq	.refptr._ZTH3tlv(%rip), %rax
	testq	%rax, %rax
	je	.L2
	call	_ZTH3tlv
.L2:
	movq	.refptr.__emutls_v.tlv(%rip), %rcx
	call	__emutls_get_address
	addq	$32, %rsp
	popq	%rbp
	ret
	.seh_endproc
	.def	__main;	.scl	2;	.type	32;	.endef
	.text
	.globl	main
	.def	main;	.scl	2;	.type	32;	.endef
	.seh_proc	main
main:
.LFB0:
	pushq	%rbp
	.seh_pushreg	%rbp
	movq	%rsp, %rbp
	.seh_setframe	%rbp, 0
	subq	$32, %rsp
	.seh_stackalloc	32
	.seh_endprologue
	call	__main
	call	_ZTW3tlv
	movl	$1, (%rax)
	movl	$0, %eax
	addq	$32, %rsp
	popq	%rbp
	ret
	.seh_endproc
	.weak	_ZTH3tlv
	.ident	"GCC: (GNU) 7.4.0"
	.def	_ZTH3tlv;	.scl	2;	.type	32;	.endef
	.def	__emutls_get_address;	.scl	2;	.type	32;	.endef
	.section	.rdata$.refptr.__emutls_v.tlv, "dr"
	.globl	.refptr.__emutls_v.tlv
	.linkonce	discard
.refptr.__emutls_v.tlv:
	.quad	__emutls_v.tlv
	.section	.rdata$.refptr._ZTH3tlv, "dr"
	.globl	.refptr._ZTH3tlv
	.linkonce	discard
.refptr._ZTH3tlv:
	.quad	_ZTH3tlv
~~~~

The problem seems to be in the TLS wrapper function (_ZTW3tlv):

~~~~
[...]
	movq	.refptr._ZTH3tlv(%rip), %rax
	testq	%rax, %rax
	je	.L2
	call	_ZTH3tlv
[...]
	.weak	_ZTH3tlv
[...]
.refptr._ZTH3tlv:
	.quad	_ZTH3tlv
~~~~

The call here is to absolute address 0 (since the weak symbol has no other definition), which is encoded relative to %rip. This requires a relocation, and the relative offset can't be contained in 32 signed bits if the ImageBase is above 2 GB.

As some confirmation of this analysis, this problem can be reproduced with x86_64-w64-mingw32-gcc if the ImageBase is changed from 0x400000 (the default for that toolchain) to 0x100400000 (the default for x86_64 Cygwin):

~~~~
$ x86_64-w64-mingw32-gcc def.cc use.cc -Wl,--image-base,0x100400000
/tmp/cc3XRN6L.o:use.cc:(.text$_ZTW3tlv[_ZTW3tlv]+0x15): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for tlv'
collect2: error: ld returned 1 exit status
~~~~

Naively, I think this could be fixed by generating code which indirects the call through the pseudo-reloc, but I'm not sure that makes sense.
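The arithmetic behind "relocation truncated to fit" can be checked with a few lines. A `call rel32` displacement is measured from the end of the 4-byte displacement field, so the linker needs `target - (site + 4)` to fit in a signed 32-bit integer, with `target = 0` for the undefined weak `_ZTH3tlv`. Assuming the usual encodings (push 1 byte, mov 3, sub 4, RIP-relative movq 7, testq 3, short je 2), the `call` opcode in `_ZTW3tlv` sits at +0x14, so the +0x15 in the error message is the call's displacement field. The helpers `rel32_fits` and `call_site` are hypothetical, and the placement of the wrapper at .text RVA 0x1000 is an assumption; any call site above 2 GB gives the same result.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Would a rel32 displacement stored at address `site` reach `target`?
// x86-64 adds the displacement to the address of the next instruction,
// which for a `call rel32` is the end of the 4-byte field: site + 4.
bool rel32_fits(std::uint64_t site, std::uint64_t target) {
    const std::int64_t disp =
        static_cast<std::int64_t>(target) - static_cast<std::int64_t>(site + 4);
    return disp >= std::numeric_limits<std::int32_t>::min() &&
           disp <= std::numeric_limits<std::int32_t>::max();
}

// Assumed layout: wrapper at the start of .text (RVA 0x1000), displacement
// field at +0x15 as reported by the linker.
std::uint64_t call_site(std::uint64_t image_base) {
    return image_base + 0x1000 + 0x15;
}
```

With the Cygwin image base the displacement to address 0 is about -4.3 GB and does not fit; with the MinGW-w64 default of 0x400000 it is about -4 MB and does.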
This looks like a binutils bug to me. A call to an undefined weak function should never be executed, so it is OK for the linker to convert that call instruction into anything convenient. There is no need for a relocation that can reach address zero. We can convert the call instruction to call itself, or the next instruction, or change it to a nop; whatever is convenient, it doesn't really matter.

A number of binutils ports already have code to handle related problems: ARM and RISC-V for sure, probably others. It looks like this support is missing from the x86_64 port. I'd suggest refiling this as a binutils bug. See for instance https://sourceware.org/bugzilla/show_bug.cgi?id=23244 for a RISC-V example of the same problem. But we need a new bug for the x86_64 problem. RISC-V has a register hardwired to zero, so I rewrite the call instruction to use x0 as the base address. The ARM port turns the call into a nop.
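The "change it to a nop" option from the previous comment can be sketched on raw x86-64 bytes. This is a hypothetical illustration, not binutils code: `neutralize_weak_call` is an invented name, and a real fix would live in the linker's relocation handling for undefined weak symbols. In the wrapper quoted earlier in this thread the call is guarded by `testq`/`je`, so it never runs when `_ZTH3tlv` is 0; a 5-byte NOP is chosen over "call the next instruction" because it leaves the stack untouched even if it were executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// When a PC-relative relocation on a `call rel32` (opcode 0xE8) targets an
// undefined weak symbol, overwrite the 5-byte call with a 5-byte NOP instead
// of failing. `disp_offset` is the offset of the 4-byte displacement (what the
// linker reports, e.g. +0x15), so the opcode is one byte earlier.
// Returns false if the bytes do not look like a call.
bool neutralize_weak_call(std::vector<std::uint8_t>& text, std::size_t disp_offset) {
    if (disp_offset == 0 || disp_offset + 4 > text.size()) return false;
    const std::size_t op = disp_offset - 1;
    if (text[op] != 0xE8) return false;
    // nopl 0x0(%rax,%rax,1): the common 5-byte NOP encoding.
    const std::uint8_t nop5[5] = {0x0F, 0x1F, 0x44, 0x00, 0x00};
    for (std::size_t i = 0; i < 5; ++i) text[op + i] = nop5[i];
    return true;
}
```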
I am not sure what to report. I do not understand linkers and relocations well enough. Also, I no longer have access to Windows and Cygwin.
Joel Sherrill offered to create a binutils bug report for this.