Bug 64697 - C++11 thread_local: relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for N::ptd'
Summary: C++11 thread_local: relocation truncated to fit: R_X86_64_PC32 against undefi...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 7.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: http://stackoverflow.com/q/28023728/3...
Keywords: link-failure
Depends on:
Blocks:
 
Reported: 2015-01-20 19:16 UTC by Václav Haisman
Modified: 2020-10-20 23:02 UTC
CC List: 4 users

See Also:
Host: x86_64-pc-cygwin
Target: x86_64-pc-cygwin
Build: x86_64-pc-cygwin
Known to work:
Known to fail: 4.8.3, 4.9.2, 5.3.0, 5.4.0, 6.3.0, 7.3.0
Last reconfirmed: 2016-02-08 00:00:00


Attachments
def.hxx (148 bytes, text/x-c++hdr)
2015-01-20 19:18 UTC, Václav Haisman
def.cxx (83 bytes, text/x-c++src)
2015-01-20 19:18 UTC, Václav Haisman
use.cxx (104 bytes, text/x-c++src)
2015-01-20 19:18 UTC, Václav Haisman
logs requested by #5 comment (2.05 KB, text/plain)
2016-02-08 15:09 UTC, Václav Haisman
logs after complete recompilation (1.13 KB, text/plain)
2016-02-08 15:16 UTC, Václav Haisman
logs of compilation with -fno-lto (1.13 KB, text/plain)
2016-02-09 08:15 UTC, Václav Haisman
objdump -r use.o log (438 bytes, text/plain)
2016-02-09 13:05 UTC, Václav Haisman
objdump -Ttr def.o log (414 bytes, text/plain)
2016-02-09 13:10 UTC, Václav Haisman

Description Václav Haisman 2015-01-20 19:16:17 UTC
I have hit an issue with thread-local storage variables on
Cygwin/AMD64; I do not see it on Cygwin/i686.

I get link errors when using the `thread_local` keyword on Cygwin
with its GCC 4.8.3 and GCC 4.9.2. The test case is derived from
log4cplus and is split into three files:

File def.hxx:

~~~~
#include <string>

namespace N
{
  struct S { std::string str; };
  // extern declaration in a header
  extern thread_local S * ptd;

  // accessing the extern declared ptd here
  inline
  S * get_ptd ()
  {
    if (! ptd)
      ptd = new S;
    return ptd;
  }
} // namespace N
~~~~

File def.cxx:

~~~~
#include "def.hxx"

namespace N
{
  // definition of ptd
  thread_local S * ptd = nullptr;
} // namespace N
~~~~

File use.cxx:

~~~~
#include "def.hxx"

namespace N
{
  __declspec(dllexport)
  void * foo ()
  {
    // invoking inline get_ptd() function to get the value in ptd
    return get_ptd ();
  }
}
~~~~

Now, when I compile each .cxx with
`g++ -std=gnu++11 -fvisibility=hidden -c use.cxx def.cxx` and then try
to link with `g++ -shared -o cygtest.dll use.o def.o`, I get the
following error from the linker:

~~~~
use.o:use.cxx:(.text$_ZTWN1N3ptdE[_ZTWN1N3ptdE]+0x15): relocation
truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init
function for N::ptd'
collect2: error: ld returned 1 exit status
~~~~

The `nm -C ./def.o` output shows that def.o does not define the TLS
init function:

~~~~
`--> nm -C ./def.o
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 r .rdata
0000000000000000 r .rdata$zzz
0000000000000000 t .text
0000000000000008 r __emutls_t._ZN1N3ptdE
0000000000000000 D __emutls_v._ZN1N3ptdE
0000000000000000 r std::piecewise_construct
~~~~

As you can see, the initialization function for the ptd thread-local
variable is not defined anywhere. However, use.o references this
initialization function as a weak symbol (see the bottom of the listing):

~~~~
`--> nm -C ./use.o
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 i .drectve
0000000000000000 p .pdata
0000000000000000 p .pdata$_ZN1N1SC1Ev
0000000000000000 p .pdata$_ZN1N7get_ptdEv
0000000000000000 p .pdata$_ZTWN1N3ptdE
0000000000000000 r .rdata
0000000000000000 r .rdata$.refptr.__emutls_v._ZN1N3ptdE
0000000000000000 r .rdata$.refptr._ZTHN1N3ptdE
0000000000000000 r .rdata$zzz
0000000000000000 R .refptr.__emutls_v._ZN1N3ptdE
0000000000000000 R .refptr._ZTHN1N3ptdE
0000000000000000 t .text
0000000000000000 t .text$_ZN1N1SC1Ev
0000000000000000 t .text$_ZN1N7get_ptdEv
0000000000000000 t .text$_ZTWN1N3ptdE
0000000000000000 A .weak._ZTHN1N3ptdE._ZN1N1SC1Ev
0000000000000000 r .xdata
0000000000000000 r .xdata$_ZN1N1SC1Ev
0000000000000000 r .xdata$_ZN1N7get_ptdEv
0000000000000000 r .xdata$_ZTWN1N3ptdE
                 U __emutls_get_address
                 U __emutls_v._ZN1N3ptdE
                 U __gxx_personality_seh0
                 U __real__ZdlPv
                 U __real__Znwm
                 U _Unwind_Resume
                 U operator delete(void*)
0000000000000000 T N::S::S()
0000000000000000 T N::foo()
0000000000000000 T N::get_ptd()
                 U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string()
                 U operator new(unsigned long)
0000000000000000 r std::piecewise_construct
                 w TLS init function for N::ptd
0000000000000000 T TLS wrapper function for N::ptd
~~~~

Now, this code seems to work well on Linux with both GCC and Clang.

Is this a GCC problem on Cygwin?
Am I using extern thread_local wrong?

My experiments show that not using the extern keyword seems to fix the
issue. But I am not sure whether that introduces two separate ptd
thread-local variables, one in each TU.

See also http://stackoverflow.com/q/28023728/341065
Comment 1 Václav Haisman 2015-01-20 19:18:05 UTC
Created attachment 34503 [details]
def.hxx
Comment 2 Václav Haisman 2015-01-20 19:18:29 UTC
Created attachment 34504 [details]
def.cxx
Comment 3 Václav Haisman 2015-01-20 19:18:58 UTC
Created attachment 34505 [details]
use.cxx
Comment 4 raidl 2016-02-08 07:08:35 UTC
Problem is still present in gcc 5.3.0.
Furthermore, it also appears when the thread_local variable is a static class member.
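
For illustration only, a static class member declared `thread_local` has the same shape as the `extern thread_local` case: the class body holds just a declaration and one source file provides the definition. The comment does not include the original code, so the names `Holder`, `counter` and `bump` below are hypothetical, and the two parts are shown in one file only so that the sketch compiles on its own.

```cpp
// --- would live in a header ---
struct Holder
{
  // Declaration only, comparable to `extern thread_local` at namespace scope.
  static thread_local int counter;

  // Implicitly inline accessor that touches the member from any including TU.
  static int bump () { return ++counter; }
};

// --- would live in exactly one source file ---
thread_local int Holder::counter = 0;
```

Whether this exact layout reproduces the link error on Cygwin has not been checked here; it only shows the kind of declaration the comment refers to.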
Comment 5 H.J. Lu 2016-02-08 13:27:37 UTC
(In reply to Václav Zeman from comment #0)

> use.o:use.cxx:(.text$_ZTWN1N3ptdE[_ZTWN1N3ptdE]+0x15): relocation
> truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init
> function for N::ptd'
> collect2: error: ld returned 1 exit status

Cygwin doesn't use R_X86_64_PC32.  Please show us the output of

# g++ -v -shared -o cygtest.dll use.o def.o

and

# ld -V
Comment 6 Václav Haisman 2016-02-08 15:09:37 UTC
Created attachment 37630 [details]
logs requested by #5 comment

Here is the linking -v output and ld -V output.
Comment 7 Václav Haisman 2016-02-08 15:16:48 UTC
Created attachment 37631 [details]
logs after complete recompilation

logs after complete recompilation
Comment 8 H.J. Lu 2016-02-08 15:47:28 UTC
Your compiler doesn't have proper LTO support.  Please turn it
off with -fno-lto.
Comment 9 Václav Haisman 2016-02-08 16:00:19 UTC
(In reply to H.J. Lu from comment #8)
> Your compiler doesn't have proper LTO support.  Please turn it
> off with -fno-lto.

How/why is it improper?
Comment 10 H.J. Lu 2016-02-08 16:07:39 UTC
(In reply to Václav Zeman from comment #9)
> (In reply to H.J. Lu from comment #8)
> > Your compiler doesn't have proper LTO support.  Please turn it
> > off with -fno-lto.
> 
> How/why is it improper?

Your LTO generates binary files targeting Linux from LTO IR.
Comment 11 Václav Haisman 2016-02-09 08:15:30 UTC
Created attachment 37638 [details]
logs of compilation with -fno-lto

(In reply to H.J. Lu from comment #8)
> Your compiler doesn't have proper LTO support.  Please turn it
> off with -fno-lto.
Comment 12 H.J. Lu 2016-02-09 12:40:36 UTC
Please provide the output of "objdump -r use.o".
Comment 13 Václav Haisman 2016-02-09 13:05:37 UTC
Created attachment 37643 [details]
objdump -r use.o log

(In reply to H.J. Lu from comment #12)
> Please provide the output of "objdump -r use.o".
Comment 14 Václav Haisman 2016-02-09 13:10:32 UTC
Created attachment 37644 [details]
objdump -Ttr def.o log

`objdump -Ttr def.o` in advance, just in case it is relevant.
Comment 15 H.J. Lu 2016-02-09 13:17:17 UTC
I didn't realize the Windows linker uses ELF relocation names.
I don't know what is wrong.
Comment 16 Václav Haisman 2016-03-18 13:04:18 UTC
And this still fails for me with GCC 5.3:

`--> ./build.sh
+ g++ -std=gnu++11 -fvisibility=hidden -c use.cxx def.cxx
+ g++ -shared -o cygtest.dll use.o def.o
use.o:use.cxx:(.text$_ZTWN1N3ptdE[_ZTWN1N3ptdE]+0x15): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for N::ptd'
collect2: error: ld returned 1 exit status
.-(~/log4cplus-git/tls-test-case)
`--> g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/5.3.0/lto-wrapper.exe
Target: x86_64-pc-cygwin
Configured with: /cygdrive/i/szsz/tmpp/gcc/gcc-5.3.0-3.x86_64/src/gcc-5.3.0/configure --srcdir=/cygdrive/i/szsz/tmpp/gcc/gcc-5.3.0-3.x86_64/src/gcc-5.3.0 --prefix=/usr --exec-prefix=/usr --localstatedir=/var --sysconfdir=/etc --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin --target=x86_64-pc-cygwin --without-libiconv-prefix --without-libintl-prefix --libexecdir=/usr/lib --enable-shared --enable-shared-libgcc --enable-static --enable-version-specific-runtime-libs --enable-bootstrap --enable-__cxa_atexit --with-dwarf2 --with-tune=generic --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite --enable-threads=posix --enable-libatomic --enable-libcilkrts --enable-libgomp --enable-libitm --enable-libquadmath --enable-libquadmath-support --enable-libssp --enable-libada --enable-libgcj-sublibs --disable-java-awt --disable-symvers --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix --without-libintl-prefix --with-system-zlib --enable-linker-build-id --with-default-libstdcxx-abi=gcc4-compatible
Thread model: posix
gcc version 5.3.0 (GCC)
Comment 17 Václav Haisman 2017-01-24 14:35:46 UTC
This is still an issue in 2017 with GCC 5.4.0.
Comment 18 Václav Haisman 2017-01-25 16:13:41 UTC
And I have just verified it is still the same with GCC 6.3.0.
Comment 19 Václav Haisman 2017-01-26 11:00:19 UTC
There appears to be some sort of interaction with the `inline` attribute of the `get_ptd()` function. If the `get_ptd()` function is just declared `extern` in `def.hxx` and defined in `def.cxx`, the link error goes away.
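
A minimal sketch of that rearrangement, assuming the rest of the test case stays the same. Making `ptd` internal to def.cxx is an extra assumption that this comment does not state, and both parts are shown in one file only so that the sketch compiles on its own.

```cpp
#include <string>

// --- would live in def.hxx ---
namespace N
{
  struct S { std::string str; };

  // Declaration only: including TUs no longer access ptd themselves,
  // so they have no reason to emit a TLS wrapper for it.
  S * get_ptd ();
} // namespace N

// --- would live in def.cxx ---
namespace N
{
  // Assumption: ptd is kept private to the defining translation unit.
  static thread_local S * ptd = nullptr;

  // Out-of-line definition, lazily allocating one S per thread.
  S * get_ptd ()
  {
    if (! ptd)
      ptd = new S;
    return ptd;
  }
} // namespace N
```

The cost of this layout is a real function call on every access instead of an inlined check.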
Comment 20 Václav Haisman 2018-06-26 16:40:01 UTC
Still an issue in 2018 with GCC 7.3.0.
Comment 21 Jon Turney 2020-01-08 17:18:00 UTC
I looked into this a bit, as gdb 9.0 now uses thread_local in a way which trips over this.

I came up with a slightly simpler reproduction:

$ cat def.h

extern thread_local int tlv;

$ cat def.cc

#include "def.h"

thread_local int tlv;

$ cat use.cc

#include "def.h"

int main()
{
  tlv = 1;
}

$ x86_64-pc-cygwin-gcc def.cc use.cc --save-temps
/tmp/ccMAKHhL.o:use.cc:(.text$_ZTW3tlv[_ZTW3tlv]+0x15): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for tlv'
collect2: error: ld returned 1 exit status

This compiles without error with x86_64-w64-mingw32-gcc.

Looking at use.s:

        .file   "use.cc"
        .text
        .section        .text$_ZTW3tlv,"x"
        .linkonce discard
        .globl  _ZTW3tlv
        .def    _ZTW3tlv;       .scl    2;      .type   32;     .endef
        .seh_proc       _ZTW3tlv
_ZTW3tlv:
.LFB1:
        pushq   %rbp
        .seh_pushreg    %rbp
        movq    %rsp, %rbp
        .seh_setframe   %rbp, 0
        subq    $32, %rsp
        .seh_stackalloc 32
        .seh_endprologue
        movq    .refptr._ZTH3tlv(%rip), %rax
        testq   %rax, %rax
        je      .L2
        call    _ZTH3tlv
.L2:
        movq    .refptr.__emutls_v.tlv(%rip), %rcx
        call    __emutls_get_address
        addq    $32, %rsp
        popq    %rbp
        ret
        .seh_endproc
        .def    __main; .scl    2;      .type   32;     .endef
        .text
        .globl  main
        .def    main;   .scl    2;      .type   32;     .endef
        .seh_proc       main
main:
.LFB0:
        pushq   %rbp
        .seh_pushreg    %rbp
        movq    %rsp, %rbp
        .seh_setframe   %rbp, 0
        subq    $32, %rsp
        .seh_stackalloc 32
        .seh_endprologue
        call    __main
        call    _ZTW3tlv
        movl    $1, (%rax)
        movl    $0, %eax
        addq    $32, %rsp
        popq    %rbp
        ret
        .seh_endproc
        .weak   _ZTH3tlv
        .ident  "GCC: (GNU) 7.4.0"
        .def    _ZTH3tlv;       .scl    2;      .type   32;     .endef
        .def    __emutls_get_address;   .scl    2;      .type   32;     .endef
        .section        .rdata$.refptr.__emutls_v.tlv, "dr"
        .globl  .refptr.__emutls_v.tlv
        .linkonce       discard
.refptr.__emutls_v.tlv:
        .quad   __emutls_v.tlv
        .section        .rdata$.refptr._ZTH3tlv, "dr"
        .globl  .refptr._ZTH3tlv
        .linkonce       discard
.refptr._ZTH3tlv:
        .quad   _ZTH3tlv

The problem seems to be in the TLS wrapper function (_ZTW3tlv):

[...]
        movq    .refptr._ZTH3tlv(%rip), %rax
        testq   %rax, %rax
        je      .L2
        call    _ZTH3tlv
[...]
        .weak   _ZTH3tlv
[...]
.refptr._ZTH3tlv:
        .quad   _ZTH3tlv

The call here is to absolute address 0 (since the weak symbol has no other definition), which is encoded relative to %rip.  This requires a relocation, and the relative offset cannot fit in a signed 32-bit field if the ImageBase is above 2GB.
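
In C++ terms, the wrapper above behaves roughly like the following sketch. It is not what GCC emits: `hypothetical_tls_init`, `hypothetical_tls_wrapper` and `tlv_storage` are made-up stand-ins for `_ZTH3tlv`, `_ZTW3tlv` and the emutls-backed object, and `__attribute__((weak))` is a GCC/Clang extension. It assumes a target where an undefined weak function links to address 0.

```cpp
// Stand-in for the TLS init function (_ZTH3tlv): declared weak and never
// defined, so its address is expected to be null after linking.
extern "C" void hypothetical_tls_init () __attribute__ ((weak));

// Stand-in for the storage that __emutls_get_address would return.
thread_local int tlv_storage = 0;

// Stand-in for the TLS wrapper function (_ZTW3tlv).
int & hypothetical_tls_wrapper ()
{
  // Corresponds to the movq/testq/je sequence: test the symbol's address.
  if (&hypothetical_tls_init != nullptr)
    // Corresponds to `call _ZTH3tlv`: a direct call that is never executed
    // when the symbol is undefined, but still needs a relocation.
    hypothetical_tls_init ();
  return tlv_storage;
}
```

The guarded call is dead code when the init function is missing, yet the linker still has to encode a displacement for it.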

As some confirmation of this analysis, the problem can be reproduced with x86_64-w64-mingw32-gcc if the ImageBase is changed from 0x400000 (the default for that toolchain) to 0x100400000 (the default for x86_64 Cygwin):

$ x86_64-w64-mingw32-gcc def.cc use.cc -Wl,--image-base,0x100400000
/tmp/cc3XRN6L.o:use.cc:(.text$_ZTW3tlv[_ZTW3tlv]+0x15): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for tlv'
collect2: error: ld returned 1 exit status
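
The range argument can be checked with a little arithmetic. The sketch below tests whether the displacement from the end of a call instruction to a target address fits in a signed 32-bit field. The two image bases are the ones quoted above; `kHypotheticalCallEndRva` is an assumed offset of the call within the image, since the real value depends on section layout.

```cpp
#include <cstdint>
#include <limits>

// True when (target - next_ip) fits the signed 32-bit displacement of a
// PC-relative call; next_ip is the address just past the call instruction.
bool fits_rel32 (std::uint64_t next_ip, std::uint64_t target)
{
  // Wrap-around subtraction, then reinterpret as a signed distance.
  const auto disp = static_cast<std::int64_t> (target - next_ip);
  return disp >= std::numeric_limits<std::int32_t>::min ()
      && disp <= std::numeric_limits<std::int32_t>::max ();
}

// Assumed position of the end of the call relative to the image base.
constexpr std::uint64_t kHypotheticalCallEndRva = 0x1019;

constexpr std::uint64_t kMingwImageBase  = 0x400000;     // x86_64-w64-mingw32 default
constexpr std::uint64_t kCygwinImageBase = 0x100400000;  // x86_64 Cygwin default
```

With the target at address 0, `fits_rel32(kMingwImageBase + kHypotheticalCallEndRva, 0)` holds, while the same call with `kCygwinImageBase` does not, because the distance back to 0 exceeds 2GB.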

Naively, I think this could be fixed by generating code which indirects the call through the pseudo-reloc, but I'm not sure that makes sense.
Comment 22 Jim Wilson 2020-03-26 22:03:11 UTC
This looks like a binutils bug to me.  A call to an undefined weak function should never be executed, so it is OK for the linker to convert that call instruction into anything convenient.  There is no need for a relocation that can reach an address of zero.  We can convert the call instruction to call itself, or the next instruction, or change it to a nop, whatever is convenient; it doesn't really matter.

A number of binutils ports already have code to handle related problems.  ARM and RISC-V for sure.  Probably others.  It looks like this support is missing from the x86_64 port.  I'd suggest refiling this as a binutils bug.  See for instance
  https://sourceware.org/bugzilla/show_bug.cgi?id=23244
for a RISC-V example of the same problem.  But we need a new bug for the x86_64 problem.  RISC-V has a register hardwired to zero, so I rewrite the call instruction to use x0 as the base address.  The arm port turns the call into a nop.
Comment 23 Václav Haisman 2020-03-26 23:02:28 UTC
I am not sure what to report. I do not understand the background of linker and relocations enough. Also, I don't have access to Windows and Cygwin any more.
Comment 24 Jim Wilson 2020-03-26 23:06:10 UTC
Joel Sherrill offered to create a binutils bug report for this.