Bug 48384

Summary:	lto, linker-plugin and optimization clutter the stack trace
Product:	gcc	Reporter:	vincenzo Innocente <vincenzo.innocente>
Component:	lto	Assignee:	Not yet assigned to anyone <unassigned>
Status:	RESOLVED FIXED
Severity:	major	CC:	ccoutant, ian, iant, lat
Priority:	P3
Version:	4.6.0
Target Milestone:	---
Host:		Target:
Build:		Known to work:
Known to fail:		Last reconfirmed:
Attachments:	a header file, three compilation units, and a script to source that will produce several shared libraries and executables

Description vincenzo Innocente 2011-03-31 09:58:39 UTC

Created attachment 23835
a header file, three compilation units, and a script to source that will produce several shared libraries and executables

I'm testing LTO and the linker plugin with shared libraries.
Results using hidden visibility are very encouraging. Unfortunately, combining even mild optimization (-O2) with -flto -fuse-linker-plugin seems to clutter the stack trace with bogus frames. This is easy to show in gdb. It also causes instrumentation tools that rely on stack traces to either crash or produce wrong results.

I'm using 
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.6.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --enable-gold=yes --enable-lto --with-fpmath=avx
Thread model: posix
gcc version 4.6.1 20110325 (prerelease) (GCC) 

GNU gold (GNU Binutils 2.21) 1.10

Linux vinavx0.cern.ch 2.6.32-71.14.1.el6.x86_64 #1 SMP Thu Jan 13 12:03:40 CET 2011 x86_64 x86_64 x86_64 GNU/Linux

glibc.x86_64                           2.12-1.7.el6_0.4  
GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6_0.1)


The attachment contains the four files of my simple test (a long loop and a segfault)
and a script that builds various versions.
Just compare
g++ -g -DHIDDEN go.cc foo.cc -flto -fuse-linker-plugin -fPIC -shared -o libfoo_hltog.so
g++ -g -DHIDDEN main.cc -flto -fuse-linker-plugin -L./ -lfoo_hltog -o t_hltog
with
g++ -O2 -g -DHIDDEN go.cc foo.cc -flto -fuse-linker-plugin -fPIC -shared -o libfoo_hltog2.so
g++ -O2 -g -DHIDDEN main.cc -flto -fuse-linker-plugin -L./ -lfoo_hltog2 -o t_hltog2

The first looks OK
(the segmentation fault from dereferencing a null pointer is intentional).
In gdb, the latter produces:
(gdb) run
Starting program: /afs/cern.ch/user/i/innocent/public/ctest/lto/t_hltog2 
Program received signal SIGSEGV, Segmentation fault.
go (j=Cannot access memory at address 0x0
) at go.cc:5
5	  j+= foo(h);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.7.el6_0.4.x86_64
(gdb) where
#0  go (j=Cannot access memory at address 0x0
) at go.cc:5
#1  0x00000000000003e8 in ?? ()
#2  0x42c800004232ee1f in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) run 2
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /afs/cern.ch/user/i/innocent/public/ctest/lto/t_hltog2 2
^C
Program received signal SIGINT, Interrupt.
0x0000003ff6207ebd in __ieee754_asin () from /lib64/libm.so.6
(gdb) where
#0  0x0000003ff6207ebd in __ieee754_asin () from /lib64/libm.so.6
#1  0x0000003ff6224842 in asin () from /lib64/libm.so.6
#2  0x00007ffff7ffb6aa in bar (j=20000001, h=0x0) at foo.cc:14
#3  go (j=20000001, h=0x0) at go.cc:4
#4  0x0000000000989680 in ?? ()
#5  0x3f8000003f800001 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) 


In more complex applications with multiple shared libraries, things get much more confused, with "??" frames
all over the stack trace.

Comment 1 Richard Biener 2011-03-31 10:28:07 UTC

Hm.  I get

> LD_LIBRARY_PATH=. gdb ./t_hltog
GNU gdb (GDB) SUSE (7.2.50.20110206-67.1)
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7ff87eb in foo (h=0x0) at foo.cc:6
6         return *h +1;
(gdb) bt
#0  0x00007ffff7ff87eb in foo (h=0x0) at foo.cc:6
#1  0x00007ffff7ff8709 in go (j=2001, h=0x0) at go.cc:5
#2  0x000000000040063d in main (argc=1) at main.cc:7

> LD_LIBRARY_PATH=. gdb ./t_hltog2
GNU gdb (GDB) SUSE (7.2.50.20110206-67.1)
...
Program received signal SIGSEGV, Segmentation fault.
go (j=Cannot access memory at address 0x0
) at go.cc:5
5         j+= foo(h);
(gdb) bt
#0  go (j=Cannot access memory at address 0x0
) at go.cc:5
#1  0x000000000040062d in main (argc=<optimized out>) at main.cc:7

> LD_LIBRARY_PATH=. gdb ./t_hlto
GNU gdb (GDB) SUSE (7.2.50.20110206-67.1)
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7ff874b in go(int, int const*) () from ./libfoo_hlto.so
(gdb) bt
#0  0x00007ffff7ff874b in go(int, int const*) () from ./libfoo_hlto.so
#1  0x000000000040062d in main ()


which looks reasonable.

As I can't reproduce the issue at hand, I didn't try to investigate
which parts of the debug info may be off.  The assembly of the shared library
shows that foo is inlined into go, which apparently didn't happen for
you?  I don't have any foo symbol in the shared library.  So maybe
gdb is confused by bogus debug information about inlines.

Comment 2 vincenzo Innocente 2011-03-31 10:39:36 UTC

Thanks,
at least on some systems it seems to behave correctly.
I had to rebuild binutils to enable the plugin, and something may have gone wrong.
Are you using the standard bfd "ld" or gold?

vincenzo


Comment 3 Richard Biener 2011-03-31 10:43:49 UTC

(In reply to comment #2)
> Thanks,
> at least on some system it seems to behave correclty.
> I had to rebuild binutil to enable the plugin and something may have gone wrong
> are you using standard bfd  "ld" or  gold?

I'm using standard bfd ld, version 2.21.

Richard.

Comment 4 vincenzo Innocente 2011-03-31 11:05:47 UTC

It's a gold effect!
With
ld -v
GNU ld (GNU Binutils) 2.21
things are fine:
(gdb) run
Starting program: /afs/cern.ch/user/i/innocent/public/ctest/lto/t_hlto 
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dfc78d in go(int, int const*) () from ./libfoo_hlto.so
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.7.el6_0.4.x86_64
(gdb) where
#0  0x00007ffff7dfc78d in go(int, int const*) () from ./libfoo_hlto.so
#1  0x000000000040050d in main ()

I will stick to bfd ld for the time being.
I'm not sure on which side (gold or the gcc LTO plugin) this should be
debugged.
gold is written in C++, which may mean that a bootstrap is needed in the installation.

Thanks for the fast feedback,

vincenzo


Comment 5 Paolo Carlini 2011-05-11 21:14:50 UTC

Ian, could you have a look at this issue (in particular Comment #4)? Should Vincenzo file a GOLD (binutils) PR? Thanks in advance.

Comment 6 Lassi Tuura 2011-05-14 15:41:35 UTC

I think this is an issue I previously looked at for Vincenzo. I concluded that the LTO-generated binary has unwind info referring to the wrong ELF base section, .eh_frame rather than .text, completely messing up any attempt to unwind. I don't have immediate access to the exact tool chain used for this bug report, so I can't replicate the necessary info right now, but I've attached a previous analysis I sent to Vincenzo.

If this isn't sufficient to put investigation on the right track, I'll get together with Vincenzo next week and we will generate the necessary unwind info dumps for the specific test case he added here before.

===

At any rate, the unwind info is completely messed up. For example, "readelf -Wwf libfoo_hlto.so" claims there is a function at pc=000007a0..0000086f:

00000020 fffffea8 0000001c FDE cie=00000008 pc=000007a0..0000086f
 DW_CFA_advance_loc: 2 to 000007a2
 DW_CFA_def_cfa_offset: 16
 DW_CFA_offset: r12 (r12) at cfa-16
 DW_CFA_advance_loc: 4 to 000007a6
 DW_CFA_def_cfa_offset: 24
 [...]

But the only function of any significance in that library is not at that address, as shown by "objdump -d libfoo_hlto.so". The FDE program otherwise looks correct; it is just at the wrong address.

0000000000000640 <_Z2goiPKi>:
640:   41 54                   push   %r12
642:   49 89 f4                mov    %rsi,%r12
645:   55                      push   %rbp
646:   8d 6c 3f 01             lea    0x1(%rdi,%rdi,1),%ebp
64a:   53                      push   %rbx
[...]
6ff:   5b                      pop    %rbx
700:   5d                      pop    %rbp
701:   41 5c                   pop    %r12
703:   c3                      retq   
704:   e8 57 fe ff ff          callq  560 <sqrt@plt>
709:   c5 f9 28 c8             vmovapd %xmm0,%xmm1
70d:   eb b9                   jmp    6c8 <_Z2goiPKi+0x88>
70f:   90                      nop

In fact there's nothing at all in the FDE range 000007a0..0000086f: that address lies in the unwind info table itself (.eh_frame), not in the code. So it looks like whatever is generating this unwind info is generating wrong address references.

$ readelf -WS libfoo_hlto.so
There are 29 section headers, starting at offset 0x1100:

Section Headers:
 [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
 [...]
 [14] .eh_frame         PROGBITS        0000000000000778 000778 000058 00   A  0   0  8
 [15] .eh_frame_hdr     PROGBITS        00000000000007d0 0007d0 000014 00   A  0   0  4
 [...]

Looks like your tool chain is broken. I don't know why.

Comment 7 Ian Lance Taylor 2011-06-30 00:53:59 UTC

I just committed a fix for this problem in the gold development sources.