This is the mail archive of the
gcc-prs@gcc.gnu.org
mailing list for the GCC project.
Re: target/8004: All C++ binaries crash in __register_frame_info_bases on Sparc Solaris 2.7
- From: "Aaron Williams" <aaron_williams at net dot com>
- To: nobody at gcc dot gnu dot org
- Cc: gcc-prs at gcc dot gnu dot org,
- Date: 2 Oct 2002 04:16:01 -0000
- Subject: Re: target/8004: All C++ binaries crash in __register_frame_info_bases on Sparc Solaris 2.7
- Reply-to: "Aaron Williams" <aaron_williams at net dot com>
The following reply was made to PR target/8004; it has been noted by GNATS.
From: "Aaron Williams" <aaron_williams@net.com>
To: davem@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org,
nobody@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Cc:
Subject: Re: target/8004: All C++ binaries crash in __register_frame_info_bases
on Sparc Solaris 2.7
Date: Tue, 01 Oct 2002 21:13:03 -0700
I should have all the required patches installed. I have Sun's patch
cluster as of 9/11 installed. I believe this may be due to a bug in
ld.so. I am attaching a copy of an email I received from someone else
who appears to have the same problem. My current workaround is to use
Sun's /usr/ccs/bin/ld instead of the one from binutils 2.13.
I am having other stability problems with gcc 3.2 on Solaris and will
likely go back to 2.95.3. Konqueror in KDE 3.0.3 and qt-3.0.5 compiled
with gcc 3.2 is unstable, for example. I was hoping 3.2 would fix a
problem where I see static destructors being called in a shared library
when the shared library is no longer present (causing a crash in the
exit handler). This too, unfortunately, sounds like it might be a Solaris
bug.
-Aaron
Email follows:
Dear Aaron Williams,
>> After searching the web regarding a problem I am having with GCC 3.2 on
>> Solaris I came across your bug report at:
>>
>> http://www.geocrawler.com/lists/3/GNU/361/0/9566991/
>>
>> I am experiencing exactly the same problem but with Solaris 2.7. I was
>> wondering if you were successful in resolving this problem and if so how you
>> did it?
>
>
One of my colleagues, Christian Ehrhardt <ehrhardt@mathematik.uni-ulm.de>,
analyzed this problem further, and he believes that it is a bug in
ld.so.1. Here is his report:
>> The dynamic runtime linker fails to relocate valid shared libraries
>> generated by recent versions of GNU-ld. /usr/local/bin/ld is from
>> the GNU binutils-2.13 package:
>>
>> turing$ /usr/local/bin/ld -v
>> GNU ld version 2.13
>>
>> How to reproduce:
>>
>> Script started on Fri Sep 20 19:46:43 2002
>> turing$ cat t2.c
>> struct object {
>> int i;
>> int j;
>> int k;
>> int l;
>> };
>>
>>
>>
>> int func ()
>> {
>> static struct object x;
>> struct object * p;
>> p = &x;
>> p->i = 3;
>> return 0;
>> }
>>
>> turing$ cat t3.c
>> extern int func();
>>
>> int main ()
>> {
>> func();
>> return 0;
>> }
>> turing$ cat Makefile.sun
>> .PHONY: clean
>> all: a.out
>> t2.o: t2.c
>> CC -c -KPIC t2.c
>> libt2.so: t2.o
>> /usr/local/bin/ld -G t2.o -olibt2.so
>> t3.o: t3.c
>> CC -c t3.c
>> a.out: libt2.so t3.o
>> CC -lt2 t3.o -L. -R.
>> clean:
>> rm -f *.so *.o a.out
>>
>> turing$ cat Makefile
>> .PHONY: clean
>> all: a.out
>> t2.o: t2.c
>> gcc -c -fPIC t2.c
>> libt2.so: t2.o
>> /usr/local/bin/ld -nostdlib -shared -olibt2.so t2.o
>> a.out: libt2.so t3.c
>> gcc -nostdlib t3.c libt2.so -L. -R.
>> clean:
>> rm -f *.so *.o a.out core
>>
>> turing$ make -f Makefile.sun clean
>> rm -f *.so *.o a.out
>> turing$ make -f Makefile.sun
>> CC -c -KPIC t2.c
>> /usr/local/bin/ld -G t2.o -olibt2.so
>> CC -c t3.c
>> CC -lt2 t3.o -L. -R.
>> turing$ a.out
>> Segmentation Fault (core dumped)
>> turing$ exit
>>
>> script done on Fri Sep 20 19:47:32 2002
>>
>> Note that I compiled everything with /opt/SUNCspro/bin/CC to
>> rule out bugs in gcc. This problem can be reproduced using
>> the second Makefile and gcc with an even smaller executable.
>>
>>
>> Analyzing the core shows the following:
>> turing$ pmap core | grep libt2.so
>> FF370000 8K read/exec libt2.so
>> FF380000 8K read/write/exec libt2.so
>>
>> Script started on Fri Sep 20 19:53:10 2002
>> turing$ gdb a.out core
>> GNU gdb 5.0
>> [ ... ]
>> #0 0xff370318 in __1cEfunc6F_i_ ()
>> from /home/thales/ehrhardt/ld.so.1-bug/./libt2.so
>> (gdb) disass
>> Dump of assembler code for function __1cEfunc6F_i_:
>> 0xff3702e0 <__1cEfunc6F_i_>: save %sp, -112, %sp
>> 0xff3702e4 <__1cEfunc6F_i_+4>: call 0xff3702ec <__1cEfunc6F_i_+12>
>> 0xff3702e8 <__1cEfunc6F_i_+8>: sethi %hi(0), %o1
>> 0xff3702ec <__1cEfunc6F_i_+12>: mov %o1, %o1 ! 0x0
>> 0xff3702f0 <__1cEfunc6F_i_+16>: add %o7, %o1, %o1
>> 0xff3702f4 <__1cEfunc6F_i_+20>: st %o1, [ %fp + -12 ]
>> 0xff3702f8 <__1cEfunc6F_i_+24>: sethi %hi(0x10000), %o0
>> 0xff3702fc <__1cEfunc6F_i_+28>: or %o0, 0xc4, %o0 ! 0x100c4
>> 0xff370300 <__1cEfunc6F_i_+32>: add %o1, %o0, %l7
>> 0xff370304 <__1cEfunc6F_i_+36>: sethi %hi(0), %g1
>> 0xff370308 <__1cEfunc6F_i_+40>: or %g1, 4, %g1 ! 0x4
>> 0xff37030c <__1cEfunc6F_i_+44>: ld [ %l7 + %g1 ], %o0
>> 0xff370310 <__1cEfunc6F_i_+48>: st %o0, [ %fp + -8 ]
>> 0xff370314 <__1cEfunc6F_i_+52>: mov 3, %o1
>> 0xff370318 <__1cEfunc6F_i_+56>: st %o1, [ %o0 ]
>> 0xff37031c <__1cEfunc6F_i_+60>: clr [ %fp + -4 ]
>> 0xff370320 <__1cEfunc6F_i_+64>: mov %g0, %i0
>> 0xff370324 <__1cEfunc6F_i_+68>: ret
>> 0xff370328 <__1cEfunc6F_i_+72>: restore
>> 0xff37032c <__1cEfunc6F_i_+76>: mov %g0, %i0
>> 0xff370330 <__1cEfunc6F_i_+80>: ret
>> 0xff370334 <__1cEfunc6F_i_+84>: restore
>> ---Type <return> to continue, or q <return> to quit---
>> End of assembler dump.
>> (gdb) bt
>> #0 0xff370318 in __1cEfunc6F_i_ ()
>> from /home/thales/ehrhardt/ld.so.1-bug/./libt2.so
>> #1 0x10884 in main ()
>> (gdb) info reg o0
>> o0 0xff370000 -13172736
>> (gdb) info reg o1
>> o1 0x3 3
>> (gdb) info reg l7
>> l7 0xff3803a8 -13106264
>> (gdb) info reg g1
>> g1 0x4 4
>> (gdb) turing$ exit
>>
>> script done on Fri Sep 20 19:54:46 2002
>>
>> Looking back at function func from t2.c shows:
>> int func ()
>> {
>> static struct object x;
>> struct object * p;
>> p = &x;
>> p->i = 3; <====== crash is here.
>> return 0;
>> }
>>
>> The value of the pointer p is obviously in register o0, i.e. it is
>> 0xff370000. This is precisely the BASE address where the shared library
>> libt2.so has been mapped to. Register l7 contains the base address of
>> the .got section (the global offset table of this library). The
>> questionable address is loaded from offset 4 into the global offset table.
>>
>> Looking at the contents of the global offset table in the shared
>> library shows the following:
>>
>> turing$ elfdump -G libt2.so
>>
>> Global Offset Table: 2 entries
>> ndx addr value reloc addend symbol
>> [00000] 000103a8 00010338 R_SPARC_NONE 00000000
>> [00001] 000103ac 000103b0 R_SPARC_RELATIVE 00000000
>> turing$
>>
>> Note that we have indeed
>> %l7(0xff3803a8) = Offset of .got(0x000103a8) + library base address(0xFF370000)
>>
>> The Solaris Linker and Libraries Guide (freshly downloaded from
>> docs.sun.com) has this explanation about R_SPARC_RELATIVE:
>>
>> |Some relocation types have semantics beyond simple calculation:
>> |[ ... ]
>> |R_SPARC_RELATIVE
>> | Created by the link-editor for dynamic objects. Its offset member
>> | gives the location within a shared object that contains a value
>> | representing a relative address. The runtime linker computes the
>> | corresponding virtual address by adding the virtual address at which
>> | the shared object is loaded to the relative address. Relocation
>> | entries for this type must specify 0 for the symbol table index.
>>
>> This means that the value at offset 0x4 in the global offset
>> Table should be
>> library base address + Value in .got
>> 0xFF370000 + 0x000103B0 = 0xFF3803B0
>> after relocation. However, looking at the value of register o0, we
>> see that the .got section evidently contains the value 0xFF370000
>> instead.
>>
>> Checking the source code of the /usr/lib/ld.so.1 from Solaris 7 (the
>> latest that we currently have access to) I found the following
>> concerning R_SPARC_RELATIVE relocations.
>>
>> os_net/src_ws/usr/src/cmd/sgs/rtld/sparc/sparc_elf.c function elf_reloc:
>> | if ((rtype == R_SPARC_RELATIVE) &&
>> | !(FLAGS(lmp) & FLG_RT_FIXED) && !dbg_mask) {
>> | if (relacount) {
>> | relbgn = elf_reloc_relacount(relbgn, relacount,
>> | relsiz, basebgn);
>> |
>> | relacount = 0;
>> | } else
>> | relbgn = elf_reloc_relative(relbgn, relend,
>> | relsiz, basebgn, etext, emap);
>> | if (relbgn >= relend)
>> | break;
>> | rtype = ELF_R_TYPE(((Rel *)relbgn)->r_info);
>> | }
>>
>> i.e. there are two functions that may be called to perform an
>> R_SPARC_RELATIVE relocation, elf_reloc_relacount or elf_reloc_relative.
>>
>> However, these functions do fundamentally different things to resolve
>> these relocations:
>>
>> elf_reloc_relative (in file common_sparc.c) does the following:
>>
>> | /*
>> | * Perform the actual relocation.
>> | */
>> | *((ulong_t *) roffset) +=
>> | basebgn + (long)(((Rel *)relbgn)->r_addend);
>>
>> whereas elf_reloc_relacount (in file common_sparc.c) does this:
>>
>> | /*
>> | * Perform the actual relocation.
>> | */
>> | *((ulong_t *) roffset) =
>> | basebgn + (long)(((Rel *)relbgn)->r_addend);
>>
>> Note the assignment (``='') instead of the addition ``+=''.
>> I highly suspect that changing this will fix the problem.
>
>
Regards, Andreas Borchert.
-- Andreas Borchert, Universitaet Ulm, SAI, Helmholtzstr. 18, 89069 Ulm,
Germany E-Mail: borchert@mathematik.uni-ulm.de WWW:
http://www.mathematik.uni-ulm.de/sai/borchert/ PGP:
http://www.mathematik.uni-ulm.de/sai/borchert/pgp.html
davem@gcc.gnu.org wrote:
>Synopsis: All C++ binaries crash in __register_frame_info_bases on Sparc Solaris 2.7
>
>State-Changed-From-To: open->feedback
>State-Changed-By: davem
>State-Changed-When: Tue Oct 1 20:59:26 2002
>State-Changed-Why:
> Do you have all the fixes installed which are mentioned
> in:
>
> http://gcc.gnu.org/install/specific.html#sparc-sun-solaris2.7
>
> These are necessary to get gcc working on 2.7
>
>http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8004
>
>