seen with 4.4.1 and trunk, 4.4.1 cannot be used anymore to bootstrap OpenJDK: Matthias Executing on host: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../libtool --silent --tag=GCJ --mode=link /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/gcj -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/ --encoding=UTF-8 -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../ /home/doko/gcc/4.4/gcj-4.4-4.4.1/src/libjava/testsuite/libjava.lang/StackTrace2.jar -w -specs=libgcj-test.spec -no-install --main=StackTrace2 -g -L/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs -lm -o /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/StackTrace2.exe (timeout = 600) PASS: StackTrace2 compilation from source set_ld_library_path_env_vars: ld_library_path=.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc invoke: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/StackTrace2.exe Setting LD_LIBRARY_PATH to .:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc:.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc Trace length = 9 FAIL - expected StackTrace2$Inner, got: java.lang.Throwable.FAIL - expected doCrash, got: <init>:FAIL - expected 33, got: 161, in file Throwable.java FAIL - expected StackTrace2$Inner, got: java.lang.Throwable.FAIL - expected foo, got: <init>:FAIL - expected 28, got: 149, in file Throwable.java FAIL - expected StackTrace2, got: java.lang.Throwable.FAIL - expected a, got: <init>:FAIL - expected 21, got: 66, in file Exception.java FAIL - expected StackTrace2, got: java.lang.Throwable.FAIL - expected main, got: <init>:FAIL - expected 10, got: 64, in file RuntimeException.java PASS: StackTrace2 execution - source compiled test FAIL: StackTrace2 output - source compiled test Executing on host: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../libtool --silent --tag=GCJ --mode=link /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/gcj -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/ --encoding=UTF-8 -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../ /home/doko/gcc/4.4/gcj-4.4-4.4.1/src/libjava/testsuite/libjava.lang/Throw_2.jar -w -specs=libgcj-test.spec -no-install --main=Throw_2 -g -L/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs -lm -o /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/Throw_2.exe (timeout = 600) PASS: Throw_2 compilation from source set_ld_library_path_env_vars: ld_library_path=.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc invoke: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/Throw_2.exe Setting LD_LIBRARY_PATH to .:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc:.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc 1 FAIL: Throw_2 execution - source compiled test UNTESTED: Throw_2 output - source compiled test Executing on host: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../libtool --silent --tag=GCJ --mode=link /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/gcj -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/ --encoding=UTF-8 -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../ /home/doko/gcc/4.4/gcj-4.4-4.4.1/src/libjava/testsuite/libjava.lang/Throw_3.jar -w -specs=libgcj-test.spec -no-install --main=Throw_3 -g -L/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs -lm -o /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/Throw_3.exe (timeout = 600) PASS: Throw_3 compilation from source set_ld_library_path_env_vars: ld_library_path=.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc invoke: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/Throw_3.exe Setting LD_LIBRARY_PATH to .:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc:.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc bad PASS: Throw_3 execution - source compiled test FAIL: Throw_3 output - source compiled test Executing on host: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../libtool --silent --tag=GCJ --mode=link /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/gcj -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/ --encoding=UTF-8 -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../ /home/doko/gcc/4.4/gcj-4.4-4.4.1/src/libjava/testsuite/libjava.lang/stacktrace.jar -w -specs=libgcj-test.spec -no-install --main=stacktrace -O3 -findirect-dispatch -g -L/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs -lm -o /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/stacktrace.exe (timeout = 600) PASS: stacktrace -O3 -findirect-dispatch compilation from source set_ld_library_path_env_vars: ld_library_path=.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc invoke: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/stacktrace.exe Setting LD_LIBRARY_PATH to .:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc:.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc stacktrace.c stacktrace.b stacktrace.a stacktrace.main PASS: stacktrace -O3 -findirect-dispatch execution - source compiled test FAIL: stacktrace -O3 -findirect-dispatch output - source compiled test
I'm now seeing failures in StackTrace2 and Throw_3 on arm-linux-gnueabi with gcc-4.3 branch which I didn't use to see. gcc-4.4 branch doesn't fail for me on these two, but both 4.4 and 4.3 fail (as always) on Throw_2.
I had very recently updated binutils from 2.19.1 to 2.20 + branch updates + some fixes from trunk. Going back to binutils 2.19.1 while changing nothing else eliminated the StackTrace2 and Throw_3 failures from gcc-4.3. My last gcc-4.4 build and test suite run was probably still with binutils-2.19.1, which might explain why I didn't see those errors there.
gcc-4.4.3 with binutils-2.20 fails StackTrace2 and Throw_3, reverting to binutils-2.19.1 eliminated those two failures. Bootstrapping with binutils-2.19.1 but running the test suite with binutils-2.20 did not trigger the failures. Rebuilding just libjava with binutils-2.20 also did not trigger the failures.
Further experiments show that the failures are triggered by using the binutils-2.20 version of ld when bootstrapping.
Same failures when bootstrapped with binutils-2.20.51.20100202. So there's no fix on binutils trunk to be identified and backported. I'll try to identify the point where the regression was introduced instead.
A bisection through the binutils weekly and daily snapshots has identified 090505 as the last good snapshot and 090506 as the first bad one.
this seems to be the patch for the ARM unwind table linker processing? if yes, see http://sourceware.org/bugzilla/show_bug.cgi?id=10695 for some comments.
I've identified <http://sourceware.org/ml/binutils-cvs/2009-05/msg00021.html> as the source of this regression.
confirmed with binutils trunk 20100216, checking with gold from the trunk.
The binutils patch that broke libjava came from CodeSourcery, so I decided to test their latest G++ lite for arm-linux-gnueabi (2009q3-67). That compiler is based on gcc-4.4.1, and it does not have the libjava regressions with current binutils that we see with FSF gcc. Looking at the source diffs and changelogs, I've identified four distinct unwind-related changes in their gcc, two of which are ARM-specific. I'll test them on FSF 4.4 next.
A recent 4.4 with the four unwind-related fixes I identified in CodeSourcery G++ lite 2009q3-67 applied still regresses in libjava as before with recent binutils.
I can't duplicate this problem with gcc trunk and binutils 2.20-0ubuntu2.
With binutils-2.20.51.20100223, gcc-4.5-20090327 regresses libjava as described here but gcc-4.5-20100225 does not. Starting bisection on trunk (with 24+ hour build/test cycles, that's going to take a while).
The bug was fixed for 4.5 by r148072: 2009-06-02 Richard Earnshaw <rearnsha@arm.com> * arm.c (arm_get_frame_offsets): Prefer using r3 for padding a push/pop multiple to 8-byte alignment. That change applies cleanly to 4.4 and fixes the bug there as well, although I've only done a smallish non-bootstrap build and libjava-only test with it. I'll do a full bootstrap and regression test next. This change is included in CodeSourcery's 4.4-based G++ lite, which explains why their compiler works with the newer binutils. Looking at the patch submission http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00116.html I see no indication that this should have any effect on exception handling or stack unwinding, but obviously it does. I'm adding Richard E. to the cc: list.
(In reply to comment #14) > The bug was fixed for 4.5 by r148072: > > 2009-06-02 Richard Earnshaw <rearnsha@arm.com> > > * arm.c (arm_get_frame_offsets): Prefer using r3 for padding a > push/pop multiple to 8-byte alignment. > I doubt that's really fixed the bug. More likely it's caused it to go latent by changing the unwind information in some subtle way.
this change doesn't make any change to the libjava test results (configured --with-mode=thumb).
I decided to attack things from the binutils side. The problematic binutils change rolls two distinct changes into one (add cantunwind table entries, merge adjacent equivalent table entries). Disabling the "merge" part eliminated the libjava regressions in my ARM mode build: --- binutils-2.20.51/bfd/elf32-arm.c.~1~ +++ binutils-2.20.51/bfd/elf32-arm.c @@ -9301,10 +9306,12 @@ elf32_arm_fix_exidx_coverage (asection * if (elide) { +#if 0 add_unwind_table_edit (&unwind_edit_head, &unwind_edit_tail, DELETE_EXIDX_ENTRY, NULL, j / 8); deleted_exidx_bytes += 8; +#endif } last_unwind_type = unwind_type; I'll keep experimenting.
rebuilding binutils 2.20.1 with the patch from comment #17 shows one regression: Running /home/doko/binutils/binutils-2.20.1/ld/testsuite/ld-arm/arm-elf.exp ... FAIL: ld-arm/unwind-4
(In reply to comment #18) > rebuilding binutils 2.20.1 with the patch from comment #17 shows one > regression: > > Running /home/doko/binutils/binutils-2.20.1/ld/testsuite/ld-arm/arm-elf.exp ... > FAIL: ld-arm/unwind-4 I guess I should've run the binutils testsuite first. (Looking..) Ah, but that failure is Ok, as it's checking for the merging that the patch disabled.
no change in the libjava testsuite when built with these binutils
(In reply to comment #20) > no change in the libjava testsuite when built with these binutils But that's still thumb not arm like in comment #16? All my results are from plain arm (armv5tel) builds.
I did another binutils experiment. I reverted my patch to disable general merging of table entries, and instead disabled generating new and merging cantunwind entries. With that binutils libjava regressed just like with vanilla post-2.19.1 binutils. The general merging bit seems to be the problem. --- binutils-2.20.51/bfd/elf32-arm.c.~1~ +++ binutils-2.20.51/bfd/elf32-arm.c @@ -9148,6 +9148,7 @@ adjust_exidx_size(asection *exidx_sec, i bfd_set_section_size (out_sec->owner, out_sec, out_sec->size +adjust); } +#if 0 /* Insert an EXIDX_CANTUNWIND marker at the end of a section. */ static void insert_cantunwind_after(asection *text_sec, asection *exidx_sec) @@ -9162,6 +9163,7 @@ insert_cantunwind_after(asection *text_s adjust_exidx_size(exidx_sec, 8); } +#endif /* Scan .ARM.exidx tables, and create a list describing edits which should be made to those tables, such that: @@ -9248,7 +9250,9 @@ elf32_arm_fix_exidx_coverage (asection * if (sec->size == 0) continue; +#if 0 insert_cantunwind_after(last_text_sec, last_exidx_sec); +#endif last_unwind_type = 0; continue; } @@ -9282,8 +9286,10 @@ elf32_arm_fix_exidx_coverage (asection * /* An EXIDX_CANTUNWIND entry. */ if (second_word == 1) { +#if 0 if (last_unwind_type == 0) elide = 1; +#endif unwind_type = 0; } /* Inlined unwinding data. Merge if equal to previous. */ @@ -9326,8 +9332,10 @@ elf32_arm_fix_exidx_coverage (asection * } /* Add terminating CANTUNWIND entry. */ +#if 0 if (last_exidx_sec && last_unwind_type != 0) insert_cantunwind_after(last_text_sec, last_exidx_sec); +#endif return TRUE; }
While working on another debugging patch I noticed something that I think might explain the libjava regressions, especially the stack trace ones. The binutils change breaks _Unwind_GetRegionStart(). Example: We have three functions, f1() starts at address 100 and is 20 bytes, f2() starts at address 120 and is 10 bytes, and f3() starts at address 130. f1() and f2() got the same inlined unwind table data U1, while f3() got some different unwind table data U3. A pre-binutils-2.20 .ARM.exidx table for this program would be [0] = { 100, U1 }, [1] = { 120, U1 }, [2] = { 130, U3 }, ... binutils-2.20 and later merge entry [1] with entry [0] since they are adjacent and have the same unwind table data, producing a final .ARM.exidx table: [0] = { 100, U1 }, [1] = { 130, U3 }, ... So an exception in or unwind through f2() will now return { 100, U1 } rather than { 120, U1 }. The first thing that happens with the found table entry is that the function start address is stashed away (unwind-arm.c:get_eit_entry(), the assignment to ucbp->pr_cache.fnstart). Then __gnu_unwind_pr_common() uses fnstart while interpreting the unwind table data; I haven't studied that code but since libstdc++ test cases don't regress I assume the incorrect fnstart doesn't matter for inlined unwind table data. However, pr-support.c:_Unwind_GetRegionStart() also uses fnstart, so users of _Unwind_GetRegionStart() will see f1() not f2() for exceptions in or unwinds through f2(). In particular, libjava/stacktrace.cc looks like it will construct bogus stack traces (see its uses of 'start_ip').
This rings a bell and the comment in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29206#c8 is probably related. If so, this is essentially a dup of PR29206 and I'm not sure what more we can do with this. cheers Ramana
(In reply to comment #24) > This rings a bell and the comment in > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29206#c8 > is probably related. I don't think this is at all the same as PR29206. PR29206 is very old now, so I don't think the one comment about EABI is accurate now, and people have been using EABI java builds succesfully. The test case given (gcj-dbtool -n) works for me. If anything, I suspect that maybe the ARM unwinder loops at end of stack bug is at play; for java at least there's a patch in Debian to work around that (I use it in my toolchain too).
Created attachment 20147 [details] test program to take a stack trace using _Unwind_ API I'm attaching a test program which sets up a predictable call chain, and then uses _Unwind_Backtrace() and _Unwind_GetRegionStart() to get the entry points of the functions in the call chain into an array. It prints the corresponding function names and verifies that the function entry points are correct. With gcc-4.4.3 and binutils-2.19.1 on armv5tel-unknown-linux-gnueabi this works fine: > gcc -funwind-tables -fno-unit-at-a-time -O -o stacktrace-2.19.1 stacktrace.c > ./stacktrace-2.19.1 bar f2 main With the same gcc binary but binutils-20100316 the program fails: > PATH=/tmp/binutils-2.20.51-install/bin:$PATH gcc -funwind-tables -fno-unit-at-a-time -O stacktrace.c -o stacktrace-2.20.51 > ./stacktrace-2.20.51 bar f1 f1 Abort The reason for the failure is that ld merged the .ARM.exidx entries for f2 and main with the one for f1, which breaks _Unwind_GetRegionStart(). First we see that f1, f2, and main are adjacent in memory: > nm -v stacktrace-2.20.51 ... 00008588 T tracefn 000085bc T bar 0000865c T f1 0000866c T f2 0000867c T main 00008698 T __libc_csu_fini ... With binutils-2.19.1 they each get their own .ARM.exidx entry: > /tmp/binutils-2.20.51-install/bin/readelf --unwind stacktrace-2.19.1 Unwind table index '.ARM.exidx' at offset 0x778 contains 7 entries: 0x845c <_start>: 0x1 [cantunwind] 0x8508 <printf_fn>: 0x80a8b0b0 Compact model 0 0xa8 pop {r4r14} 0xb0 finish 0xb0 finish 0x8588 <tracefn>: 0x80aab0b0 Compact model 0 0xaa pop {r4, r5, r6r14} 0xb0 finish 0xb0 finish 0x85bc <bar>: 0x8003a8b0 Compact model 0 0x03 vsp = vsp + 16 0xa8 pop {r4r14} 0xb0 finish 0x865c <f1>: 0x80a8b0b0 Compact model 0 0xa8 pop {r4r14} 0xb0 finish 0xb0 finish 0x866c <f2>: 0x80a8b0b0 Compact model 0 0xa8 pop {r4r14} 0xb0 finish 0xb0 finish 0x867c <main>: 0x80a8b0b0 Compact model 0 0xa8 pop {r4r14} 0xb0 finish 0xb0 finish But with binutils-20100316 f2 and main don't have their own entries, so PCs in them are reported as belonging to f1 instead: > /tmp/binutils-2.20.51-install/bin/readelf --unwind stacktrace-2.20.51 Unwind table index '.ARM.exidx' at offset 0x778 contains 6 entries: 0x845c <_start>: 0x1 [cantunwind] 0x8508 <printf_fn>: 0x80a8b0b0 Compact model 0 0xa8 pop {r4r14} 0xb0 finish 0xb0 finish 0x8588 <tracefn>: 0x80aab0b0 Compact model 0 0xaa pop {r4, r5, r6r14} 0xb0 finish 0xb0 finish 0x85bc <bar>: 0x8003a8b0 Compact model 0 0x03 vsp = vsp + 16 0xa8 pop {r4r14} 0xb0 finish 0x865c <f1>: 0x80a8b0b0 Compact model 0 0xa8 pop {r4r14} 0xb0 finish 0xb0 finish 0x8698 <__libc_csu_fini>: 0x1 [cantunwind] This test program also works on i686-linux, sparc64-linux -m32/-m64, and powerpc64-linux -m32. I understand the benefits of shrinking .ARM.exidx tables, but it does break parts of gcc's <unwind.h> implementation (and thus libjava), so should be opt-in via an explicit option and not done by default.
The ARM ABI permits merging of unwind entries, so this should never default to opt-in across the entire tool-chain. It might be that when GCC links java programs that it should pass an option to the linker to disable the optimization, but that's a different story. It may be that the java unwinder will need to just do more work to figure out the real call sequence. I'm not sufficiently familiar with the code to know the answer here.
I've looked at the amount of .ARM.exidx entry merging being done and its consequences for the various unwinders in gcc. Currently only table entries with immediate (inlined) data are merged, and for that all of gcc except for libjava seems to be Ok. However, gcc can still leak bogus unwind data via _Unwind_GetRegionStart, so I'm proposing a patch like the following: --- gcc-4.4.3/gcc/config/arm/unwind-arm.c.~1~ +++ gcc-4.4.3/gcc/config/arm/unwind-arm.c @@ -621,7 +621,6 @@ get_eit_entry (_Unwind_Control_Block *uc UCB_PR_ADDR (ucbp) = 0; return _URC_FAILURE; } - ucbp->pr_cache.fnstart = selfrel_offset31 (&eitp->fnoffset); /* Can this frame be unwound at all? */ if (eitp->content == EXIDX_CANTUNWIND) @@ -637,6 +636,15 @@ get_eit_entry (_Unwind_Control_Block *uc /* It is immediate data. */ ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content; ucbp->pr_cache.additional = 1; + /* Adjacent EIT entries with identical immediate data may be merged, + making fnoffset/fnstart inaccurate. The ARM unwinder doesn't need + fnstart for immediate EIT data. Other PRs than ARM's often use + fnstart to derive the locations of landing pads, but such PRs cannot + use immediate data in EIT entries, so are not affected by this issue. + However, code constructing stack traces may see stack frames for + functions with immediate data EIT entries. Clear fnstart to ensure + _Unwind_GetRegionStart doesn't return wrong data in this case. */ + ucbp->pr_cache.fnstart = 0; } else { @@ -645,6 +653,7 @@ get_eit_entry (_Unwind_Control_Block *uc ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *) selfrel_offset31 (&eitp->content); ucbp->pr_cache.additional = 0; + ucbp->pr_cache.fnstart = selfrel_offset31 (&eitp->fnoffset); } /* Discover the personality routine address. */ This caused no regressions for c/c++/objc/obj-c++, but libjava got two more (ExtraClassLoader and InvokeInterface). The problem with libjava appears to be its stacktrace.cc module. It uses _Unwind_GetRegionStart to realign any interior PC to its function start PC, then it uses that to look up method and class in a hash table keyed by method start PC. With the .ARM.exidx merging, _Unwind_GetRegionStart can return the PC for a different method, possibly also in a different class, which totally breaks this. With my patch above libjava's stacktrace.cc can detect this case and switch to a linear search instead. I'll try to implement that soonish.
Wouldn't it be better to just remove _Unwind_GetRegionStart? This function is not part of the ARM EABI, and removing it would expose any (already broken) users at compile time.
(In reply to comment #29) > Wouldn't it be better to just remove _Unwind_GetRegionStart? > This function is not part of the ARM EABI, and removing it would expose any > (already broken) users at compile time. No. First it'd break most of gcc since the c, c++, and objc unwinders use it. And they generally use it to provide a base address when interpreting LSDA and computing landing pad addresses. Second, all _Unwind_GetRegionStart does is give r/o access to the fnstart value in ARM's UCB. But ARM's own unwinder uses fnstart in __gnu_unwind_pr_common, so if fnstart is broken then so it ARM's unwinder. ARM's unwinder is in the same boat as the c/c++/objc ones. It works because .ARM.exidx merging is limited to immediate table data, but the code using fnstart (by luck or design) only runs when the table contains non-immediate data, and in those cases fnstart is accurate.
(In reply to comment #30) > (In reply to comment #29) > > Wouldn't it be better to just remove _Unwind_GetRegionStart? > > This function is not part of the ARM EABI, and removing it would expose any > > (already broken) users at compile time. > > No. > > First it'd break most of gcc since the c, c++, and objc unwinders use it. And > they generally use it to provide a base address when interpreting LSDA and > computing landing pad addresses. > > Second, all _Unwind_GetRegionStart does is give r/o access to the fnstart value > in ARM's UCB. But ARM's own unwinder uses fnstart in __gnu_unwind_pr_common, > so if fnstart is broken then so it ARM's unwinder. > > ARM's unwinder is in the same boat as the c/c++/objc ones. It works because > .ARM.exidx merging is limited to immediate table data, but the code using > fnstart (by luck or design) only runs when the table contains non-immediate > data, and in those cases fnstart is accurate. > There appears to be a mistaken presumption running through this thread that there is a 1<->1 mapping between unwind blocks and source language functions. This is not the case, and any code written with such a presumption is just wrong code. Just because such behaviour may work on other machines does not make the presumption correct or the ARM unwinding code wrong. 1) Compilers may inline functions. If they do so, then unwind blocks will get merged. 2) Compilers may create extra frames (though currently rarely). If they do, then a function may have more than one frame. 3) They may do some combination of the above. 4) The ARM frame-unwinding annotations are designed to unwind C++ exceptions. If they don't work outside that specification that does not make them wrong; they simply weren't designed for the other (mis-)uses to which some people are trying to put them. Source language code is run on an abstract machine. When it's mapped onto real hardware the reality can be very different. You can't rely on the two corresponding directly beyond the semantics of the language.
(In reply to comment #31) > There appears to be a mistaken presumption running through this thread that > there is a 1<->1 mapping between unwind blocks and source language functions. > This is not the case, and any code written with such a presumption is just > wrong code. Just to clarify: I'm not defending the parts of gcc that depend on the fact that gcc's ARM EABI backend has implemented _Unwind_GetRegionStart. I'm just trying to explain that those dependencies currently do exist. > 4) The ARM frame-unwinding annotations are designed to unwind C++ exceptions. > If they don't work outside that specification that does not make them wrong; > they simply weren't designed for the other (mis-)uses to which some people are > trying to put them. The extreme logical conclusion is that gcc's ARM backend shouldn't even pretend to implement _Unwind_GetRegionStart, as Paul suggested. Doing that in an arm-linux-gnueabi cross-compiler causes compile-time breakage for most languages including C++. A less extreme conclusion is to make _Unwind_GetRegionStart always return 0. That solves the compile-time dependencies but not the functional ones (however technically invalid they are). Ignoring libjava's stacktrace module the functional dependencies are all the same: LSDA blobs attached to exception table entries are in DWARF format and contain deltas for landing pads relative to the enclosing function entry. GCC assumes that _Unwind_GetRegionStart works, so always outputs a dummy @LPStart in the LSDA (output_function_exception_table in gcc/except.c); at runtime the PRs' LSDA parsers check if @LPStart is encoded as a dummy or a proper value, if it's a dummy they set it to the return value of _Unwind_GetRegionStart. Even the libstdc++ unwinder does this and thus depends on _Unwind_GetRegionStart on ARM EABI. So to eliminate the functional dependency on _Unwind_GetRegionStart we have to instruct output_function_exception_table to output a proper @LPStart with each LSDA blob not a dummy one. I'll look into that next.
Created attachment 20333 [details] eliminate use of _Unwind_GetRegionStart on ARM, part 1 Here's a preliminary patch to eliminate the functional dependency on _Unwind_GetRegionStart on ARM. There are two key changes: output_function_exception_table outputs a proper @LPStart pointer in each LSDA blob, and every PR's parse_lsda_header sets @Start to @LPstart if @Start is NULL. Without these updates making _Unwind_GetRegionStart return NULL on ARM causes massive regressions in the c++/objc/obj-c++ test suites, and some in the c and libffi test suites. With these updates those regressions are eliminated. There are however two new regressions: 1. There are several orders of magnitude more alignment exceptions during the test suite run, probably in the read_encoded_value calls that now trigger. 2. For initial testing I ran this on i686 with its _Unwind_GetRegionStart made to return NULL too, and while it worked the g++.old-deja/g++.law/weak.C test case failed with a linkage error. It looks like gcc emitted exception table data for a discarded piece of code. On my ARM box the test suite considered that test case UNSUPPORTED so never ran it. The first regression can probably be fixed by avoiding to define unneeded @LPStart pointers and making read_encoded_value mis-alignment safe. The second regression might be fixable by making the @LPStart symbol references weak. More fixes will be needed for libjava's stacktrace mess though.
Created attachment 20371 [details] eliminate use of _Unwind_GetRegionStart on ARM, take 2 The previous patch to eliminate _Unwind_GetRegionStart on ARM suffered from major misalignment exceptions. The function pointers in the @LPStart fields weren't resolved by the linker but apparently required runtime fixup by glibc. Since they are always misaligned (the LSDA blob starts aligned with a format byte followed by the @LPStart pointer), each @LPStart value resulted in two alignment exceptions. Ouch. The new version of the patch makes the @LPStart pointers PC-relative which eliminates the runtime fixups and thus the alignment exceptions.
I've been trying this on gcc trunk, and it doesn't have the problem. It seems that merging is not done. I get ... 0x8684 <f2>: @0x8808 Compact model 1 0xb1 0x08 pop {r3} 0x84 0x00 pop {r14} 0xb0 finish 0xb0 finish ... 0x86b0 <f1>: @0x8820 Compact model 1 0xb1 0x08 pop {r3} 0x84 0x00 pop {r14} 0xb0 finish 0xb0 finish Is it maybe the case that "Compact model 1" unwinder data isn't yet being merged?
(In reply to comment #35) > I've been trying this on gcc trunk, and it doesn't have the problem. > It seems that merging is not done. > > I get > > ... > 0x8684 <f2>: @0x8808 > Compact model 1 > 0xb1 0x08 pop {r3} > 0x84 0x00 pop {r14} > 0xb0 finish > 0xb0 finish > ... > 0x86b0 <f1>: @0x8820 > Compact model 1 > 0xb1 0x08 pop {r3} > 0x84 0x00 pop {r14} > 0xb0 finish > 0xb0 finish > > Is it maybe the case that "Compact model 1" unwinder data isn't yet being > merged? They have to be adjacent to be merged. Looks like there's something else between f2 and f1 in your object code. (You are using binutils >= 2.20 right?)
Subject: Re: [4.4/4.5/4.6 regression] regressions in libjava testsuite on arm-linux On 04/13/2010 05:52 PM, mikpe at it dot uu dot se wrote: >> Is it maybe the case that "Compact model 1" unwinder data isn't yet being >> merged? > > They have to be adjacent to be merged. Looks like there's something else > between f2 and f1 in your object code. (You are using binutils >= 2.20 right?) Yes. OK, that explains it: gcc trunk outputs the functions in a completely different order. Andrew.
I think I can fairly easily add an option to the linker to suppress table merging when linking Java libraries, and that will completely solve the problem, at least from my point of view. To that end, it would not be at all helpful to make _Unwind_GetRegionStart on ARM return NULL.
Subject: Bug 40860 Author: aph Date: Wed Apr 21 16:34:01 2010 New Revision: 158611 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158611 Log: 2010-04-19 Andrew Haley <aph@redhat.com> PR libgcj/40860 * configure.ac: Handle --no-merge-exidx-entries. Modified: trunk/libjava/ChangeLog trunk/libjava/configure trunk/libjava/configure.ac
Subject: Bug 40860 Author: aph Date: Wed Apr 21 17:04:42 2010 New Revision: 158614 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158614 Log: 2010-04-19 Andrew Haley <aph@redhat.com> PR libgcj/40860 * configure.ac: Handle --no-merge-exidx-entries. Modified: branches/gcc-4_4-branch/libjava/ChangeLog branches/gcc-4_4-branch/libjava/configure branches/gcc-4_4-branch/libjava/configure.ac
Subject: Bug 40860 Author: aph Date: Thu Apr 22 16:06:39 2010 New Revision: 158648 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158648 Log: 2010-04-19 Andrew Haley <aph@redhat.com> PR libgcj/40860 * configure.ac: Handle --no-merge-exidx-entries. Modified: branches/gcc-4_5-branch/libjava/ChangeLog branches/gcc-4_5-branch/libjava/configure branches/gcc-4_5-branch/libjava/configure.ac
Fixed.