Bug 40860 - [4.4/4.5/4.6 regression] regressions in libjava testsuite on arm-linux
Summary: [4.4/4.5/4.6 regression] regressions in libjava testsuite on arm-linux
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcj (show other bugs)
Version: 4.4.0
: P4 normal
Target Milestone: 4.4.4
Assignee: Andrew Haley
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-26 10:51 UTC by Debian GCC Maintainers
Modified: 2010-04-22 16:08 UTC (History)
8 users (show)

See Also:
Host:
Target: arm-linux-gnueabi
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
test program to take a stack trace using _Unwind_ API (796 bytes, text/plain)
2010-03-20 22:36 UTC, Mikael Pettersson
Details
eliminate use of _Unwind_GetRegionStart on ARM, part 1 (1.13 KB, patch)
2010-04-08 12:14 UTC, Mikael Pettersson
Details | Diff
eliminate use of _Unwind_GetRegionStart on ARM, take 2 (1.23 KB, patch)
2010-04-12 19:03 UTC, Mikael Pettersson
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Debian GCC Maintainers 2009-07-26 10:51:31 UTC
seen with 4.4.1 and trunk, 4.4.1 cannot be used anymore to bootstrap OpenJDK:

  Matthias

Executing on host: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../libtool --silent --tag=GCJ --mode=link /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/gcj -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/ --encoding=UTF-8 -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../ /home/doko/gcc/4.4/gcj-4.4-4.4.1/src/libjava/testsuite/libjava.lang/StackTrace2.jar  -w  -specs=libgcj-test.spec -no-install --main=StackTrace2 -g  -L/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs -lm   -o /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/StackTrace2.exe    (timeout = 600)
PASS: StackTrace2 compilation from source
set_ld_library_path_env_vars: ld_library_path=.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc
invoke: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/StackTrace2.exe  
Setting LD_LIBRARY_PATH to .:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc:.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc
Trace length = 9
FAIL - expected StackTrace2$Inner, got: java.lang.Throwable.FAIL - expected doCrash, got: <init>:FAIL - expected 33, got: 161, in file Throwable.java
FAIL - expected StackTrace2$Inner, got: java.lang.Throwable.FAIL - expected foo, got: <init>:FAIL - expected 28, got: 149, in file Throwable.java
FAIL - expected StackTrace2, got: java.lang.Throwable.FAIL - expected a, got: <init>:FAIL - expected 21, got: 66, in file Exception.java
FAIL - expected StackTrace2, got: java.lang.Throwable.FAIL - expected main, got: <init>:FAIL - expected 10, got: 64, in file RuntimeException.java
PASS: StackTrace2 execution - source compiled test
FAIL: StackTrace2 output - source compiled test

Executing on host: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../libtool --silent --tag=GCJ --mode=link /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/gcj -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/ --encoding=UTF-8 -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../ /home/doko/gcc/4.4/gcj-4.4-4.4.1/src/libjava/testsuite/libjava.lang/Throw_2.jar  -w  -specs=libgcj-test.spec -no-install --main=Throw_2 -g  -L/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs -lm   -o /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/Throw_2.exe    (timeout = 600)
PASS: Throw_2 compilation from source
set_ld_library_path_env_vars: ld_library_path=.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc
invoke: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/Throw_2.exe  
Setting LD_LIBRARY_PATH to .:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc:.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc
1
FAIL: Throw_2 execution - source compiled test
UNTESTED: Throw_2 output - source compiled test

Executing on host: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../libtool --silent --tag=GCJ --mode=link /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/gcj -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/ --encoding=UTF-8 -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../ /home/doko/gcc/4.4/gcj-4.4-4.4.1/src/libjava/testsuite/libjava.lang/Throw_3.jar  -w  -specs=libgcj-test.spec -no-install --main=Throw_3 -g  -L/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs -lm   -o /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/Throw_3.exe    (timeout = 600)
PASS: Throw_3 compilation from source
set_ld_library_path_env_vars: ld_library_path=.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc
invoke: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/Throw_3.exe  
Setting LD_LIBRARY_PATH to .:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc:.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc
bad
PASS: Throw_3 execution - source compiled test
FAIL: Throw_3 output - source compiled test
Executing on host: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../libtool --silent --tag=GCJ --mode=link /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/gcj -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc/ --encoding=UTF-8 -B/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/../ /home/doko/gcc/4.4/gcj-4.4-4.4.1/src/libjava/testsuite/libjava.lang/stacktrace.jar  -w  -specs=libgcj-test.spec -no-install --main=stacktrace -O3 -findirect-dispatch -g  -L/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs -lm   -o /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/stacktrace.exe    (timeout = 600)
PASS: stacktrace -O3 -findirect-dispatch compilation from source
set_ld_library_path_env_vars: ld_library_path=.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc
invoke: /home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/libjava/testsuite/stacktrace.exe  
Setting LD_LIBRARY_PATH to .:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc:.:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/arm-linux-gnueabi/./libjava/.libs:/home/doko/gcc/4.4/gcj-4.4-4.4.1/build/gcc
stacktrace.c
stacktrace.b
stacktrace.a
stacktrace.main
PASS: stacktrace -O3 -findirect-dispatch execution - source compiled test
FAIL: stacktrace -O3 -findirect-dispatch output - source compiled test
Comment 1 Mikael Pettersson 2010-01-24 16:04:02 UTC
I'm now seeing failures in StackTrace2 and Throw_3 on arm-linux-gnueabi with gcc-4.3 branch which I didn't use to see. gcc-4.4 branch doesn't fail for me on these two, but both 4.4 and 4.3 fail (as always) on Throw_2.
Comment 2 Mikael Pettersson 2010-01-25 09:33:09 UTC
I had very recently updated binutils from 2.19.1 to 2.20 + branch updates + some fixes from trunk. Going back to binutils 2.19.1 while changing nothing else eliminated the StackTrace2 and Throw_3 failures from gcc-4.3.

My last gcc-4.4 build and test suite run was probably still with binutils-2.19.1, which might explain why I didn't see those errors there.
Comment 3 Mikael Pettersson 2010-02-03 14:51:47 UTC
gcc-4.4.3 with binutils-2.20 fails StackTrace2 and Throw_3, reverting to binutils-2.19.1 eliminated those two failures.

Bootstrapping with binutils-2.19.1 but running the test suite with binutils-2.20 did not trigger the failures. Rebuilding just libjava with binutils-2.20 also did not trigger the failures.
Comment 4 Mikael Pettersson 2010-02-05 15:40:36 UTC
Further experiments show that the failures are triggered by using the binutils-2.20 version of ld when bootstrapping.
Comment 5 Mikael Pettersson 2010-02-06 15:36:08 UTC
Same failures when bootstrapped with binutils-2.20.51.20100202. So there's no fix on binutils trunk to be identified and backported. I'll try to identify the point where the regression was introduced instead.
Comment 6 Mikael Pettersson 2010-02-13 20:49:21 UTC
A bisection through the binutils weekly and daily snapshots has identified 090505 as the last good snapshot and 090506 as the first bad one.
Comment 7 Matthias Klose 2010-02-15 15:32:01 UTC
this seems to be the patch for the ARM unwind table linker processing? if yes, see http://sourceware.org/bugzilla/show_bug.cgi?id=10695 for some comments.
Comment 8 Mikael Pettersson 2010-02-15 22:26:58 UTC
I've identified <http://sourceware.org/ml/binutils-cvs/2009-05/msg00021.html> as the source of this regression.
Comment 9 Matthias Klose 2010-02-16 16:34:23 UTC
confirmed with binutils trunk 20100216, checking with gold from the trunk.
Comment 10 Mikael Pettersson 2010-02-19 23:32:24 UTC
The binutils patch that broke libjava came from CodeSourcery, so I decided to test their latest G++ lite for arm-linux-gnueabi (2009q3-67). That compiler is based on gcc-4.4.1, and it does not have the libjava regressions with current binutils that we see with FSF gcc.

Looking at the source diffs and changelogs, I've identified four distinct unwind-related changes in their gcc, two of which are ARM-specific. I'll test them on FSF 4.4 next.
Comment 11 Mikael Pettersson 2010-02-22 21:49:24 UTC
A recent 4.4 with the four unwind-related fixes I identified in CodeSourcery G++ lite 2009q3-67 applied still regresses in libjava as before with recent binutils.
Comment 12 Andrew Haley 2010-02-28 10:07:19 UTC
I can't duplicate this problem with gcc trunk and binutils 2.20-0ubuntu2.
Comment 13 Mikael Pettersson 2010-03-04 10:16:59 UTC
With binutils-2.20.51.20100223, gcc-4.5-20090327 regresses libjava as described here but gcc-4.5-20100225 does not.

Starting bisection on trunk (with 24+ hour build/test cycles, that's going to take a while).
Comment 14 Mikael Pettersson 2010-03-15 09:09:16 UTC
The bug was fixed for 4.5 by r148072:

2009-06-02  Richard Earnshaw  <rearnsha@arm.com>

       * arm.c (arm_get_frame_offsets): Prefer using r3 for padding a
       push/pop multiple to 8-byte alignment.

That change applies cleanly to 4.4 and fixes the bug there as well, although I've only done a smallish non-bootstrap build and libjava-only test with it. I'll do a full bootstrap and regression test next.

This change is included in CodeSourcery's 4.4-based G++ lite, which explains why their compiler works with the newer binutils.

Looking at the patch submission
http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00116.html
I see no indication that this should have any effect on exception handling or stack unwinding, but obviously it does. I'm adding Richard E. to the cc: list.
Comment 15 Richard Earnshaw 2010-03-15 09:16:01 UTC
(In reply to comment #14)
> The bug was fixed for 4.5 by r148072:
> 
> 2009-06-02  Richard Earnshaw  <rearnsha@arm.com>
> 
>        * arm.c (arm_get_frame_offsets): Prefer using r3 for padding a
>        push/pop multiple to 8-byte alignment.
>

I doubt that's really fixed the bug.  More likely it's caused it to go latent by changing the unwind information in some subtle way.
Comment 16 Matthias Klose 2010-03-16 13:42:22 UTC
this change doesn't make any change to the libjava test results (configured --with-mode=thumb).
Comment 17 Mikael Pettersson 2010-03-16 17:29:21 UTC
I decided to attack things from the binutils side. The problematic binutils change rolls two distinct changes into one (add cantunwind table entries, merge adjacent equivalent table entries). Disabling the "merge" part eliminated the libjava regressions in my ARM mode build:

--- binutils-2.20.51/bfd/elf32-arm.c.~1~
+++ binutils-2.20.51/bfd/elf32-arm.c
@@ -9301,10 +9306,12 @@ elf32_arm_fix_exidx_coverage (asection *
 
          if (elide)
            {
+#if 0
              add_unwind_table_edit (&unwind_edit_head, &unwind_edit_tail,
                                     DELETE_EXIDX_ENTRY, NULL, j / 8);
 
              deleted_exidx_bytes += 8;
+#endif
            }
 
          last_unwind_type = unwind_type;

I'll keep experimenting.
Comment 18 Matthias Klose 2010-03-16 23:30:21 UTC
rebuilding binutils 2.20.1 with the patch from comment #17 shows one regression:

Running /home/doko/binutils/binutils-2.20.1/ld/testsuite/ld-arm/arm-elf.exp ...
FAIL: ld-arm/unwind-4
Comment 19 Mikael Pettersson 2010-03-16 23:39:02 UTC
(In reply to comment #18)
> rebuilding binutils 2.20.1 with the patch from comment #17 shows one
> regression:
> 
> Running /home/doko/binutils/binutils-2.20.1/ld/testsuite/ld-arm/arm-elf.exp ...
> FAIL: ld-arm/unwind-4

I guess I should've run the binutils testsuite first. (Looking..) Ah, but that failure is Ok, as it's checking for the merging that the patch disabled.
Comment 20 Matthias Klose 2010-03-17 10:51:17 UTC
no change in the libjava testsuite when built with these binutils
Comment 21 Mikael Pettersson 2010-03-17 21:13:31 UTC
(In reply to comment #20)
> no change in the libjava testsuite when built with these binutils

But that's still thumb not arm like in comment #16? All my results are from plain arm (armv5tel) builds.
Comment 22 Mikael Pettersson 2010-03-17 21:23:43 UTC
I did another binutils experiment. I reverted my patch to disable general merging of table entries, and instead disabled generating new and merging cantunwind entries. With that binutils libjava regressed just like with vanilla post-2.19.1 binutils. The general merging bit seems to be the problem.

--- binutils-2.20.51/bfd/elf32-arm.c.~1~
+++ binutils-2.20.51/bfd/elf32-arm.c
@@ -9148,6 +9148,7 @@ adjust_exidx_size(asection *exidx_sec, i
   bfd_set_section_size (out_sec->owner, out_sec, out_sec->size +adjust);
 }
 
+#if 0
 /* Insert an EXIDX_CANTUNWIND marker at the end of a section.  */
 static void
 insert_cantunwind_after(asection *text_sec, asection *exidx_sec)
@@ -9162,6 +9163,7 @@ insert_cantunwind_after(asection *text_s
 
   adjust_exidx_size(exidx_sec, 8);
 }
+#endif
 
 /* Scan .ARM.exidx tables, and create a list describing edits which should be
    made to those tables, such that:
@@ -9248,7 +9250,9 @@ elf32_arm_fix_exidx_coverage (asection *
          if (sec->size == 0)
            continue;
 
+#if 0
          insert_cantunwind_after(last_text_sec, last_exidx_sec);
+#endif
          last_unwind_type = 0;
          continue;
        }
@@ -9282,8 +9286,10 @@ elf32_arm_fix_exidx_coverage (asection *
          /* An EXIDX_CANTUNWIND entry.  */
          if (second_word == 1)
            {
+#if 0
              if (last_unwind_type == 0)
                elide = 1;
+#endif
              unwind_type = 0;
            }
          /* Inlined unwinding data.  Merge if equal to previous.  */
@@ -9326,8 +9332,10 @@ elf32_arm_fix_exidx_coverage (asection *
     }
 
   /* Add terminating CANTUNWIND entry.  */
+#if 0
   if (last_exidx_sec && last_unwind_type != 0)
     insert_cantunwind_after(last_text_sec, last_exidx_sec);
+#endif
 
   return TRUE;
 }
Comment 23 Mikael Pettersson 2010-03-19 23:20:35 UTC
While working on another debugging patch I noticed something that I think might explain the libjava regressions, especially the stack trace ones.

The binutils change breaks _Unwind_GetRegionStart().

Example: We have three functions, f1() starts at address 100 and is 20 bytes, f2() starts at address 120 and is 10 bytes, and f3() starts at address 130. f1() and f2() got the same inlined unwind table data U1, while f3() got some different unwind table data U3.

A pre-binutils-2.20 .ARM.exidx table for this program would be

[0] = { 100, U1 }, [1] = { 120, U1 }, [2] = { 130, U3 }, ...

binutils-2.20 and later merge entry [1] with entry [0] since they are adjacent and have the same unwind table data, producing a final .ARM.exidx table:

[0] = { 100, U1 }, [1] = { 130, U3 }, ...

So an exception in or unwind through f2() will now return { 100, U1 } rather than { 120, U1 }.

The first thing that happens with the found table entry is that the function start address is stashed away (unwind-arm.c:get_eit_entry(), the assignment to ucbp->pr_cache.fnstart). Then __gnu_unwind_pr_common() uses fnstart while interpreting the unwind table data; I haven't studied that code but since libstdc++ test cases don't regress I assume the incorrect fnstart doesn't matter for inlined unwind table data.

However, pr-support.c:_Unwind_GetRegionStart() also uses fnstart, so users of _Unwind_GetRegionStart() will see f1() not f2() for exceptions in or unwinds through f2(). In particular, libjava/stacktrace.cc looks like it will construct
bogus stack traces (see its uses of 'start_ip').
Comment 24 Ramana Radhakrishnan 2010-03-20 18:54:45 UTC
This rings a bell and the comment in 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29206#c8
is probably related. 

If so, this is essentially a dup of PR29206 and I'm not sure what more we can do with this.

cheers
Ramana
Comment 25 Mikael Pettersson 2010-03-20 22:10:11 UTC
(In reply to comment #24)
> This rings a bell and the comment in 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29206#c8
> is probably related. 

I don't think this is at all the same as PR29206. PR29206 is very old now, so I don't think the one comment about EABI is accurate now, and people have been using EABI java builds succesfully. The test case given (gcj-dbtool -n) works for me. If anything, I suspect that maybe the ARM unwinder loops at end of stack bug is at play; for java at least there's a patch in Debian to work around that (I use it in my toolchain too).
Comment 26 Mikael Pettersson 2010-03-20 22:36:20 UTC
Created attachment 20147 [details]
test program to take a stack trace using _Unwind_ API

I'm attaching a test program which sets up a predictable call chain, and then uses _Unwind_Backtrace() and _Unwind_GetRegionStart() to get the entry points of the functions in the call chain into an array. It prints the corresponding function names and verifies that the function entry points are correct.

With gcc-4.4.3 and binutils-2.19.1 on armv5tel-unknown-linux-gnueabi this works fine:

> gcc -funwind-tables -fno-unit-at-a-time -O -o stacktrace-2.19.1 stacktrace.c
> ./stacktrace-2.19.1 
bar
f2
main

With the same gcc binary but binutils-20100316 the program fails:

> PATH=/tmp/binutils-2.20.51-install/bin:$PATH gcc -funwind-tables -fno-unit-at-a-time -O stacktrace.c -o stacktrace-2.20.51
> ./stacktrace-2.20.51 
bar
f1
f1
Abort

The reason for the failure is that ld merged the .ARM.exidx entries for f2 and main with the one for f1, which breaks _Unwind_GetRegionStart(). First we see that f1, f2, and main are adjacent in memory:

> nm -v stacktrace-2.20.51
...
00008588 T tracefn
000085bc T bar
0000865c T f1
0000866c T f2
0000867c T main
00008698 T __libc_csu_fini
...

With binutils-2.19.1 they each get their own .ARM.exidx entry:

> /tmp/binutils-2.20.51-install/bin/readelf --unwind stacktrace-2.19.1 

Unwind table index '.ARM.exidx' at offset 0x778 contains 7 entries:

0x845c <_start>: 0x1 [cantunwind]

0x8508 <printf_fn>: 0x80a8b0b0
  Compact model 0
  0xa8      pop {r4r14}
  0xb0      finish
  0xb0      finish

0x8588 <tracefn>: 0x80aab0b0
  Compact model 0
  0xaa      pop {r4, r5, r6r14}
  0xb0      finish
  0xb0      finish

0x85bc <bar>: 0x8003a8b0
  Compact model 0
  0x03      vsp = vsp + 16
  0xa8      pop {r4r14}
  0xb0      finish

0x865c <f1>: 0x80a8b0b0
  Compact model 0
  0xa8      pop {r4r14}
  0xb0      finish
  0xb0      finish

0x866c <f2>: 0x80a8b0b0
  Compact model 0
  0xa8      pop {r4r14}
  0xb0      finish
  0xb0      finish

0x867c <main>: 0x80a8b0b0
  Compact model 0
  0xa8      pop {r4r14}
  0xb0      finish
  0xb0      finish

But with binutils-20100316 f2 and main don't have their own entries, so PCs in them are reported as belonging to f1 instead:

> /tmp/binutils-2.20.51-install/bin/readelf --unwind stacktrace-2.20.51 

Unwind table index '.ARM.exidx' at offset 0x778 contains 6 entries:

0x845c <_start>: 0x1 [cantunwind]

0x8508 <printf_fn>: 0x80a8b0b0
  Compact model 0
  0xa8      pop {r4r14}
  0xb0      finish
  0xb0      finish

0x8588 <tracefn>: 0x80aab0b0
  Compact model 0
  0xaa      pop {r4, r5, r6r14}
  0xb0      finish
  0xb0      finish

0x85bc <bar>: 0x8003a8b0
  Compact model 0
  0x03      vsp = vsp + 16
  0xa8      pop {r4r14}
  0xb0      finish

0x865c <f1>: 0x80a8b0b0
  Compact model 0
  0xa8      pop {r4r14}
  0xb0      finish
  0xb0      finish

0x8698 <__libc_csu_fini>: 0x1 [cantunwind]

This test program also works on i686-linux, sparc64-linux -m32/-m64, and powerpc64-linux -m32.

I understand the benefits of shrinking .ARM.exidx tables, but it does break parts of gcc's <unwind.h> implementation (and thus libjava), so should be opt-in via an explicit option and not done by default.
Comment 27 Richard Earnshaw 2010-03-22 23:48:23 UTC
The ARM ABI permits merging of unwind entries, so this should never default to opt-in across the entire tool-chain.   It might be that when GCC links java programs that it should pass an option to the linker to disable the optimization, but that's a different story.

It may be that the java unwinder will need to just do more work to figure out the real call sequence.  I'm not sufficiently familiar with the code to know the answer here.
Comment 28 Mikael Pettersson 2010-03-30 13:21:09 UTC
I've looked at the amount of .ARM.exidx entry merging being done and its consequences for the various unwinders in gcc.  Currently only table entries with immediate (inlined) data are merged, and for that all of gcc except for libjava seems to be Ok.  However, gcc can still leak bogus unwind data via
_Unwind_GetRegionStart, so I'm proposing a patch like the following:

--- gcc-4.4.3/gcc/config/arm/unwind-arm.c.~1~
+++ gcc-4.4.3/gcc/config/arm/unwind-arm.c
@@ -621,7 +621,6 @@ get_eit_entry (_Unwind_Control_Block *uc
       UCB_PR_ADDR (ucbp) = 0;
       return _URC_FAILURE;
     }
-  ucbp->pr_cache.fnstart = selfrel_offset31 (&eitp->fnoffset);
 
   /* Can this frame be unwound at all?  */
   if (eitp->content == EXIDX_CANTUNWIND)
@@ -637,6 +636,15 @@ get_eit_entry (_Unwind_Control_Block *uc
       /* It is immediate data.  */
       ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
       ucbp->pr_cache.additional = 1;
+      /* Adjacent EIT entries with identical immediate data may be merged,
+        making fnoffset/fnstart inaccurate.  The ARM unwinder doesn't need
+        fnstart for immediate EIT data.  Other PRs than ARM's often use
+        fnstart to derive the locations of landing pads, but such PRs cannot
+        use immediate data in EIT entries, so are not affected by this issue.
+        However, code constructing stack traces may see stack frames for
+        functions with immediate data EIT entries.  Clear fnstart to ensure
+        _Unwind_GetRegionStart doesn't return wrong data in this case.  */
+      ucbp->pr_cache.fnstart = 0;
     }
   else
     {
@@ -645,6 +653,7 @@ get_eit_entry (_Unwind_Control_Block *uc
       ucbp->pr_cache.ehtp =
        (_Unwind_EHT_Header *) selfrel_offset31 (&eitp->content);
       ucbp->pr_cache.additional = 0;
+      ucbp->pr_cache.fnstart = selfrel_offset31 (&eitp->fnoffset);
     }
 
   /* Discover the personality routine address.  */

This caused no regressions for c/c++/objc/obj-c++, but libjava got two more (ExtraClassLoader and InvokeInterface).

The problem with libjava appears to be its stacktrace.cc module. It uses _Unwind_GetRegionStart to realign any interior PC to its function start PC, then it uses that to look up method and class in a hash table keyed by method start PC.  With the .ARM.exidx merging, _Unwind_GetRegionStart can return the PC for a different method, possibly also in a different class, which totally breaks this.  With my patch above libjava's stacktrace.cc can detect this case and switch to a linear search instead.  I'll try to implement that soonish.
Comment 29 Paul Brook 2010-03-30 14:04:24 UTC
Wouldn't it be better to just remove _Unwind_GetRegionStart?
This function is not part of the ARM EABI, and removing it would expose any (already broken) users at compile time.
Comment 30 Mikael Pettersson 2010-03-30 15:09:23 UTC
(In reply to comment #29)
> Wouldn't it be better to just remove _Unwind_GetRegionStart?
> This function is not part of the ARM EABI, and removing it would expose any
> (already broken) users at compile time.

No.

First it'd break most of gcc since the c, c++, and objc unwinders use it. And they generally use it to provide a base address when interpreting LSDA and computing landing pad addresses.

Second, all _Unwind_GetRegionStart does is give r/o access to the fnstart value in ARM's UCB.  But ARM's own unwinder uses fnstart in __gnu_unwind_pr_common, so if fnstart is broken then so it ARM's unwinder.

ARM's unwinder is in the same boat as the c/c++/objc ones.  It works because .ARM.exidx merging is limited to immediate table data, but the code using fnstart (by luck or design) only runs when the table contains non-immediate data, and in those cases fnstart is accurate.
Comment 31 Richard Earnshaw 2010-03-31 08:47:21 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > Wouldn't it be better to just remove _Unwind_GetRegionStart?
> > This function is not part of the ARM EABI, and removing it would expose any
> > (already broken) users at compile time.
> 
> No.
> 
> First it'd break most of gcc since the c, c++, and objc unwinders use it. And
> they generally use it to provide a base address when interpreting LSDA and
> computing landing pad addresses.
> 
> Second, all _Unwind_GetRegionStart does is give r/o access to the fnstart value
> in ARM's UCB.  But ARM's own unwinder uses fnstart in __gnu_unwind_pr_common,
> so if fnstart is broken then so it ARM's unwinder.
> 
> ARM's unwinder is in the same boat as the c/c++/objc ones.  It works because
> .ARM.exidx merging is limited to immediate table data, but the code using
> fnstart (by luck or design) only runs when the table contains non-immediate
> data, and in those cases fnstart is accurate.
> 

There appears to be a mistaken presumption running through this thread that there is a 1<->1 mapping between unwind blocks and source language functions.  This is not the case, and any code written with such a presumption is just wrong code.

Just because such behaviour may work on other machines does not make the presumption correct or the ARM unwinding code wrong.

1) Compilers may inline functions.  If they do so, then unwind blocks will get merged.

2) Compilers may create extra frames (though currently rarely).  If they do, then a function may have more than one frame.

3) They may do some combination of the above.

4) The ARM frame-unwinding annotations are designed to unwind C++ exceptions.  If they don't work outside that specification that does not make them wrong; they simply weren't designed for the other (mis-)uses to which some people are trying to put them.

Source language code is run on an abstract machine.  When it's mapped onto real hardware the reality can be very different.  You can't rely on the two corresponding directly beyond the semantics of the language.
Comment 32 Mikael Pettersson 2010-03-31 21:43:58 UTC
(In reply to comment #31)
> There appears to be a mistaken presumption running through this thread that
> there is a 1<->1 mapping between unwind blocks and source language functions. 
> This is not the case, and any code written with such a presumption is just
> wrong code.

Just to clarify: I'm not defending the parts of gcc that depend on the fact that gcc's ARM EABI backend has implemented _Unwind_GetRegionStart.  I'm just trying to explain that those dependencies currently do exist.

> 4) The ARM frame-unwinding annotations are designed to unwind C++ exceptions. 
> If they don't work outside that specification that does not make them wrong;
> they simply weren't designed for the other (mis-)uses to which some people are
> trying to put them.

The extreme logical conclusion is that gcc's ARM backend shouldn't even pretend to implement _Unwind_GetRegionStart, as Paul suggested.  Doing that in an arm-linux-gnueabi cross-compiler causes compile-time breakage for most languages including C++.

A less extreme conclusion is to make _Unwind_GetRegionStart always return 0.  That solves the compile-time dependencies but not the functional ones (however technically invalid they are).  Ignoring libjava's stacktrace module the functional dependencies are all the same: LSDA blobs attached to exception table entries are in DWARF format and contain deltas for landing pads relative to the enclosing function entry.  GCC assumes that _Unwind_GetRegionStart works, so always outputs a dummy @LPStart in the LSDA (output_function_exception_table in gcc/except.c); at runtime the PRs' LSDA parsers check if @LPStart is encoded as a dummy or a proper value, if it's a dummy they set it to the return value of _Unwind_GetRegionStart.  Even the libstdc++ unwinder does this and thus depends on _Unwind_GetRegionStart on ARM EABI.

So to eliminate the functional dependency on _Unwind_GetRegionStart we have to instruct output_function_exception_table to output a proper @LPStart with each LSDA blob not a dummy one.  I'll look into that next.
Comment 33 Mikael Pettersson 2010-04-08 12:14:10 UTC
Created attachment 20333 [details]
eliminate use of _Unwind_GetRegionStart on ARM, part 1

Here's a preliminary patch to eliminate the functional dependency on _Unwind_GetRegionStart on ARM.  There are two key changes: output_function_exception_table outputs a proper @LPStart pointer in each LSDA blob, and every PR's parse_lsda_header sets @Start to @LPstart if @Start is NULL.

Without these updates making _Unwind_GetRegionStart return NULL on ARM causes massive regressions in the c++/objc/obj-c++ test suites, and some in the c and libffi test suites.  With these updates those regressions are eliminated.

There are however two new regressions:

1. There are several orders of magnitude more alignment exceptions during the test suite run, probably in the read_encoded_value calls that now trigger.

2. For initial testing I ran this on i686 with its _Unwind_GetRegionStart made to return NULL too, and while it worked the g++.old-deja/g++.law/weak.C test case failed with a linkage error.  It looks like gcc emitted exception table data for a discarded piece of code.  On my ARM box the test suite considered that test case UNSUPPORTED so never ran it.

The first regression can probably be fixed by avoiding to define unneeded @LPStart pointers and making read_encoded_value mis-alignment safe.

The second regression might be fixable by making the @LPStart symbol references weak.

More fixes will be needed for libjava's stacktrace mess though.
Comment 34 Mikael Pettersson 2010-04-12 19:03:05 UTC
Created attachment 20371 [details]
eliminate use of _Unwind_GetRegionStart on ARM, take 2

The previous patch to eliminate _Unwind_GetRegionStart on ARM suffered from major misalignment exceptions.  The function pointers in the @LPStart fields weren't resolved by the linker but apparently required runtime fixup by glibc.  Since they are always misaligned (the LSDA blob starts aligned with a format byte followed by the @LPStart pointer), each @LPStart value resulted in two alignment exceptions.  Ouch.

The new version of the patch makes the @LPStart pointers PC-relative which eliminates the runtime fixups and thus the alignment exceptions.
Comment 35 Andrew Haley 2010-04-13 16:36:42 UTC
I've been trying this on gcc trunk, and it doesn't have the problem.
It seems that merging is not done.

I get

...
0x8684 <f2>: @0x8808
  Compact model 1
  0xb1 0x08 pop {r3}
  0x84 0x00 pop {r14}
  0xb0      finish
  0xb0      finish
...
0x86b0 <f1>: @0x8820
  Compact model 1
  0xb1 0x08 pop {r3}
  0x84 0x00 pop {r14}
  0xb0      finish
  0xb0      finish

Is it maybe the case that "Compact model 1" unwinder data isn't yet being merged?
Comment 36 Mikael Pettersson 2010-04-13 16:51:58 UTC
(In reply to comment #35)
> I've been trying this on gcc trunk, and it doesn't have the problem.
> It seems that merging is not done.
> 
> I get
> 
> ...
> 0x8684 <f2>: @0x8808
>   Compact model 1
>   0xb1 0x08 pop {r3}
>   0x84 0x00 pop {r14}
>   0xb0      finish
>   0xb0      finish
> ...
> 0x86b0 <f1>: @0x8820
>   Compact model 1
>   0xb1 0x08 pop {r3}
>   0x84 0x00 pop {r14}
>   0xb0      finish
>   0xb0      finish
> 
> Is it maybe the case that "Compact model 1" unwinder data isn't yet being
> merged?

They have to be adjacent to be merged. Looks like there's something else between f2 and f1 in your object code. (You are using binutils >= 2.20 right?)
Comment 37 Andrew Haley 2010-04-13 17:02:31 UTC
Subject: Re:  [4.4/4.5/4.6 regression] regressions in libjava
 testsuite on arm-linux

On 04/13/2010 05:52 PM, mikpe at it dot uu dot se wrote:

>> Is it maybe the case that "Compact model 1" unwinder data isn't yet being
>> merged?
> 
> They have to be adjacent to be merged. Looks like there's something else
> between f2 and f1 in your object code. (You are using binutils >= 2.20 right?)

Yes.  OK, that explains it: gcc trunk outputs the functions in a completely
different order.

Andrew.
Comment 38 Andrew Haley 2010-04-13 17:25:26 UTC
I think I can fairly easily add an option to the linker to suppress table merging when linking Java libraries, and that will completely solve the problem, at least from my point of view.  To that end, it would not be at all helpful to make _Unwind_GetRegionStart on ARM return NULL.
Comment 39 Andrew Haley 2010-04-21 16:34:52 UTC
Subject: Bug 40860

Author: aph
Date: Wed Apr 21 16:34:01 2010
New Revision: 158611

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158611
Log:
2010-04-19  Andrew Haley  <aph@redhat.com>

	PR libgcj/40860
	* configure.ac: Handle --no-merge-exidx-entries.


Modified:
    trunk/libjava/ChangeLog
    trunk/libjava/configure
    trunk/libjava/configure.ac

Comment 40 Andrew Haley 2010-04-21 17:05:38 UTC
Subject: Bug 40860

Author: aph
Date: Wed Apr 21 17:04:42 2010
New Revision: 158614

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158614
Log:
2010-04-19  Andrew Haley  <aph@redhat.com>

        PR libgcj/40860
        * configure.ac: Handle --no-merge-exidx-entries.


Modified:
    branches/gcc-4_4-branch/libjava/ChangeLog
    branches/gcc-4_4-branch/libjava/configure
    branches/gcc-4_4-branch/libjava/configure.ac

Comment 41 Andrew Haley 2010-04-22 16:07:11 UTC
Subject: Bug 40860

Author: aph
Date: Thu Apr 22 16:06:39 2010
New Revision: 158648

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158648
Log:
2010-04-19  Andrew Haley  <aph@redhat.com>

        PR libgcj/40860
        * configure.ac: Handle --no-merge-exidx-entries.


Modified:
    branches/gcc-4_5-branch/libjava/ChangeLog
    branches/gcc-4_5-branch/libjava/configure
    branches/gcc-4_5-branch/libjava/configure.ac

Comment 42 Andrew Haley 2010-04-22 16:08:40 UTC
Fixed.