Bug 42841 - [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.4.3
Importance: P4 normal
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2010-01-22 07:09 UTC by Nobuhiro Iwamatsu
Modified: 2010-07-06 18:30 UTC

See Also:
Host:
Target: sh4-linux-gnu
Build:
Known to work: 4.2.4
Known to fail: 4.3.4 4.4.3 4.5.0
Last reconfirmed: 2010-01-22 08:11:31


Attachments
The source code that can reproduce a problem. (10.23 KB, text/x-csrc)
2010-01-22 07:12 UTC, Nobuhiro Iwamatsu
Conservative fix. (454 bytes, patch)
2010-01-22 12:06 UTC, Christian Bruel
and cleanup with JUMP_TABLE_DATA_P (752 bytes, patch)
2010-01-22 13:49 UTC, Christian Bruel
A test case (210.69 KB, application/octet-stream)
2010-01-27 14:08 UTC, Kazumoto Kojima
fixed removal of landing pad label rtx (829 bytes, patch)
2010-01-29 07:46 UTC, chrbr
A patch (1.52 KB, patch)
2010-02-02 22:16 UTC, Kazumoto Kojima
patch to fix GOT access load with constant pool (1.10 KB, patch)
2010-02-03 13:12 UTC, chrbr

Description Nobuhiro Iwamatsu 2010-01-22 07:09:43 UTC
I found a "pcrel too far" bug in gcc-4.4.3 and gcc-4.3.4 on sh-elf.
Sorry, I have code that reproduces it, but I could not reduce it.

$ gcc -O2 -fPIC -DPIC -c gong_1424_debug_1.c
/tmp/ccjdeDlE.s: Assembler messages:
/tmp/ccjdeDlE.s:714: Error: pcrel too far

Without optimization, the error does not occur.

$ gcc-4.4 -v 
Using built-in specs.
Target: sh4-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.2-9' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --with-multilib-list=m4,m4-nofpu --with-cpu=sh4 --enable-checking=release --build=sh4-linux-gnu --host=sh4-linux-gnu --target=sh4-linux-gnu
Thread model: posix
gcc version 4.4.3 20100108 (prerelease) (Debian 4.4.2-9) 

$ gcc-4.3 -v 
Using built-in specs.
Target: sh4-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.3.4-6+sh4' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --with-multilib-list=m4,m4-nofpu --with-cpu=sh4 --enable-checking=release --build=sh4-linux-gnu --host=sh4-linux-gnu --target=sh4-linux-gnu
Thread model: posix
gcc version 4.3.4 (Debian 4.3.4-6)
Comment 1 Nobuhiro Iwamatsu 2010-01-22 07:12:03 UTC
Created attachment 19687
The source code that can reproduce a problem.
Comment 2 Kazumoto Kojima 2010-01-22 08:11:31 UTC
I've confirmed that the test case also fails on 4.5.0 and doesn't on 4.2.4.
Comment 3 Christian Bruel 2010-01-22 11:47:53 UTC
Hello,

I had a similar problem a while ago, but was never able to reproduce it on trunk.

It was a phasing problem between branch_shortening from sh_reorg and the
delayed branch scheduler, which would change a bf (2 bytes) into a
bf/s + instruction (4 bytes), thus breaking surrounding branch offsets.

-fno-delayed-branch is the workaround.




Comment 4 Christian Bruel 2010-01-22 12:06:06 UTC
Created attachment 19689
Conservative fix.

Conservatively increase length of undelayed conditional branches to prevent a problem with the ds scheduler inserting an instruction in the slot.
Comment 5 Kazumoto Kojima 2010-01-22 12:33:40 UTC
(In reply to comment #4)
> Conservatively increase length of undelayed conditional branches to prevent a
> problem with the ds scheduler inserting an instruction in the slot.

Looks fine.  A very minor nit: the JUMP_P and JUMP_TABLE_DATA_P macros
can be used for the first 3 lines of the if-condition.

Comment 6 Christian Bruel 2010-01-22 12:58:31 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Conservatively increase length of undelayed conditional branches to prevent a
> > problem with the ds scheduler inserting an instruction in the slot.
> 
> Looks fine.  A very minor nit, JUMP_P and JUMP_TABLE_DATA_P macro
> can be used for the first 3 lines of the if-condition.
> 

Thanks. I don't think I can use JUMP_TABLE_DATA_P since this is a != test and JUMP_TABLE_DATA_P includes JUMP_P. 

Anyway, OK for trunk ? (just need to fix the date in the ChangeLog). regtesting done.
 

Comment 7 Kazumoto Kojima 2010-01-22 13:21:58 UTC
(In reply to comment #6)
> Anyway, OK for trunk ? (just need to fix the date in the ChangeLog). regtesting
> done.

OK.  And the patch is pre-approved for branches too after one week or so.

BTW, I mean JUMP_P(x) && !JUMP_TABLE_DATA_P(x):

   a && !(a && (b || c))
== a && (!a || !(b || c))
== (a && !a) || (a && !(b || c))
== 0 || (a && !(b || c))
== a && !b && !c

Comment 8 Christian Bruel 2010-01-22 13:49:39 UTC
Created attachment 19690
and cleanup with JUMP_TABLE_DATA_P
Comment 9 Christian Bruel 2010-01-22 13:51:31 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Anyway, OK for trunk ? (just need to fix the date in the ChangeLog). regtesting
> > done.
> 
> OK.  And the patch is pre-approved for branches too after one week or so.
> 
> BTW, I mean JUMP_P(x) && !JUMP_TABLE_DATA_P(x):
>
I didn't read you correctly. So I took the opportunity to clean up every other occurrence of the same idiom in the file. OK?
Comment 10 Kazumoto Kojima 2010-01-22 14:28:06 UTC
(In reply to comment #9)
> So I took the opportunity to cleanup every other
> occurrences of the same idioms in the file. OK ?

OK.  Thanks!


Comment 11 chrbr 2010-01-26 07:20:57 UTC
Subject: Bug 42841

Author: chrbr
Date: Tue Jan 26 07:20:27 2010
New Revision: 156229

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156229
Log:
fix PR target/42841

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c

Comment 12 chrbr 2010-01-26 07:22:18 UTC
Subject: Bug 42841

Author: chrbr
Date: Tue Jan 26 07:21:57 2010
New Revision: 156230

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156230
Log:
fix PR target/42841

Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/sh/sh.c

Comment 13 chrbr 2010-01-26 07:28:27 UTC
Subject: Bug 42841

Author: chrbr
Date: Tue Jan 26 07:28:05 2010
New Revision: 156231

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156231
Log:
fix PR target/42841

Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/config/sh/sh.c

Comment 14 chrbr 2010-01-26 07:29:32 UTC
fixed in 4.5, 4.3 and 4.4
Comment 15 Nobuhiro Iwamatsu 2010-01-26 23:54:33 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > So I took the opportunity to cleanup every other
> > occurrences of the same idioms in the file. OK ?
> 
> OK.  Thanks!
> 

Thanks for your patch!
I confirmed that the same problem in another program is also fixed by this patch.
Comment 16 Kazumoto Kojima 2010-01-27 12:32:36 UTC
I've got some new libstdc++-v3 testsuite failures with the patch
on my nightly sh4-linux tester:

Running /exp/ldroot/dodes/ORIG/trunk/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
FAIL: 23_containers/deque/requirements/exception/basic.cc (test for excess errors)
WARNING: 23_containers/deque/requirements/exception/basic.cc compilation failed to produce executable
FAIL: 23_containers/deque/requirements/exception/propagation_consistent.cc (test for excess errors)
WARNING: 23_containers/deque/requirements/exception/propagation_consistent.cc compilation failed to produce executable
FAIL: 30_threads/packaged_task/members/get_future.cc execution test
FAIL: 30_threads/shared_future/members/get.cc execution test

The first failure is

/tmp/ccl5TCl4.s: Assembler messages:
/tmp/ccl5TCl4.s:43070: Error: undefined symbol `.L3394' in operation

FAIL: 23_containers/deque/requirements/exception/basic.cc (test for excess errors)

The last 2 failures result from unaligned accesses.  I saw

Sending SIGBUS to "get_future.exe" due to unaligned access (PC 296554a8 PR 2965549a)
Sending SIGBUS to "get.exe" due to unaligned access (PC 296554a8 PR 2965549a)

on the target machine.
After reverting the first hunk of the patch, these errors go away.
Christian, could you please revert or disable the first hunk
of patches temporarily?  Sorry I didn't catch this earlier.
Comment 17 chrbr 2010-01-27 12:50:45 UTC
Strange, I didn't see that, not even the undefined symbol in the assembler.

OK, I'll disable the fix until this is clarified.

Let me recheck on silicon; I will let you know.

-c


Comment 18 chrbr 2010-01-27 13:24:58 UTC
Subject: Bug 42841

Author: chrbr
Date: Wed Jan 27 13:24:40 2010
New Revision: 156282

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156282
Log:
temporarily revert fix for PR target/42841

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c

Comment 19 chrbr 2010-01-27 13:40:56 UTC
to make sure we are in the same testing/configuration environment could you please send me the preprocessed file for 23_containers/deque/requirements/exception/propagation_consistent.cc as well as the compilation line in libstdc++.log that you used ?

many thanks

Christian
Comment 20 Kazumoto Kojima 2010-01-27 14:08:29 UTC
Created attachment 19729
A test case

"cc1plus -std=gnu++0x -O2 propagation_consistent.ii" produces
problematic code here.
Comment 21 chrbr 2010-01-27 18:13:28 UTC
This one is marked as unsupported in my sh-superh-elf log, but I can reproduce it now on sh4-linux (even though I had rebuilt a whole distribution without seeing it :O).

Anyway I'm investigating. I'm reopening the bug and will revert in the branches as well if I don't find a quick solution.

Regards


Comment 22 chrbr 2010-01-28 13:09:41 UTC
Hmm, looks like a latent bug. By accident the CP (constant pool) is inserted before a compact_jump, which enables a further redirect-jump optimisation. I don't think it is directly related to the fix, but let's work on it a little more.

so we have just before dbr:
jump_insn -> 2586
a constant pool
L2586
jump_insn -> 3394

L3394: ...

Then in reorg_redirect_jump we redirect the jump over the CP, and delete_related_insn runs, so the code between the CP and the jump becomes dead.

and we have 

jump_insn -> 3394
a constant pool

L3394
...

but the label L2586 is used in the exception table... and thus remains undefined.

Now my question: how can the exception table refer to a region delimited by deleted labels? It should be built after dbr, shouldn't it?

Comment 23 Kazumoto Kojima 2010-01-29 02:19:57 UTC
I agree with you that there is a latent problem.  It seems
that sh_reorg inserts a CP with a new jump at the landing pad
for the exception in basic.cc and propagation_consistent.cc
cases.  This confuses EH processing because the labels for
landing pads are defined and recorded very early and used later
to output DW2 frame data, unfortunately.  A simple work around
may be not to insert a jump+CP at the possible position for
the landing pad.  The patch

--- ORIG/trunk/gcc/config/sh/sh.c	2010-01-26 18:33:47.000000000 +0900
+++ trunk/gcc/config/sh/sh.c	2010-01-29 09:56:50.000000000 +0900
@@ -4641,6 +4641,9 @@ find_barrier (int num_mova, rtx mova, rt
 	 a jump makes it more likely that the bra delay slot will be
 	 filled.  */
       while (NOTE_P (from) || JUMP_P (from)
+	     || (flag_exceptions
+		 && CALL_P (from)
+		 && find_reg_note (from, REG_EH_REGION, NULL_RTX))
 	     || LABEL_P (from))
 	from = PREV_INSN (from);
 
fixes the failures for basic.cc and propagation_consistent.cc,
though it doesn't fix the execution failures for get_future.cc
and get.cc.
Comment 24 chrbr 2010-01-29 07:46:22 UTC
Created attachment 19747
fixed removal of landing pad label rtx

The landing_pad label rtx was created and recorded in tree_inline (duplicate_eh_regions). Seems that reorg_redirect_jump or delete_insn should check for it before deciding it can be removed.

I'm testing this patch that does this.
Comment 25 chrbr 2010-01-29 08:59:56 UTC
By the way, FYI, this may explain the differences between your results and mine for sh4-linux: my build is configured with --enable-target-optspace, so all my runtime build tests are run with -Os, not -O2 like yours, which could make a huge difference in CP layout...
I will rerun with -O2 over the weekend.
Cheers

Comment 26 chrbr 2010-02-01 16:30:15 UTC
I'm afraid the unaligned access SIGBUS regression is another latent bug just exposed by the fix for the original PR :-(

What happens is that the GOT loading sequence is broken by a constant pool:

we end up emitting:

	mov.l	.L542,r12    (X)
	bra	.L516
	nop
.L542:
       .align 2
	.long	_GLOBAL_OFFSET_TABLE_
       ...
.L516:
	mova	.L545,r0   (Y) !!!!!
        add	r0,r12
.L545:
	.long	_GLOBAL_OFFSET_TABLE_

The reason is that the second mova instruction is, unluckily, now out of range by 2 bytes (which could happen in any other situation, even without this patch).

IMHO We should forbid the duplication of a _GLOBAL_OFFSET_TABLE_ loading constant while in a UNSPEC_MOVA sequence.

We should probably reduce si_limit in find_barrier when a 
(set (reg:SI 0 r0)
        (unspec:SI [
                (const:SI (unspec:SI [
                            (symbol_ref ("*_GLOBAL_OFFSET_TABLE_"))
is met and next is
 (set (reg:SI 12 r12)
        (const:SI (unspec:SI [
                    (symbol_ref ("*_GLOBAL_OFFSET_TABLE_") 

in PIC.

I'm experimenting with a couple of different solutions in this direction.

This PR was a really interesting bug finder!



Comment 27 Kazumoto Kojima 2010-02-02 22:16:15 UTC
Created attachment 19792
A patch

Indeed!  I've tested the attached patch and confirmed that
it doesn't regress with the top level "make -k check" for all
languages except ada on sh4-linux.
Comment 28 chrbr 2010-02-03 08:30:36 UTC
Hello Kaj,

thanks for your proposal, but I'm wondering whether preventing the scheduling of the mov.l and mova instructions isn't overkill? (sh_reorg comes after the scheduler, but even if it didn't, it should be OK to move instructions up.)
(The R0 live range between the add and the load is another, more general problem.)
Am I missing something?

We only want to avoid the CP being inserted between those 2 instructions; it's not necessary to have more blockages. I'm working on something that tracks the GOT loading access during the find_barrier walk and then reverts at the end to the latest safe place. It is OK on the example, but the full Linux distribution rebuild and validation is still ongoing.





Comment 29 Kazumoto Kojima 2010-02-03 10:06:05 UTC
I think these blockages are not overkill.  GOTaddr2picreg is used only
at prologue and non-pic tls initial exec accesses.  The former is at
most once for each function and never in an inner loop.  The latter
case wouldn't occur very frequently, and the initial exec access is a heavy
sequence of instructions in the first place.

> We only want to avoid the CP to be inserted between those 2 instructions,
> it's not necessary to have more blockages. I'm working on something that
> tracks the GOT loading access during the find_barrier walk and then revert
> back at the end to the latest safe place. OK on the example but the full
> linux distrib rebuild and validation is still ongoing.

Of course, it's OK if it passes all the usual tests.
Comment 30 chrbr 2010-02-03 13:12:01 UTC
Created attachment 19794
patch to fix GOT access load with constant pool

Patch under validation.
Comment 31 Kazumoto Kojima 2010-02-04 22:42:13 UTC
Looks smart and clean!  One minor nit: I guess that the occurrences of
gbr and GBR in the ChangeLog and comments should be replaced with GOT to
avoid confusion with the GBR register of the SH CPU.
When you propose it to the list, could you please separate the third
hunk, which is for the original PR42841, as an independent patch?  Also
don't forget to update the copyright years in the first one.
Comment 32 chrbr 2010-02-05 07:05:12 UTC
> Looks smart and clean!  One minor nit, I guess that the occurence of
> gbr and GBR in ChangeLog and comments should be replaced with GOT to
> avoid confusion with GBR register of SH CPU.

Thanks for catching this error in the comment. I meant GP of course, which is even preferable to GOT (which is what we load, not what we compute).

(In reply to comment #31)
> When you propose it to the list, could you please separate the third
> hunk which is for the original PR42841 as an independant patch.  Also
> don't forget to update the copyright years in the first one.
> 

OK, it was also my intention to submit the 3rd hunk (the one that fixes the jump to the landing pad around the constant table, right?) as a separate patch, as it will require the approval of a middle-end maintainer.
If it cannot go into the trunk before the 4.5 freeze, I suggest that you commit your workaround (comment #23) so as not to block the regression. Then we can revert it when the proper patch is discussed/accepted. (I'm a little bit late for that, sorry.)
Comment 33 Kazumoto Kojima 2010-02-05 07:42:59 UTC
Your fix of the middle end looks plausible but I think the target
shouldn't generate a CP at the eh landing pad anyway.  I'll commit
the hunk below anyway after your patch for pic problem is installed.

@@ -4654,6 +4654,13 @@ find_barrier (int num_mova, rtx mova, rt
       if (last_got)
 	from = PREV_INSN (last_got);
 
+      /* Don't insert the constant pool table at the position which
+	 may be the landing pad.  */
+      if (flag_exceptions
+	  && CALL_P (from)
+	  && find_reg_note (from, REG_EH_REGION, NULL_RTX))
+	from = PREV_INSN (from);
+
       /* Walk back to be just before any jump or label.
 	 Putting it before a label reduces the number of times the branch
 	 around the constant pool table will be hit.  Putting it before
Comment 34 chrbr 2010-02-05 08:26:35 UTC
(In reply to comment #33)
> Your fix of the middle end looks plausible but I think the target
> shouldn't generate a CP at the eh landing pad anyway.  I'll commit
> the hunk below anyway after your patch for pic problem is installed.
> 

OK. I didn't check the code quality difference between the middle-end fix and yours. Since there is no fallthru to the landing pad, and locality with the upcoming exception region is not important (if we suppose that the exception handler is not on the critical path), I was expecting, on the contrary, that the landing pad was a good place for the constant pool.

Comment 35 Kazumoto Kojima 2010-02-05 21:54:00 UTC
(In reply to comment #34)
> I was expecting that the landing pad was
> a good place for the constant pool on the contrary. 

I thought so too.  But on second thought, it'd be a bit surprising
for the non CP world and may cause similar problems.  We should be
defensive in this regard, I think.


Comment 36 chrbr 2010-02-10 12:02:15 UTC
(In reply to comment #33)
> Your fix of the middle end looks plausible but I think the target
> shouldn't generate a CP at the eh landing pad anyway.  I'll commit
> the hunk below anyway after your patch for pic problem is installed.
> 

done, you can commit your w/a.

> @@ -4654,6 +4654,13 @@ find_barrier (int num_mova, rtx mova, rt
>        if (last_got)
>         from = PREV_INSN (last_got);
> 
> +      /* Don't insert the constant pool table at the position which
> +        may be the landing pad.  */
> +      if (flag_exceptions
> +         && CALL_P (from)
> +         && find_reg_note (from, REG_EH_REGION, NULL_RTX))
> +       from = PREV_INSN (from);
> +
>        /* Walk back to be just before any jump or label.
>          Putting it before a label reduces the number of times the branch
>          around the constant pool table will be hit.  Putting it before
> 

Comment 37 Richard Biener 2010-04-13 10:29:40 UTC
*** Bug 43744 has been marked as a duplicate of this bug. ***