Bug 45223 - RTL PRE GCSE pass hoists trapping insn out of loop
Summary: RTL PRE GCSE pass hoists trapping insn out of loop
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.5.2
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: wrong-code
Depends on: 42108
Blocks:
  Show dependency treegraph
 
Reported: 2010-08-07 08:24 UTC by Uroš Bizjak
Modified: 2021-10-11 09:11 UTC (History)
1 user (show)

See Also:
Host:
Target: moxie-elf
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-08-07 11:26:29


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Uroš Bizjak 2010-08-07 08:24:35 UTC
This is a follow-up from PR38819 which partially fixes this problem. The problem remains in RTL optimization passes, and can be triggered on targets that implement modulo instruction in the hardware (i.e. moxie-elf).

The testcase from PR38819 compiles with -O2 to:

main:
	push   $sp, $r6
	push   $sp, $r7
	push   $sp, $r8
	push   $sp, $r9
	push   $sp, $r10
	push   $sp, $r11
	dec    $sp, 24
	lda.l  $r7, a
	lda.l  $r0, b
	xor    $r6, $r6
	ldi.l  $r11, foo
>	mod.l  $r7, $r0
	ldi.l  $r10, 8
	ldi.l  $r8, 99
	jmpa   .L4
.L3:
	...

Since b is initialized to zero, mod.l instruction traps.
Comment 1 Uroš Bizjak 2010-08-07 08:33:59 UTC
Following patch fixes this problem:

Index: gcse.c
===================================================================
--- gcse.c	(revision 162975)
+++ gcse.c	(working copy)
@@ -1693,7 +1693,7 @@ compute_hash_table_work (struct hash_tab
 
       /* The next pass builds the hash table.  */
       FOR_BB_INSNS (current_bb, insn)
-	if (INSN_P (insn))
+	if (INSN_P (insn) && !may_trap_p (PATTERN (insn)))
 	  hash_scan_insn (insn, table);
     }
 

main:
	push   $sp, $r6
	push   $sp, $r7
	push   $sp, $r8
	push   $sp, $r9
	push   $sp, $r10
	push   $sp, $r11
	dec    $sp, 28
	lda.l  $r0, a
	sto.l  -28($fp), $r0
	lda.l  $r11, b
	xor    $r6, $r6
	ldi.l  $r10, foo
	ldi.l  $r8, 8
	ldi.l  $r7, 99
	jmpa   .L4
.L3:
	lda.l  $r1, r
	ldo.l  $r0, -28($fp)
	mod.l  $r0, $r11
	add.l  $r0, $r1
	add.l  $r0, $r6
	sta.l  r, $r0
	inc    $r6, 1
	cmp    $r6, $r7
	bgt   .L7
.L4:
	jsr    $r10
	lda.l  $r0, x
	cmp    $r0, $r8
	bne   .L3
	inc    $r6, 1
	lda.l  $r1, r
	ldo.l  $r0, -28($fp)
>	mod.l  $r0, $r11
	add.l  $r0, $r1
	add.l  $r0, $r6
	sta.l  r, $r0
	inc    $r6, 1
	cmp    $r6, $r7
	ble   .L4
.L7:
	jsra   abort

In a general case, I guess that if it can be proved that denominator can't be zero we can still hoist mod.l out of the loop.
Comment 2 Uroš Bizjak 2010-08-07 08:40:44 UTC
Ugh, with a bit changed testcase:

--cut here--
extern void exit (int);
extern void abort (void);

volatile float a = 1;
volatile float b = 0;
volatile int x = 2;
volatile signed int r = 8;

void __attribute__((noinline))
foo (void)
{
  exit (0);
}

int
main (void)
{
  float si1 = a;
  float si2 = b;
  int i;

  for (i = 0; i < 100; ++i) {
      foo ();
      if (x == 8)
	i++;
      r += i + (int) (si1 / si2);
  }
  abort ();
}
--cut here--

-O2 on x86_64-pc-linux-gnu:

main:
	pushq	%rbx
	xorl	%ebx, %ebx
	subq	$16, %rsp
	movss	a(%rip), %xmm0
	movss	%xmm0, 12(%rsp)
	movss	b(%rip), %xmm0
	movss	12(%rsp), %xmm1
>	divss	%xmm0, %xmm1
	movss	%xmm1, 12(%rsp)
.L4:
	call	foo
	...

I hope that Ariane-5 is safe [1] ;)

[1] http://en.wikipedia.org/wiki/Ariane_5_Flight_501
Comment 3 Steven Bosscher 2010-08-07 10:57:54 UTC
Patch of comment #1 loops obviously OK to me. We shouldn't want to move trapping insns in any case that I can think of.
Comment 4 Uroš Bizjak 2010-08-07 11:26:29 UTC
(In reply to comment #3)
> Patch of comment #1 loops obviously OK to me. We shouldn't want to move
> trapping insns in any case that I can think of.

OK, will post patch to gcc-patches after regression test.
Comment 5 Uroš Bizjak 2010-08-07 15:27:10 UTC
Patch at [1].

[1] http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00553.html
Comment 6 Uroš Bizjak 2010-08-12 14:46:04 UTC
Ouch, "Here are the ration of before and after on Intel Core i7. Gzip slowed
down by 10 to 20%." [1]

Richi says: "The fix is to teach LIM to do conditional invariant motion."

Probably also related to PR42108 catch-all PR.

[1] http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00771.html
[2] http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00884.html
Comment 7 Uroš Bizjak 2010-09-04 10:03:17 UTC
Unassigning...
Comment 8 Andrew Pinski 2021-10-11 03:44:38 UTC
So I think this has been fully fixed for GCC 12 via r12-2254 (aka PR 101373).
Comment 9 Richard Biener 2021-10-11 09:11:51 UTC
But this but is about non-MEMs while the fixes were involving only memory ops IIRC.