This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFA:] Fix PR optimization/15296, delayed-branch-slot bug.


A reviewer is supposed to be familiar with reorg.c.  Compile the test-case
below for cris-axis-elf with -Os on 3.4-branch; or -Os, -O2 or -O3 for the
3.3-branch.  The bug is present but not exposed by the test-case on trunk.

There will be an instruction setting R1 to -1 in a delay-slot, though R1
also assumed to hold a memory address at the branch target:

(insn 148 43 121 (sequence [
            (jump_insn 44 43 54 (set (pc)
                    (if_then_else (ne (cc0)
                            (const_int 0 [0x0]))
                        (label_ref 140)
                        (pc))) 151 {bne} (nil)
                (expr_list:REG_BR_PRED (const_int 4 [0x4])
                    (expr_list:REG_BR_PROB (const_int 5000 [0x1388])
                        (nil))))
            (insn 54 44 121 (set (reg/v:SI 1 r1 [orig:27 v1 ] [27])
                    (const_int -1 [0xffffffff])) 32 {*movsi_internal} (nil)
                (nil))
        ]) -1 (nil)
    (nil))

...
(code_label 140 124 77 11 "" [2 uses])

(insn 77 140 79 (set (mem/s:SI (post_inc:SI (reg/f:SI 1 r1 [37])) [0 <variable>.i+0 S4 A8])
        (reg/v:SI 13 r13 [orig:26 v0 ] [26])) 32 {*movsi_internal} (insn_list 133 (nil))
    (expr_list:REG_DEAD (reg/v:SI 13 r13 [orig:26 v0 ] [26])
        (expr_list:REG_INC (reg/f:SI 1 r1 [37])
            (nil))))


The wrongly placed -1 in the delay-slot comes from the *second* call to
fill_eager_delay_slots, but the cause is wrong data from the
relax_delay_slots "cleanup" just after the *first* call to
fill_eager_delay_slots.

For reference, when things have already gone wrong, at the time of that
second fill_eager_delay_slots call, insn 44 and its target are:

(jump_insn 44 43 121 (set (pc)
        (if_then_else (ne (cc0)
                (const_int 0 [0x0]))
            (label_ref 140)
            (pc))) 151 {bne} (nil)
    (expr_list:REG_BR_PROB (const_int 5000 [0x1388])
        (nil)))

...
(note 124 75 140 [bb 7] NOTE_INSN_BASIC_BLOCK)

(code_label 140 124 77 11 "" [2 uses])

(insn 77 140 79 (set (mem/s:SI (post_inc:SI (reg/f:SI 1 r1 [37])) [0 <variable>.i+0 S4 A8])
        (reg/v:SI 13 r13 [orig:26 v0 ] [26])) 32 {*movsi_internal} (insn_list 133 (nil))
    (expr_list:REG_DEAD (reg/v:SI 13 r13 [orig:26 v0 ] [26])
        (expr_list:REG_INC (reg/f:SI 1 r1 [37])
            (nil))))

There are no insns at the target which can fill the delay-slot, but the
fall-through execution path is "owned" (not the target of any branches) so
fill_eager_delay_slots will try to move an insn which must not clobber
anything used at the branch target, from the fallthrough-path to the
delay-slot.  Now, when fill_slots_from_thread calls mark_target_live_regs
(in resource.c) to check that validity, the global_live_at_start info for
that basic block says that these registers are live:

 0 [r0] 2 [r2] 13 [r13] 14 [sp]

Note the absence of register 1, r1, despite it being used in insn 77 at
the target.  With the original sequence of insns, before the *first*
fill_eager_delay_slots call, that data would have been true, because then
this was how bb 7 looked:

(code_label 75 74 124 5 ("l3") [2 uses])

(note 124 75 133 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 133 124 77 (set (reg/f:SI 1 r1 [37])
        (reg/v/f:SI 0 r0 [orig:32 c ] [32])) 32 {*movsi_internal} (nil)
    (nil))

(insn 77 133 79 (set (mem/s:SI (post_inc:SI (reg/f:SI 1 r1 [37])) [0 <variable>.i+0 S4 A8])
        (reg/v:SI 13 r13 [orig:26 v0 ] [26])) 32 {*movsi_internal} (insn_list 133 (nil))
    (expr_list:REG_DEAD (reg/v:SI 13 r13 [orig:26 v0 ] [26])
        (expr_list:REG_INC (reg/f:SI 1 r1 [37])
            (nil))))

but insn 133 was moved into a delay-slot at that first call to
fill_eager_delay_slots call, leaving this sequence around bb 7:

...

(code_label 75 74 124 7 5 ("l3") [2 uses])

(note 124 75 142 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 142 124 140 (use (insn/v 133 142 140 (set (reg/f:SI 1 r1 [37])
                (reg/v/f:SI 0 r0 [orig:32 c ] [32])) 32 {*movsi_internal} (nil)
            (nil))) -1 (nil)
    (nil))

(code_label 140 142 77 11 "" [1 uses])

(insn 77 140 79 (set (mem/s:SI (post_inc:SI (reg/f:SI 1 r1 [37])) [0 <variable>.i+0 S4 A8])
        (reg/v:SI 13 r13 [orig:26 v0 ] [26])) 32 {*movsi_internal} (insn_list 133 (nil))
    (expr_list:REG_DEAD (reg/v:SI 13 r13 [orig:26 v0 ] [26])
        (expr_list:REG_INC (reg/f:SI 1 r1 [37])
            (nil))))
...

Those familiar with the reorg.c/resource.c notice the USE-wrapped insn
as being the (hackish) way that reorg.c expresses that a
register-setting insn has been moved to another basic block, but
should be considered remaining wrt. register liveness; that the bb
global_live_at_start info should be augmented as if the insn was still
present when calculating live registers for insns following the USE.

All is well at this point.  Then dbr_schedule calls relax_delay_slots
to "improve" the optimization result.

Eventually, relax_delay_slots comes to insn 44 which has an unfilled
delay-slot, and compares the target_label with prev_label
(next_active_insn (target_label)), to see if it "jumps to a label that is
not the last of a group of consecutive labels".  Because next_active_insn
ignores USE insns, it considers code_label 75 and code_label 140 to be
consecutive with no insns in between, and thus the target is redirected
through reorg_redirect_jump.  Since the original label is unused after the
redirection, it and the insns that follows it are deleted through
delete_related_insns.  That unfortunately includes the USE placeholders,
and so the register liveness info is made inconsistent.

If it wasn't for one thing, I'd think that this bug, where the hackish
USE-wrapping-machinery and the use of the deprecated delete_related_insns
trips over each other causing register liveness info to end up
inconsistent, would be the final indicator that reorg.c should be
rewritten immediately.  (Well, that still is the case IMHO, but not as
urgent as in being the only reasonable fix for this bug.)

That one thing is a construct ~140 lines further down in the same
function, where there's a comment saying:

	  /* We use next_real_insn instead of next_active_insn, so that
	     the special USE insns emitted by reorg won't be ignored.
	     If they are ignored, then they will get deleted if target_label
	     is now unreachable, and that would cause mark_target_live_regs
	     to fail.  */

which *exactly* describes the bug.  The problem would now be in checking
whether there are other similar cases where next_real_insn should be used
instead of next_active_insn.  I checked (prev_active_insn and)
next_active_insn usage in reorg.c, or at least those uses that were
controlling a reorg_redirect_jump call within the next few lines.  Besides
the point of failure, that only left one other dangerous use seems to be
one in fill_simple_delay_slots.

Besides cris-axis-linux-gnu and cris-elf I checked gcc-3.4-branch cross to
mips-elf(*), mipsisa64-elf(*), sh-elf(*) and frv-elf, which seems a
complete list of more-or-less working simulator targets sporting
delay-slots.  (For some reason, the test framework for targets marked (*)
seem in shoddy shape on trunk and it looks like g++ tests and f77 tests
don't even link.  This is an unrelated failure.)

Similarly I've checked trunk of "Sat May 1 15:40:04 GMT 2004" to mips-elf,
sh-elf and frv-elf (mipsisa64-elf doesn't build there) and gcc-3.3-branch
to mipsisa64-elf, mips-elf and sh-elf (frv-elf doesn't build there).

Bootstrap and checking for the gcc-3.4-branch is complete for
sparc-sun-solaris2.7.  Java doesn't build and the linker script for
libgcc_s.so doesn't parse for Sun ld, so necessary options are
--enable-languages=c++,f77,objc --disable-multilib --disable-shared
--disable-nls.  I will also check the 3.3-branch and trunk, but the
process is a bit slow and looks like it will take at least another full
day.  It'd be nice to check against a SPARC simulator target, being the
only other well-known target with delay-slots, but there's no cross
toolchain with a simulator.  It's true that simtest-howto.html does list
sparc-elf/sparc-sim, but that entry is bogus or incomplete AFAICT; there's
no sparc-sim.exp in the sourceware dejagnu tree.  (Perhaps there is a
sparc-sim.exp externally; if so that necessary information is missing.)

Again, this test-case unfortunately doesn't expose the bug on trunk, only
on the 3.3 branch (-Os -O2 and above) and 3.4 branch (-Os only) and only
for cris-*.  It also does not expose the supposed potential bug in
fill_simple_delay_slots.

The test-case is automatically reduced from a quite larger,
machine-generated function and manually obfuscated to make it an
executable test-case.  More about the automatic test-case-reducer later,
when I'm in sync with the mailing lists again (currently I'm ~1month
behind) if I don't see anyone else working on one usable with GCC.  (From
its documentation, it didn't look like the one in LLVM would be usable
without LLVM.)

Ok to commit to trunk?
Ok for the 3.3 branch?
Ok for 3.4?
Should I confine the patch to only the actually-failing chunk?

	* reorg.c (fill_simple_delay_slots): Use next_real_insn when
	getting last consecutive label at a branch.
	(relax_delay_slots): Similar, near top of loop.

Index: reorg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/reorg.c,v
retrieving revision 1.90.4.1
diff -p -c -u -p -r1.90.4.1 reorg.c
--- reorg.c	23 Jan 2004 23:36:02 -0000	1.90.4.1
+++ reorg.c	2 May 2004 10:02:38 -0000
@@ -2358,7 +2358,9 @@ fill_simple_delay_slots (int non_jumps_p
 	      && eligible_for_delay (insn, slots_filled, next_trial, flags)
 	      && ! can_throw_internal (trial))
 	    {
-	      rtx new_label = next_active_insn (next_trial);
+	      /* See comment in relax_delay_slots about necessity of using
+		 next_real_insn here.  */
+	      rtx new_label = next_real_insn (next_trial);
 
 	      if (new_label != 0)
 		new_label = get_label_before (new_label);
@@ -3083,7 +3085,9 @@ relax_delay_slots (rtx first)
 	  && (target_label = JUMP_LABEL (insn)) != 0)
 	{
 	  target_label = follow_jumps (target_label);
-	  target_label = prev_label (next_active_insn (target_label));
+	  /* See comment further down why we must use next_real_insn here,
+	     instead of next_active_insn.  */
+	  target_label = prev_label (next_real_insn (target_label));
 
 	  if (target_label == 0)
 	    target_label = find_end_label ();


--- /dev/null	Tue Oct 29 15:57:07 2002
+++ gcc.c-torture/execute/pr15296.c	Wed May  5 16:21:12 2004
@@ -0,0 +1,73 @@
+/* PR optimization/15296.  The delayed-branch scheduler caused code that
+   SEGV:d for CRIS; a register was set to -1 in a delay-slot for the
+   fall-through code, while that register held a pointer used in code at
+   the branch target.  */
+
+typedef int __attribute__ ((mode (__pointer__))) intptr_t;
+typedef intptr_t W;
+union u0
+{
+  union u0 *r;
+  W i;
+};
+struct s1
+{
+  union u0 **m0;
+  union u0 m1[4];
+};
+
+void f (void *, struct s1 *, const union u0 *, W, W, W)
+     __attribute__ ((__noinline__));
+void g (void *, char *) __attribute__ ((__noinline__));
+
+void
+f (void *a, struct s1 *b, const union u0 *h, W v0, W v1, W v4)
+{
+  union u0 *e = 0;
+  union u0 *k = 0;
+  union u0 **v5 = b->m0;
+  union u0 *c = b->m1;
+  union u0 **d = &v5[0];
+l0:;
+  if (v0 < v1)
+    goto l0;
+  if (v0 == 0)
+    goto l3;
+  v0 = v4;
+  if (v0 != 0)
+    goto l3;
+  c[0].r = *d;
+  v1 = -1;
+  e = c[0].r;
+  if (e != 0)
+    g (a, "");
+  k = e + 3;
+  k->i = v1;
+  goto l4;
+l3:;
+  c[0].i = v0;
+  e = c[1].r;
+  if (e != 0)
+    g (a, "");
+  e = c[0].r;
+  if (e == 0)
+    g (a, "");
+  k = e + 2;
+  k->r = c[1].r;
+l4:;
+}
+
+void g (void *a, char *b) { abort (); }
+
+int
+main ()
+{
+  union u0 uv[] = {{ .i = 111 }, { .i = 222 }, { .i = 333 }, { .i = 444 }};
+  struct s1 s = { 0, {{ .i = 555 }, { .i = 0 }, { .i = 999 }, { .i = 777 }}};
+  f (0, &s, 0, 20000, 10000, (W) uv);
+  if (s.m1[0].i != (W) uv || s.m1[1].i != 0 || s.m1[2].i != 999
+      || s.m1[3].i != 777 || uv[0].i != 111 || uv[1].i != 222
+      || uv[2].i != 0 || uv[3].i != 444)
+    abort ();
+  exit (0);
+}

brgds, H-P


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]