reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

Hans-Peter Nilsson hp@axis.com
Thu Aug 20 17:15:18 GMT 2020


> From: Richard Sandiford <richard.sandiford@arm.com>
> Date: Thu, 20 Aug 2020 10:30:56 +0200

> Anything I once knew about reorg.c has long since faded away, but since
> noone else has reviewed it...

Thanks.  I forgot to add PATCH and/or RFA: in the subject and
forgot to CC Eric, assuming he's interested (I did CC him as a
heads-up that this was coming, that's why I added you now, Eric).

> Do you know what guarantees that REG_DEAD and REG_UNUSED notes are
> reliable during reorg.c?  It was written at a time when passes were
> expected to keep the notes up-to-date, but that's not true these days.
> My worry is that they might be mostly right, but just stale enough
> to be harmful in corner cases.

They are depended upon in reorg.c as absolutely accurate (and
kept updated), and there's IMO enough empirical evidence that
they still are.  In this particular patch, I'm using it in the
same way Eric used it in 33c2207d3fda for
fill_simple_delay_slots (i.e. it would be similarly flawed), but
it's also used all over for the old-style dataflow analysis done
here.  We'd be seeing failing for those "corner cases" all
around delayed-branch targets if that didn't work.  I should be
able to use the existing machinery in this patch.

BTW, I happened to notice that bugs here are also somewhat more
visible than your ordinary wrong-result bug. :)

> Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > Originally I thought to bootstrap this patch on MIPS and SPARC
> > since they're both delayed-branch-slot targets but I
> > reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> > seems only visium and CRIS has this feature set, and I see no
> > trace of visium in neither newlib nor the simulator next to
> > glibc.  So, I just tested cris-elf.
> >
> > This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> > fillers using a method similar to that in commit 33c2207d3fda,
> > where care was taken for fill_simple_delay_slots to allow such
> > insns when scanning for delay-slot fillers *backwards* (before
> > the insn).
> >
> > A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> > For cc0 targets, insns don't mention clobbering cc0, so the
> > clobbers are mentioned in the "resources" only as a special
> > entity and only for compare-insns and branches, where the cc0
> > value matters.
> >
> > In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> > the register liveness detection in reorg.c / resource.c treats
> > that as a blocker (for other insns mentioning it, i.e. most)
> > when looking for delay-slot-filling candidates.  This means that
> > when comparing core and performance for a delay-slot cc0 target
> > before and after the de-cc0 conversion, the inability to fill a
> > delay slot after conversion manifests as a regression.  This was
> > one such case, for CRIS, with random_bitstring in
> > gcc.c-torture/execute/arith-rand-ll.c as well as the target
> > libgcc division function.
> >
> > After this, all known performance regressions compared to cc0
> > are fixed.
> >
> > Ok to commit?
> >
> > gcc:
> > 	PR target/93372
> > 	* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
> > 	TARGET_FLAGS_REGNUM as delay-slot fillers.
> >
> > gcc/testsuite:
> > 	PR target/93372
> > 	* gcc.target/cris/pr93372-47.c: New test.
> > ---
> >  gcc/reorg.c                                | 37 +++++++++++++++++++++-
> >  gcc/testsuite/gcc.target/cris/pr93372-47.c | 49 ++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-47.c
> >
> > diff --git a/gcc/reorg.c b/gcc/reorg.c
> > index dfd7494bf..83161caa0 100644
> > --- a/gcc/reorg.c
> > +++ b/gcc/reorg.c
> > @@ -2411,6 +2411,21 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> >    CLEAR_RESOURCE (&needed);
> >    CLEAR_RESOURCE (&set);
> >  
> > +  /* Handle the flags register specially, to be able to accept a
> > +     candidate that clobbers it.  See also fill_simple_delay_slots.  */
> > +  bool filter_flags
> > +    = (slots_to_fill == 1
> > +       && targetm.flags_regnum != INVALID_REGNUM
> > +       && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
> > +  struct resources fset;
> > +  struct resources flags_res;
> > +  if (filter_flags)
> > +    {
> > +      CLEAR_RESOURCE (&fset);
> > +      CLEAR_RESOURCE (&flags_res);
> > +      SET_HARD_REG_BIT (flags_res.regs, targetm.flags_regnum);
> > +    }
> > +
> >    /* If we do not own this thread, we must stop as soon as we find
> >       something that we can't put in a delay slot, since all we can do
> >       is branch into THREAD at a later point.  Therefore, labels stop
> > @@ -2439,8 +2454,18 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> >        /* If TRIAL conflicts with the insns ahead of it, we lose.  Also,
> >  	 don't separate or copy insns that set and use CC0.  */
> >        if (! insn_references_resource_p (trial, &set, true)
> > -	  && ! insn_sets_resource_p (trial, &set, true)
> > +	  && ! insn_sets_resource_p (trial, filter_flags ? &fset : &set, true)
> >  	  && ! insn_sets_resource_p (trial, &needed, true)
> > +	  /* If we're handling sets to the flags register specially, we
> > +	     only allow an insn into a delay-slot, if it either:
> > +	     - doesn't set the flags register,
> > +	     - the "set" of the flags register isn't used (clobbered),
> > +	     - insns between the delay-slot insn and the trial-insn
> > +	     as accounted in "set", have not affected the flags register.  */
> > +	  && (! filter_flags
> > +	      || ! insn_sets_resource_p (trial, &flags_res, true)
> > +	      || find_regno_note (trial, REG_UNUSED, targetm.flags_regnum)
> > +	      || ! TEST_HARD_REG_BIT (set.regs, targetm.flags_regnum))
> >  	  && (!HAVE_cc0 || (! (reg_mentioned_p (cc0_rtx, pat)
> >  			      && (! own_thread || ! sets_cc0_p (pat)))))
> >  	  && ! can_throw_internal (trial))
> > @@ -2618,6 +2643,16 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition,
> >        lose = 1;
> >        mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
> >        mark_referenced_resources (trial, &needed, true);
> > +      if (filter_flags)
> > +	{
> > +	  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
> > +
> > +	  /* Groups of flags-register setters with users should not
> > +	     affect opportunities to move flags-register-setting insns
> > +	     (clobbers) into the delay-slot.  */
> > +	  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
> 
> If we do this, what stops us from trying to move a flags-register user
> ahead of the associated setter, when the user doesn't itself set the
> flags register?

First, all sets are still in set (set.regs) including any that
set the flags register between the insn with the delay-slot and
the trial.

(That's why it's separate from fset (fset.regs).  I used the
same naming scheme as in the named commit, but perhaps a better
name would be set_with_flags_filtered or something.)

>  Feels like there should be some test involving
> insn_references_resource_p (trial, &flags_res, true) in the
> (! filter_flags || ...) condition above.

There is: the "! insn_references_resource_p (trial, &set, true)"
(pre-existing in the context above the patch), so we don't move
anything that references anything set (only additional flags
register setters where the result is unused).

Ok now?

> Thanks,
> Richard

Thanks again for the review, much appreciated.

brgds, H-P

> > +	  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
> > +	}
> >  
> >        /* Ensure we don't put insns between the setting of cc and the comparison
> >  	 by moving a setting of cc into an earlier delay slot since these insns
> > diff --git a/gcc/testsuite/gcc.target/cris/pr93372-47.c b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> > new file mode 100644
> > index 000000000..8d80ae6b4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/cris/pr93372-47.c
> > @@ -0,0 +1,49 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=v10" } */
> > +/* { dg-final { scan-assembler-times {\tnop} 1 } } */
> > +
> > +/* A somewhat brittle test-case, checking that we have (only) one
> > +   unfilled delay-slot in random_bitstring: there might be none or two
> > +   or more, and general improvements may lead to unfilled delay-slots.
> > +   When the scan-assembler-times directive regresses, re-run
> > +   gcc.c-torture/execute/arith-rand-ll.c, check cycle-level
> > +   execution-time regressions in random_bitstring and take appropriate
> > +   action.  */
> > +
> > +static long long
> > +simple_rand ()
> > +{
> > +  static unsigned long long seed = 47114711;
> > +  unsigned long long this = seed * 1103515245 + 12345;
> > +  seed = this;
> > +  return this >> 8;
> > +}
> > +
> > +unsigned long long int
> > +random_bitstring ()
> > +{
> > +  unsigned long long int x;
> > +  int n_bits;
> > +  long long ran;
> > +  int tot_bits = 0;
> > +
> > +  x = 0;
> > +  for (;;)
> > +    {
> > +      ran = simple_rand ();
> > +      n_bits = (ran >> 1) % 16;
> > +      tot_bits += n_bits;
> > +
> > +      if (n_bits == 0)
> > +	return x;
> > +      else
> > +	{
> > +	  x <<= n_bits;
> > +	  if (ran & 1)
> > +	    x |= (1 << n_bits) - 1;
> > +
> > +	  if (tot_bits > 8 * sizeof (long long) + 6)
> > +	    return x;
> > +	}
> > +    }
> > +}
> 


More information about the Gcc-patches mailing list