[patch]: COMMITTED Enhance rtl-dse to remove unaligned read after writes.

Richard Sandiford rsandifo@nildram.co.uk
Sun Sep 16 09:24:00 GMT 2007


Kenneth Zadeck <zadeck@naturalbridge.com> writes:
> All changes were made as requested.
>
> Rebootstrapped and regression tested on x86-64, ia-64, and ppc-32.
>
> committed as revision 128481.

This is causing quite a few execution failures on 64-bit MIPS targets.
The immediate problem is that the patch uses:

    emit_move_insn (read_reg,  gen_lowpart (read_mode, new_reg));

to narrow new_reg to read_mode and:

    emit_move_insn (read_reg, gen_lowpart (read_mode, store_info->rhs));

to narrow store_info->rhs to read_mode.  (Both are always narrowing
operations.)  However, narrowing from one integer mode to another can need
real instructions on some targets, as controlled by TRULY_NOOP_TRUNCATION.
64-bit MIPS targets were the original use case: when subword values are
stored in registers, we want to preserve the invariant that the upper
32 bits are a sign-extension of the lower 32 bits[*].
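
As a sketch of why the gen_lowpart form goes wrong there (illustrative
code, not actual dse.c or backend code): the lowpart move is just a
full-register copy at the hardware level, so nothing re-establishes
the canonical form:

    /* Copies all 64 bits of the register underlying X, so READ_REG's
       upper 32 bits need not be a sign-extension of its lower 32;
       unsafe when TRULY_NOOP_TRUNCATION is false for this mode pair.  */
    emit_move_insn (read_reg, gen_lowpart (SImode, x));

On MIPS64 the canonical form has to be restored with a real
instruction, conventionally "sll $d,$s,0".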

You should generally use convert_move instead, as in:

    convert_move (read_reg, new_reg, <don't-care>);
    ...
    convert_move (read_reg, store_info->rhs, <don't-care>);
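
(The third argument only matters when widening, which is why it's a
don't-care here.  For the narrowing case, convert_move behaves roughly
as follows; this is a paraphrase of expr.c, not a quote:

    if (TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (to_mode),
                               GET_MODE_BITSIZE (from_mode)))
      /* The plain lowpart move is safe.  */
      emit_move_insn (to, gen_lowpart (to_mode, from));
    else
      /* Emit a real truncation, e.g. via MIPS's truncdisi2 pattern.  */
      ...

so on targets where the truncation really is a no-op, we get exactly
the old gen_lowpart code.)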

There's the question of whether we should take the truncation into
account when computing the cost of a shift on targets where
TRULY_NOOP_TRUNCATION is false for the modes involved.
I think the answer's no: we can often combine the truncation into either
the shift itself (if the shift is by >= 32 bits), or into the following
operation.
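
(For example, on MIPS64 (hypothetical output, not from the patch), if
the shifted value is then read in SImode, a single:

    dsra    $2, $4, 32

already leaves the upper 32 bits of $2 as a sign-extension of the
lower 32, so the truncation adds no cost.)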

We should also skip access_sizes that are smaller than the store mode
if truncating the store mode to the access size needs real insns.
The patch below does this with:

      /* Try a wider mode if truncating the store mode to ACCESS_SIZE
	 bytes requires a real instruction.  */
      if (access_size < GET_MODE_SIZE (store_mode)
	  && !TRULY_NOOP_TRUNCATION (access_size * BITS_PER_UNIT,
				     GET_MODE_BITSIZE (store_mode)))
	continue;

(The use of "continue" is out of style with the loop's current control
flow.  For ease of review, I'll send a follow-up patch to fix that,
with no behavioural changes.)

I noticed a few other problems too:

  - The cost of a shift is calculated as:

	for (insn = shift_seq; insn != NULL_RTX; insn = NEXT_INSN (insn))
	  cost += insn_rtx_cost (insn);

    but insn_rtx_cost takes a pattern, not the insn itself.  Thus we
    always ended up with a zero cost; the corrected loop appears in
    the sketch after this list.

  - Fixing that showed up a df-related problem.  If a call to
    expand_binop generates more than one instruction, expand_binop
    helpfully adds a REG_EQUAL note to the final instruction.
    set_unique_reg_note then passes this insn to df_notes_rescan,
    and we'll end up scanning the instruction even if we decide to
    discard it.  This leads to an ICE.

    I believe the right fix here is to set DF_NO_INSN_RESCAN
    while expanding the sequence, as sketched after this list.
    If we decide to keep the instructions, we'll still pass them
    to df_insn_rescan from emit_insn_before.  (This is of course
    how the scan is queued for insns without a REG_EQUAL note.)

    The infrastructure would probably be more robust if scans were
    only automatically scheduled for the topmost sequence.  However,
    that's much more invasive, and might need someone more familiar
    with the df code than I am.  I ask that the patch be reviewed
    purely as a DSE fix, not least because the problem above would
    break bootstrap on IRIX and 64-bit MIPS GNU/Linux.

  - The patch appears to be unintentionally restrictive.  The loop
    in find_shift_sequence is:

        while (access_size < UNITS_PER_WORD)

    so we only consider using subword shifts, whereas word shifts
    account for the most interesting cases on MIPS.
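
The fixes for the first two items condense to the following pattern
(taken from the patch below): expand the tentative shift with
rescanning disabled, then cost the pattern of each real insn:

    start_sequence ();
    df_set_flags (DF_NO_INSN_RESCAN);
    target = expand_binop (new_mode, lshr_optab, new_reg,
                           GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
    df_clear_flags (DF_NO_INSN_RESCAN);
    ...
    for (insn = shift_seq; insn != NULL_RTX; insn = NEXT_INSN (insn))
      if (INSN_P (insn))
        cost += insn_rtx_cost (PATTERN (insn));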

I've tried to fix these problems in the patch below.  However, I was
also a little uneasy about the use of GET_MODE_CLASS in the shift code.
Are we trying to cope with just MODE_INT, with both MODE_INT and
MODE_PARTIAL_INT, or with more than those two?  It looks like the
code would trigger for vector integer modes too, but a shift in
a vector mode is done elementwise and isn't what we want here.
I haven't done anything about that; I just thought I'd mention it.

I added a testcase to make sure that the optimisation triggers for
all expected cases on 64-bit MIPS targets.  (Which it does, yay!)
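
For instance, this reduction of the test (behaviour assumed under
-mgp64):

    union { int m1[2]; long long m2; } u;

    int
    f (long long x)
    {
      u.m2 = x;
      return u.m1[0];
    }

now reads u.m1[0] straight from the register that holds x, using a
64-bit shift and/or truncation instead of reloading it with lw.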

Bootstrapped & regression-tested on x86_64-linux-gnu.  Also
regression-tested on mipsisa64-elfoabi, where it fixes a bunch
of execution failures.  OK to install?

Richard

[*] This has two advantages:

    - We can use 64-bit instructions to compare SImode values without
      converting them first.  (There are no separate 32-bit scc or
      bcc instructions.)

    - We can use 32-bit arithmetic instructions for subword operands.
      These instructions have unpredictable results unless the upper
      32 bits of each operand are a sign-extension of the lower 32 bits
      (and the instructions themselves preserve this extension on the
      result).
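
    To illustrate the first point (illustrative code, not generated
    output): with both operands kept in canonical form, the 64-bit

        slt     $2, $4, $5

    computes an SImode comparison directly, since the 64-bit and
    32-bit orderings agree when both inputs are sign-extended.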


gcc/
	* dse.c (find_shift_sequence): Allow word as well as subword shifts.
	Do the tentative shift expansion with the DF_NO_INSN_RESCAN flag set.
	Fix the call to insn_rtx_cost.  Skip access sizes that would need
	a real truncation of the store register.  Use convert_move instead
	of gen_lowpart when narrowing the result.
	(replace_read): Use convert_move instead of gen_lowpart when
	narrowing the store rhs.

gcc/testsuite/
	* gcc.target/mips/dse-1.c: New test.

Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2007-09-15 19:56:07.000000000 +0100
+++ gcc/dse.c	2007-09-15 21:19:42.000000000 +0100
@@ -1407,21 +1407,31 @@ find_shift_sequence (rtx read_reg,
      justify the value we want to read but is available in one insn on
      the machine.  */
 
-  while (access_size < UNITS_PER_WORD)
+  for (; access_size <= UNITS_PER_WORD; access_size *= 2)
     {
-      rtx target;
-      enum machine_mode new_mode
-	= smallest_mode_for_size (access_size * BITS_PER_UNIT,
-				  GET_MODE_CLASS (read_mode));
-      rtx new_reg = gen_reg_rtx (new_mode);
+      rtx target, new_reg;
+      enum machine_mode new_mode;
+
+      /* Try a wider mode if truncating the store mode to ACCESS_SIZE
+	 bytes requires a real instruction.  */
+      if (access_size < GET_MODE_SIZE (store_mode)
+	  && !TRULY_NOOP_TRUNCATION (access_size * BITS_PER_UNIT,
+				     GET_MODE_BITSIZE (store_mode)))
+	continue;
+
+      new_mode = smallest_mode_for_size (access_size * BITS_PER_UNIT,
+					 GET_MODE_CLASS (read_mode));
+      new_reg = gen_reg_rtx (new_mode);
 
       start_sequence ();
 
       /* In theory we could also check for an ashr.  Ian Taylor knows
 	 of one dsp where the cost of these two was not the same.  But
 	 this really is a rare case anyway.  */
+      df_set_flags (DF_NO_INSN_RESCAN);
       target = expand_binop (new_mode, lshr_optab, new_reg,
 			     GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
+      df_clear_flags (DF_NO_INSN_RESCAN);
 
       if (target == new_reg)
 	{
@@ -1436,7 +1446,8 @@ find_shift_sequence (rtx read_reg,
 	      rtx insn;
 
 	      for (insn = shift_seq; insn != NULL_RTX; insn = NEXT_INSN (insn))
-		cost += insn_rtx_cost (insn);
+		if (INSN_P (insn))
+		  cost += insn_rtx_cost (PATTERN (insn));
 
 	      /* The computation up to here is essentially independent
 		 of the arguments and could be precomputed.  It may
@@ -1455,7 +1466,7 @@ find_shift_sequence (rtx read_reg,
 		  start_sequence ();
 		  emit_move_insn (new_reg, gen_lowpart (new_mode, store_info->rhs));
 		  emit_insn (shift_seq);
-		  emit_move_insn (read_reg,  gen_lowpart (read_mode, new_reg));
+		  convert_move (read_reg, new_reg, 1);
 		  
 		  if (dump_file)
 		    {
@@ -1480,8 +1491,6 @@ find_shift_sequence (rtx read_reg,
       else
 	/* End the sequence.  */
 	end_sequence ();
-
-      access_size = access_size * 2;
     }
 
   return NULL;
@@ -1595,7 +1604,7 @@ replace_read (store_info_t store_info, i
 	     place, we need to extract the value in the right from the
 	     rhs of the store.  */
 	  start_sequence ();
-	  emit_move_insn (read_reg, gen_lowpart (read_mode, store_info->rhs));
+	  convert_move (read_reg, store_info->rhs, 1);
 	  
 	  if (dump_file)
 	    fprintf (dump_file, " -- adding extract insn r%d:%s = r%d:%s\n",
Index: gcc/testsuite/gcc.target/mips/dse-1.c
===================================================================
--- /dev/null	2007-09-15 10:15:35.552097000 +0100
+++ gcc/testsuite/gcc.target/mips/dse-1.c	2007-09-15 19:56:09.000000000 +0100
@@ -0,0 +1,36 @@
+/* { dg-mips-options "-mgp64 -O" } */
+
+#define TEST(ID, TYPE1, TYPE2)					\
+  union {							\
+    TYPE1 m1[sizeof (TYPE2) / sizeof (TYPE1)];			\
+    TYPE2 m2;							\
+  } u##ID;							\
+								\
+  /* The MIPS16 versions of the shifts we need are too		\
+     expensive.  */						\
+  TYPE1 __attribute__((nomips16))				\
+  f##ID (TYPE2 x)						\
+  {								\
+    u##ID.m2 = x;						\
+    return (u##ID.m1[0]						\
+	    + u##ID.m1[sizeof (TYPE2) / sizeof (TYPE1) - 1]);	\
+  }
+
+TEST (1, unsigned int, unsigned long long);
+TEST (2, int, long long);
+TEST (3, unsigned short, unsigned long long);
+TEST (4, short, long long);
+TEST (5, unsigned char, unsigned long long);
+TEST (6, signed char, long long);
+
+TEST (7, unsigned short, unsigned int);
+TEST (8, short, int);
+TEST (9, unsigned char, unsigned int);
+TEST (10, signed char, int);
+
+/* DSE isn't yet ready to consider stores of subregs, so the corresponding
+   (char, short) tests won't pass.  */
+
+/* { dg-final { scan-assembler-not "\tlh\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
+/* { dg-final { scan-assembler-not "\tlb\t" } } */


