[patch]: COMMITTED Enhance rtl-dse to remove unaligned read after writes.

Kenneth Zadeck zadeck@naturalbridge.com
Mon Sep 17 09:06:00 GMT 2007


Richard Sandiford wrote:
> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>   
>> All changes were made as requested.
>>
>> Rebootstrapped and regression tested on x86-64, ia-64, and ppc-32.
>>
>> committed as revision 128481.
>>     
>
>   
I am certainly happy that my 35k-line df commit had several orders
of magnitude fewer problems per line than this patch did, or it would
never have gone in.  On the other hand, most of that patch touched parts
of the compiler that I was more familiar with.  Many of the things you
mention here, I had simply never heard of.

> This is causing quite a few execution failures on 64-bit MIPS targets.
> The immediate problem is that the patch uses:
>
>     emit_move_insn (read_reg,  gen_lowpart (read_mode, new_reg));
>
> to narrow new_reg to read_mode and:
>
>     emit_move_insn (read_reg, gen_lowpart (read_mode, store_info->rhs));
>
> to narrow store_info->rhs to read_mode.  (Both are always narrowing
> operations.)  However, narrowing from one integer mode to another can need
> real instructions on some targets, as controlled by TRULY_NOOP_TRUNCATION.
> 64-bit MIPS targets were the original use case: when subword values are
> stored in registers, we want to preserve the invariant that the upper
> 32 bits are a sign-extension of the lower 32 bits [*].
>
> You should generally use convert_move instead, as in:
>
>     convert_move (read_reg, new_reg, <don't-care>);
>     ...
>     convert_move (read_reg, store_info->rhs, <don't-care>);
>
> There's the question of whether we should take the truncation into account
> when computing the cost of a shift on TRULY_NOOP_TRUNCATION targets.
> I think the answer's no: we can often combine the truncation into either
> the shift itself (if the shift is by >= 32 bits), or into the following
> operation.
>
> We should also skip access_sizes that are smaller than the store mode
> if truncating the store mode to the access size needs real insns.
> The patch below does this with:
>
>       /* Try a wider mode if truncating the store mode to ACCESS_SIZE
> 	 bytes requires a real instruction.  */
>       if (access_size < GET_MODE_SIZE (store_mode)
> 	  && !TRULY_NOOP_TRUNCATION (access_size * BITS_PER_UNIT,
> 				     GET_MODE_BITSIZE (store_mode)))
> 	continue;
>   
Given that store_mode is loop-invariant, I would have written this code
differently.  I would have added code before the loop like:

if (!TRULY_NOOP_TRUNCATION (access_size * BITS_PER_UNIT,
			    GET_MODE_BITSIZE (store_mode)))
  while (access_size < GET_MODE_SIZE (store_mode))
    access_size *= 2;


> (The use of "continue" is out of style with the loop's current control
> flow.  For ease of review, I'll send a follow-up patch to fix that,
> with no behavioural changes.)
>
> I noticed a few other problems too:
>
>   - The cost of a shift is calculated as:
>
> 	for (insn = shift_seq; insn != NULL_RTX; insn = NEXT_INSN (insn))
> 	  cost += insn_rtx_cost (insn);
>
>     but insn_rtx_cost takes a pattern, not the insn itself.  Thus we
>     always ended up with a zero cost.
>
>   - Fixing that showed up a df-related problem.  If a call to
>     expand_binop generates more than one instruction, expand_binop
>     helpfully adds a REG_EQUAL note to the final instruction.
>     set_unique_reg_note then passes this insn to df_notes_rescan,
>     and we'll end up scanning the instruction even if we decide to
>     discard it.  This leads to an ICE.
>
>     I believe the right fix here is to set DF_NO_INSN_RESCAN
>     while expanding the sequence.  If we decide to keep the
>     instructions, we'll still pass them df_insn_rescan from
>     emit_insn_before.  (This is of course how the scan is queued
>     for insns without a REG_EQUAL note.)
>
>   
>     The infrastructure would probably be more robust if scans were
>     only automatically scheduled for the topmost sequence.  However,
>   
This may be hard because there is no 100% reliable way to tell whether
an insn is in the function or whether you have just created it and are
playing air guitar with it.  In df_insn_rescan, we check that the insn
has a basic block before we queue it to be rescanned.

I think that the proper fix is to copy that check for the bb into
df_notes_rescan.  Since the insns have not been inserted yet, they do
not have a bb.

I would prefer this fix to yours, assuming it works.

This is the only df part of your patch, and as such it is the only part
that I can approve.  It will take a middle-end maintainer to approve the
other parts.

Kenny
>     that's much more invasive, and might need someone more familiar
>     with the df code than I am.  I ask that the patch be reviewed
>     purely as a DSE fix, not least because the problem above would
>     break bootstrap on IRIX and 64-bit MIPS GNU/Linux.
>
>   

>   - The patch appears to be unintentionally restrictive.  The loop
>     in find_shift_sequence is:
>
>         while (access_size < UNITS_PER_WORD)
>
>     so we only consider using subword shifts, whereas word shifts
>     account for the most interesting cases on MIPS.
>
>   
Since I started playing with the RA conflict builder, I have gained a
little more insight into how subregs work.  You are right here.

> I've tried to fix these problems in the patch below.  However, I was
> also a little uneasy about the use of GET_MODE_CLASS in the shift code.
> Are we trying to cope with just MODE_INT, with both MODE_INT and
> MODE_PARTIAL_INT, or with more than those two?  It looks like the
> code would trigger for vector integer modes too, but a shift in
> a vector mode is done elementwise, and isn't what we want here.
> I haven't done anything about that; just thought I'd mention it.
>
> I added a testcase to make sure that the optimisation triggers for
> all expected cases on 64-bit MIPS targets.  (Which it does, yay!)
>
> Bootstrapped & regression-tested on x86_64-linux-gnu.  Also
> regression-tested on mipsisa64-elfoabi, where it fixes a bunch
> of execution failures.  OK to install?
>
> Richard
>
> [*] This has two advantages:
>
>     - We can use 64-bit instructions to compare SImode values without
>       converting them first.  (There are no separate 32-bit scc or
>       bcc instructions.)
>
>     - We can use 32-bit arithmetic instructions for subword operands.
>       The 32-bit arithmetic instructions are unpredictable unless the
>       upper 32 bits are a sign-extension of the lower 32 bits (and the
>       instructions themselves preserve this extension on the result).
>
>
> gcc/
> 	* dse.c (find_shift_sequence): Allow word as well as subword shifts.
> 	Do the tentative shift expansion with the DF_NO_INSN_RESCAN flag set.
> 	Fix the call to insn_rtx_cost.  Skip access sizes that require a
> 	no-op truncation of the store register.  Use convert_move instead
> 	of gen_lowpart when narrowing the result.
> 	(replace_read): Use convert_move instead of gen_lowpart when
> 	narrowing the store rhs.
>
> gcc/testsuite/
> 	* gcc.target/mips/dse-1.c: New test.
>
> Index: gcc/dse.c
> ===================================================================
> --- gcc/dse.c	2007-09-15 19:56:07.000000000 +0100
> +++ gcc/dse.c	2007-09-15 21:19:42.000000000 +0100
> @@ -1407,21 +1407,31 @@ find_shift_sequence (rtx read_reg,
>       justify the value we want to read but is available in one insn on
>       the machine.  */
>  
> -  while (access_size < UNITS_PER_WORD)
> +  for (; access_size <= UNITS_PER_WORD; access_size *= 2)
>      {
> -      rtx target;
> -      enum machine_mode new_mode
> -	= smallest_mode_for_size (access_size * BITS_PER_UNIT,
> -				  GET_MODE_CLASS (read_mode));
> -      rtx new_reg = gen_reg_rtx (new_mode);
> +      rtx target, new_reg;
> +      enum machine_mode new_mode;
> +
> +      /* Try a wider mode if truncating the store mode to ACCESS_SIZE
> +	 bytes requires a real instruction.  */
> +      if (access_size < GET_MODE_SIZE (store_mode)
> +	  && !TRULY_NOOP_TRUNCATION (access_size * BITS_PER_UNIT,
> +				     GET_MODE_BITSIZE (store_mode)))
> +	continue;
> +
> +      new_mode = smallest_mode_for_size (access_size * BITS_PER_UNIT,
> +					 GET_MODE_CLASS (read_mode));
> +      new_reg = gen_reg_rtx (new_mode);
>  
>        start_sequence ();
>  
>        /* In theory we could also check for an ashr.  Ian Taylor knows
>  	 of one dsp where the cost of these two was not the same.  But
>  	 this really is a rare case anyway.  */
> +      df_set_flags (DF_NO_INSN_RESCAN);
>        target = expand_binop (new_mode, lshr_optab, new_reg,
>  			     GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
> +      df_clear_flags (DF_NO_INSN_RESCAN);
>  
>        if (target == new_reg)
>  	{
> @@ -1436,7 +1446,8 @@ find_shift_sequence (rtx read_reg,
>  	      rtx insn;
>  
>  	      for (insn = shift_seq; insn != NULL_RTX; insn = NEXT_INSN (insn))
> -		cost += insn_rtx_cost (insn);
> +		if (INSN_P (insn))
> +		  cost += insn_rtx_cost (PATTERN (insn));
>  
>  	      /* The computation up to here is essentially independent
>  		 of the arguments and could be precomputed.  It may
> @@ -1455,7 +1466,7 @@ find_shift_sequence (rtx read_reg,
>  		  start_sequence ();
>  		  emit_move_insn (new_reg, gen_lowpart (new_mode, store_info->rhs));
>  		  emit_insn (shift_seq);
> -		  emit_move_insn (read_reg,  gen_lowpart (read_mode, new_reg));
> +		  convert_move (read_reg, new_reg, 1);
>  		  
>  		  if (dump_file)
>  		    {
> @@ -1480,8 +1491,6 @@ find_shift_sequence (rtx read_reg,
>        else
>  	/* End the sequence.  */
>  	end_sequence ();
> -
> -      access_size = access_size * 2;
>      }
>  
>    return NULL;
> @@ -1595,7 +1604,7 @@ replace_read (store_info_t store_info, i
>  	     place, we need to extract the value in the right from the
>  	     rhs of the store.  */
>  	  start_sequence ();
> -	  emit_move_insn (read_reg, gen_lowpart (read_mode, store_info->rhs));
> +	  convert_move (read_reg, store_info->rhs, 1);
>  	  
>  	  if (dump_file)
>  	    fprintf (dump_file, " -- adding extract insn r%d:%s = r%d:%s\n",
> Index: gcc/testsuite/gcc.target/mips/dse-1.c
> ===================================================================
> --- /dev/null	2007-09-15 10:15:35.552097000 +0100
> +++ gcc/testsuite/gcc.target/mips/dse-1.c	2007-09-15 19:56:09.000000000 +0100
> @@ -0,0 +1,36 @@
> +/* { dg-mips-options "-mgp64 -O" } */
> +
> +#define TEST(ID, TYPE1, TYPE2)					\
> +  union {							\
> +    TYPE1 m1[sizeof (TYPE2) / sizeof (TYPE1)];			\
> +    TYPE2 m2;							\
> +  } u##ID;							\
> +								\
> +  /* The MIPS16 versions of the shifts we need are too		\
> +     expensive.  */						\
> +  TYPE1 __attribute__((nomips16))				\
> +  f##ID (TYPE2 x)						\
> +  {								\
> +    u##ID.m2 = x;						\
> +    return (u##ID.m1[0]						\
> +	    + u##ID.m1[sizeof (TYPE2) / sizeof (TYPE1) - 1]);	\
> +  }
> +
> +TEST (1, unsigned int, unsigned long long);
> +TEST (2, int, long long);
> +TEST (3, unsigned short, unsigned long long);
> +TEST (4, short, long long);
> +TEST (5, unsigned char, unsigned long long);
> +TEST (6, signed char, long long);
> +
> +TEST (7, unsigned short, unsigned int);
> +TEST (8, short, int);
> +TEST (9, unsigned char, unsigned int);
> +TEST (10, signed char, int);
> +
> +/* DSE isn't yet ready to consider stores of subregs, so the corresponding
> +   (char, short) tests won't pass.  */
> +
> +/* { dg-final { scan-assembler-not "\tlh\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
> +/* { dg-final { scan-assembler-not "\tlb\t" } } */
>   


