[PATCH PR95254] aarch64: GCC generates inefficient code with fixed SVE vector length

Richard Sandiford richard.sandiford@arm.com
Wed May 27 16:06:57 GMT 2020


"Yangfei (Felix)" <felix.yang@huawei.com> writes:
>> > +
>> > +    {
>> > +      x = x_inner;
>> > +    }
>> > +  else if (x_inner != NULL_RTX && MEM_P (y)
>> > +	   && known_eq (GET_MODE_SIZE (x_inner_mode), GET_MODE_SIZE (mode))
>> > +	   && ! targetm.can_change_mode_class (x_inner_mode, mode, ALL_REGS)
>> > +	   && (! targetm.slow_unaligned_access (x_inner_mode, MEM_ALIGN (y))
>> > +	       || MEM_ALIGN (y) >= GET_MODE_ALIGNMENT (x_inner_mode)))
>> 
>> What is the last condition protecting against?  Seems worth a comment.
>
> Comment added.  Here I intend to avoid generating a slow unaligned memory access.
> Machine modes like VNx2HImode may have a smaller alignment than modes like V4HI.
> For the given test case, SLP forces the alignment of the VNx2HImode memory access to be 32 bytes.
> In theory, we may have other cases where the alignment of the inner mode is bigger than that of the outer mode.

Ah, OK.  But in that case, shouldn't we allow the change if the
original unaligned MEM was also “slow”?

I guess there might be cases in which both modes are slow enough
for the hook to return true for them, but one is worse than the other.
But I don't think there's much we can do about that as things stand:
changing the mode might move from a slow mode to a slower mode,
but it might move in the other direction too.

> +2020-05-27  Felix Yang  <felix.yang@huawei.com>
> +           Richard Sandiford  <richard.sandiford@arm.com>

I appreciate the gesture, but I don't think it's appropriate
to list me as an author.  I haven't written any of the code,
I've just reviewed it. :-)

> diff --git a/gcc/expr.c b/gcc/expr.c
> index dfbeae71518..3035791c764 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -3814,6 +3814,69 @@ emit_move_insn (rtx x, rtx y)
>    gcc_assert (mode != BLKmode
>  	      && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode));
>  
> +  /* If we have a copy which looks like one of the following patterns:

s/which/that/ (....I think)

> +       (set (subreg:M1 (reg:M2 ...)) (subreg:M1 (reg:M2 ...)))
> +       (set (subreg:M1 (reg:M2 ...)) (mem:M1 ADDR))
> +       (set (mem:M1 ADDR) (subreg:M1 (reg:M2 ...)))
> +       (set (subreg:M1 (reg:M2 ...)) (constant C))
> +     where mode M1 is equal in size to M2 and target hook can_change_mode_class
> +     (M1, M2, ALL_REGS) returns false, try to remove the subreg.  This avoids
> +     an implicit round trip through memory.  */

How about:

     where mode M1 is equal in size to M2, try to detect whether the
     mode change involves an implicit round trip through memory.
     If so, see if we can avoid that by removing the subregs and
     doing the move in mode M2 instead.  */
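
(FWIW, what I mean by "removing the subregs" is roughly the following --
just a sketch of the idea, not a suggestion to restructure how the patch
already computes x_inner:

  /* Sketch only: if X is a same-sized subreg of a register, remember
     the inner register so that the move can be done in its mode.  */
  rtx x_inner = NULL_RTX;
  if (SUBREG_P (x)
      && REG_P (SUBREG_REG (x))
      && known_eq (GET_MODE_SIZE (GET_MODE (SUBREG_REG (x))),
                   GET_MODE_SIZE (mode)))
    x_inner = SUBREG_REG (x);

with the same treatment for y.)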

> +  else if (x_inner != NULL_RTX
> +	   && MEM_P (y)
> +	   && ! targetm.can_change_mode_class (GET_MODE (x_inner),
> +					       mode, ALL_REGS)
> +	   /* Stop if the inner mode requires too much alignment.  */
> +	   && (! targetm.slow_unaligned_access (GET_MODE (x_inner),
> +						MEM_ALIGN (y))
> +	       || MEM_ALIGN (y) >= GET_MODE_ALIGNMENT (GET_MODE (x_inner))))

It's better to check the alignment first, since it's cheaper.
So taking the comment above into account, I think this ends up as:

	   && (MEM_ALIGN (y) >= GET_MODE_ALIGNMENT (GET_MODE (x_inner))
	       || targetm.slow_unaligned_access (mode, MEM_ALIGN (y))
	       || !targetm.slow_unaligned_access (GET_MODE (x_inner),
						  MEM_ALIGN (y)))

(Note: no space after "!", although the sources aren't as consistent
about that as they could be.)

TBH I think it would be good to avoid duplicating such a complicated
condition in both directions, so at the risk of getting flamed, how
about using a lambda?

  auto candidate_mem_p = [&](machine_mode inner_mode, rtx mem) {
    return ...;
  };

with ... containing everything after the MEM_P check?
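
i.e. something like this (completely untested, just to show the shape
I mean; the body is the can_change_mode_class check plus the alignment
condition from above):

  auto candidate_mem_p = [&](machine_mode inner_mode, rtx mem) {
    return (!targetm.can_change_mode_class (inner_mode, mode, ALL_REGS)
            /* Allow the change unless it would turn a fast unaligned
               access into a slow one.  */
            && (MEM_ALIGN (mem) >= GET_MODE_ALIGNMENT (inner_mode)
                || targetm.slow_unaligned_access (mode, MEM_ALIGN (mem))
                || !targetm.slow_unaligned_access (inner_mode,
                                                   MEM_ALIGN (mem))));
  };

Then both the x-is-MEM and y-is-MEM arms would just call candidate_mem_p
with the appropriate inner mode and MEM.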

Looks good otherwise, thanks,

Richard

