[PATCH] lra: Canonicalize mult to shift in address reloads

Richard Sandiford richard.sandiford@arm.com
Wed Aug 26 09:06:28 GMT 2020


Alex Coplan <alex.coplan@arm.com> writes:
> Hello,
>
> Inside a (mem) RTX, it is canonical to write multiplications by powers
> of two using a (mult) [0]. For example, given the following C function:
>
> long f(long *p, long x)
> {
>     return p[x];
> }
>
> AArch64 GCC generates the following RTL insn (in final):
>
> (set (reg/i:DI 0 x0)
>      (mem:DI (plus:DI (mult:DI (reg:DI 1 x1 [101])
>                  (const_int 8 [0x8]))
>              (reg:DI 0 x0 [100])) [1 *_3+0 S8 A64]))
>
> However, outside of a (mem), the canonical way to write multiplications
> by powers of two is using (ashift). E.g. for the following C function:
>
> long *g(long *p, long x)
> {
>     return p + x;
> }
>
> AArch64 GCC generates:
>
> (set (reg/i:DI 0 x0)
>      (plus:DI (ashift:DI (reg:DI 1 x1 [100])
>              (const_int 3 [0x3]))
>          (reg:DI 0 x0 [99])))
>
> Now I observed that LRA does not quite respect this RTL canonicalization
> rule. When compiling gcc/testsuite/gcc.dg/torture/pr34330.c with -Os
> -ftree-vectorize, the RTL in the dump "281r.ira" has the insn:
>
> (set (reg:SI 111 [ MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B] ])
>      (mem:SI (plus:DI (mult:DI (reg:DI 101 [ ivtmp.9 ])
>                  (const_int 4 [0x4]))
>              (reg/v/f:DI 105 [ b ])) [2 MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B]+0 S4 A32]))
>
> but LRA then proceeds to generate a reload, and we get the following
> non-canonical insn in "282r.reload":
>
> (set (reg:DI 7 x7 [121])
>      (plus:DI (mult:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
>              (const_int 4 [0x4]))
>          (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
>
> This patch fixes LRA to ensure that we generate canonical RTL in this
> case. After the patch, we get the following insn in "282r.reload":
>
> (set (reg:DI 7 x7 [121])
>         (plus:DI (ashift:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
>                 (const_int 2 [0x2]))
>             (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
>
> The motivation here is to be able to remove several redundant patterns
> in the AArch64 backend. See the previous thread [1] for context.
>
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu,
>    x86_64-pc-linux-gnu.
>  * New unit test which checks that we're using the shift pattern (rather
>    than the problematic mult pattern) on AArch64.
>
> OK for master?
>
> Thanks,
> Alex
>
> [0] : https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
> [1] : https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552066.html

Thanks for doing this.

>
> ---
>
> gcc/ChangeLog:
>
> 	* lra-constraints.c (canonicalize_reload_addr): New.
> 	(curr_insn_transform): Use canonicalize_reload_addr to ensure we
> 	generate canonical RTL for an address reload.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/mem-shift-canonical.c: New test.
>
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 421c453997b..630a4f167cd 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -570,6 +570,34 @@ init_curr_insn_input_reloads (void)
>    curr_insn_input_reloads_num = 0;
>  }
>  
> +/* The canonical form of an rtx inside a MEM is not necessarily the same as the
> +   canonical form of the rtx outside the MEM.  Fix this up in the case that
> +   we're reloading an address (and therefore pulling it outside a MEM).  */
> +static rtx canonicalize_reload_addr (rtx addr)

Minor nit, should be formatted as:

static rtx
canonicalize_reload_addr (rtx addr)

> +{
> +  const enum rtx_code code = GET_CODE (addr);
> +
> +  if (code == PLUS)
> +    {
> +      rtx inner = XEXP (addr, 0);
> +      if (GET_CODE (inner) == MULT && CONST_INT_P (XEXP (inner, 1)))
> +	{
> +	  const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> +	  const int pwr2 = exact_log2 (ci);
> +	  if (pwr2 > 0)
> +	    {
> +	      /* Rewrite this to use a shift instead, which is canonical
> +		 when outside of a MEM.  */
> +	      XEXP (addr, 0) = gen_rtx_ASHIFT (GET_MODE (inner),
> +					       XEXP (inner, 0),
> +					       GEN_INT (pwr2));
> +	    }
> +	}
> +    }

I don't think we should we restrict this to (plus (mult X Y) Z),
since addresses can be more complicated than that.  One way to
search for all MULTs is:

  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, x, NONCONST)
    {
      rtx x = *iter;
      if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
        ...
    }

(Needs rtl-iter.h)

Thanks,
Richard


More information about the Gcc-patches mailing list