[PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

Segher Boessenkool segher@kernel.crashing.org
Thu Jun 3 20:31:07 GMT 2021


On Thu, Jun 03, 2021 at 02:49:15PM +0800, Xionghu Luo wrote:
> If I remove the rotate in simplify-rtx like below:
> 
> +++ b/gcc/simplify-rtx.c
> @@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>      case ROTATE:
>        if (trueop1 == CONST0_RTX (mode))
>         return op0;
> +
> +      if (GET_CODE (trueop0) == ROTATE && trueop1 == GEN_INT (64)
> +         && CONST_INT_P (XEXP (trueop0, 1))
> +         && INTVAL (XEXP (trueop0, 1)) == 64)
> +       return XEXP (trueop0, 0);

(The hardcoded 64 needs improving, but this is just a proof of concept,
I'll assume :-) )
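The hardcoded 64 works only because two rotates of the same value compose into a single rotate by the sum of their counts. The pair therefore folds away whenever that sum is a multiple of the mode's bit size. For a 128-bit mode, a rotate by 64 is exactly a doubleword swap, so two of them cancel.

The standalone C sketch below models a 128-bit KFmode/V1TImode value as two 64-bit doublewords and checks that rule. `u128`, `rotl128` and `rotates_cancel` are hypothetical illustration names, not GCC internals. A real simplify-rtx change would take both counts from the RTL and the size from the mode rather than from a constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit (KFmode / V1TImode) register value as
   two 64-bit doublewords.  Not a GCC data structure.  */
typedef struct
{
  uint64_t hi, lo;
} u128;

bool
u128_eq (u128 a, u128 b)
{
  return a.hi == b.hi && a.lo == b.lo;
}

/* Rotate X left by COUNT bits; COUNT is reduced modulo 128.  */
u128
rotl128 (u128 x, unsigned count)
{
  count &= 127;
  if (count >= 64)
    {
      /* A rotate by 64 is exactly a doubleword swap.  */
      uint64_t t = x.hi;
      x.hi = x.lo;
      x.lo = t;
      count -= 64;
    }
  if (count == 0)
    return x;
  u128 r;
  r.hi = (x.hi << count) | (x.lo >> (64 - count));
  r.lo = (x.lo << count) | (x.hi >> (64 - count));
  return r;
}

/* Hypothetical generalisation of the hardcoded 64:
   (rotate (rotate X C1) C2) is a single rotate by C1 + C2, so it folds
   to X exactly when C1 + C2 is a multiple of the mode's bit size.  */
bool
rotates_cancel (unsigned c1, unsigned c2, unsigned bitsize)
{
  assert (bitsize > 0);
  return (c1 + c2) % bitsize == 0;
}
```

With `bitsize` 128, `rotates_cancel (64, 64, 128)` holds, which is the case in the proof of concept. The same predicate also covers pairs such as 40 + 88 that a literal comparison against 64 would miss.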

> Combine still fails to merge the two instructions:
> 
> Trying 6 -> 7:
>     6: r120:KF#0=r125:KF#0<-<0x40
>       REG_DEAD r125:KF
>     7: [sfp:DI+r123:DI]=r120:KF#0<-<0x40
>       REG_DEAD r120:KF
> Successfully matched this instruction:
> (set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp)
>             (reg:DI 123)) [1  S16 A128])
>     (subreg:V1TI (reg:KF 125) 0))
> rejecting combination of insns 6 and 7
> original costs 4 + 4 = 8
> replacement cost 12

So what instructions were these?  Why did the store cost 4 but the new
one costs 12?
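The dump lines above reduce to a simple cost comparison. The sketch below is a minimal, simplified model of that test for the two-insn case. It assumes combine rejects a replacement only when the replacement is strictly more expensive than the summed cost of the insns it replaces, and that a cost of 0 means "unknown" and never triggers a rejection.

The real check is roughly what `combine_validate_cost` in gcc/combine.c does, though it also handles three- and four-insn combinations. `combine_rejects` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of combine's profitability test for a
   two-insn combination (I2 feeding I3).  A cost of 0 means "unknown";
   in that case the combination is not rejected on cost grounds.  */
bool
combine_rejects (int i2_cost, int i3_cost, int new_cost)
{
  assert (i2_cost >= 0 && i3_cost >= 0 && new_cost >= 0);
  int old_cost = (i2_cost > 0 && i3_cost > 0) ? i2_cost + i3_cost : 0;
  /* Reject only if the replacement is strictly more expensive.  */
  return old_cost > 0 && new_cost > old_cost;
}
```

With the dump's numbers, insns 6 and 7 cost 4 each (8 in total) and the V1TI store costs 12, so the pair is rejected. Lowering the store cost to 8 makes the costs tie and the combination is accepted, which matches the INSN_COST hack. At 4, the cost of the other store, the replacement would be strictly cheaper.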

> By hacking the vsx_le_perm_store_v1ti INSN_COST from 12 to 8,

It should be the same cost as the other store!

> it could merge the instructions:
> 
>     21: r125:KF=%v2:KF
>       REG_DEAD %v2:KF
>     2: NOTE_INSN_DELETED
>     3: NOTE_INSN_FUNCTION_BEG
>     6: NOTE_INSN_DELETED
>    17: r123:DI=0x20
>     7: [sfp:DI+r123:DI]=r125:KF#0
>       REG_DEAD r125:KF
>    19: NOTE_INSN_DELETED
>    14: %v2:V1TI=[sfp:DI+r123:DI]
>       REG_DEAD r123:DI
>    15: use %v2:V1TI
> 
> The following split1 pass will still split it, because there is no DSE
> pass in between to remove the memory operations on the stack.  Removing
> the rotate in the swaps pass doesn't have this problem, since that pass
> runs before DSE and there is no split pass between them:

Sure, but none of that is the point.  I asked if we did this for TImode
properly, and maybe we do, but:

>    22: r126:V1TI=r125:KF#0<-<0x40
>    23: [sfp:DI+r123:DI]=r126:V1TI<-<0x40

... this is V1TI mode.


Segher


More information about the Gcc-patches mailing list