[PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

Xionghu Luo luoxhu@linux.ibm.com
Thu Jun 3 06:49:15 GMT 2021



On 2021/6/3 08:46, Xionghu Luo via Gcc-patches wrote:
> Hi,
> 
> On 2021/6/3 06:20, Segher Boessenkool wrote:
>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
>>> On P8LE, extra rot64+rot64 load or store instructions are generated
>>> in float128 to vector __int128 conversion.
>>>
>>> This patch teaches the swaps pass to also handle such patterns and remove
>>> the extra swap instructions.
>>
>> Did you check if this is already handled by simplify-rtx if the mode had
>> been TImode (not V1TImode)?  If not, why do you not handle it there?
> 
> I tried to do it in combine or peephole, but the later split2 or split3
> pass still splits it into rotate + rotate again, since we split after
> reload.  This pattern is also quite P8LE specific, so I put it in the
> swap pass.  simplify-rtx can already simplify
> r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register
> operations.
> 

Also, I forgot to mention: once the swap pass removes the rotates with this
patch, the dse1 pass that follows can remove the stack operation, which avoids
splitting into rotate load/store again in later passes.
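
As a rough illustration of why the rot64 pair is removable at all (this is
not part of the patch, and the names u128, rotl128 and swap_dwords are made
up for the example): on a 128-bit value, a rotate by 64 is just a doubleword
swap, so two of them in a row are the identity.  The sketch assumes a
compiler with the unsigned __int128 extension, such as GCC.

```c
#include <stdint.h>

/* Hypothetical 128-bit stand-in for a V1TI register value.  */
typedef unsigned __int128 u128;

/* Rotate a 128-bit value left by N bits.  */
static u128
rotl128 (u128 x, unsigned n)
{
  n &= 127;
  return n ? (x << n) | (x >> (128 - n)) : x;
}

/* Swap the two 64-bit doublewords of a 128-bit value.  */
static u128
swap_dwords (u128 x)
{
  uint64_t hi = (uint64_t) (x >> 64);
  uint64_t lo = (uint64_t) x;
  return ((u128) lo << 64) | hi;
}
```

With these helpers, rotl128 (x, 64) equals swap_dwords (x), and applying
rotl128 (..., 64) twice gives back x.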

If the rotate is removed in simplify-rtx instead, like below:

+++ b/gcc/simplify-rtx.c
@@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
     case ROTATE:
       if (trueop1 == CONST0_RTX (mode))
        return op0;
+
+      if (GET_CODE (trueop0) == ROTATE && trueop1 == GEN_INT (64)
+         && CONST_INT_P (XEXP (trueop0, 1))
+         && INTVAL (XEXP (trueop0, 1)) == 64)
+       return XEXP (trueop0, 0);

Combine still fails to merge the two instructions:

Trying 6 -> 7:
    6: r120:KF#0=r125:KF#0<-<0x40
      REG_DEAD r125:KF
    7: [sfp:DI+r123:DI]=r120:KF#0<-<0x40
      REG_DEAD r120:KF
Successfully matched this instruction:
(set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp)
            (reg:DI 123)) [1  S16 A128])
    (subreg:V1TI (reg:KF 125) 0))
rejecting combination of insns 6 and 7
original costs 4 + 4 = 8
replacement cost 12
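
The rejection above comes down to a cost comparison.  A simplified model of
it, assuming that combine rejects a replacement whose cost is higher than the
summed cost of the original insns (the function name is hypothetical and this
is not the actual combine.c code):

```c
#include <stdbool.h>

/* Simplified model: keep the combination only when the replacement
   does not cost more than the two original insns together.  */
static bool
combination_is_profitable (int cost_i2, int cost_i3, int replacement_cost)
{
  return replacement_cost <= cost_i2 + cost_i3;
}
```

With the numbers from the dump, 4 + 4 = 8 against a replacement cost of 12
is rejected, while a replacement cost of 8 would be accepted.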

After hacking the insn cost of vsx_le_perm_store_v1ti from 12 down to 8,
combine can merge the instructions:

    21: r125:KF=%v2:KF
      REG_DEAD %v2:KF
    2: NOTE_INSN_DELETED
    3: NOTE_INSN_FUNCTION_BEG
    6: NOTE_INSN_DELETED
   17: r123:DI=0x20
    7: [sfp:DI+r123:DI]=r125:KF#0
      REG_DEAD r125:KF
   19: NOTE_INSN_DELETED
   14: %v2:V1TI=[sfp:DI+r123:DI]
      REG_DEAD r123:DI
   15: use %v2:V1TI

But the split1 pass that follows still splits it back into rotates, because
there is no dse pass in between to remove the memory operations on the stack.
Removing the rotates in the swap pass does not have this problem, since it
runs before dse and there is no split pass between them:

   21: r125:KF=%v2:KF
      REG_DEAD %v2:KF
    2: NOTE_INSN_DELETED
    3: NOTE_INSN_FUNCTION_BEG
    6: NOTE_INSN_DELETED
   17: r123:DI=0x20
   22: r126:V1TI=r125:KF#0<-<0x40
   23: [sfp:DI+r123:DI]=r126:V1TI<-<0x40
   19: NOTE_INSN_DELETED
   24: r127:V1TI=[sfp:DI+r123:DI]<-<0x40
   25: %v2:V1TI=r127:V1TI<-<0x40
   15: use %v2:V1TI

Thanks,
Xionghu




