[PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

Xionghu Luo luoxhu@linux.ibm.com
Thu Jun 10 07:11:08 GMT 2021



On 2021/6/10 00:24, Segher Boessenkool wrote:
> On Wed, Jun 09, 2021 at 11:20:20AM +0800, Xionghu Luo wrote:
>> On 2021/6/9 04:11, Segher Boessenkool wrote:
>>> On Fri, Jun 04, 2021 at 09:40:58AM +0800, Xionghu Luo wrote:
>>>>>> rejecting combination of insns 6 and 7
>>>>>> original costs 4 + 4 = 8
>>>>>> replacement cost 12
>>>>>
>>>>> So what instructions were these?  Why did the store cost 4 but the new
>>>>> one costs 12?
>>>
>>> The *vsx_le_perm_store_<mode> instruction has the *preferred*
>>> alternative with cost 12, while the other alternative has cost 8.  Why
>>> is that?  That looks like a bug.
>>>      (set_attr "length" "12,8")
>>
>> 12 was introduced by Mike's commit c477a6674364(r6-2577), and all the 5
>> vsx_le_perm_store_<mode> are set to 12 for modes VSX_D/VSX_W/V8HI/V16QI
>> /VSX_LE_128, I guess it is split to two rs6000_emit_le_vsx_permute before
>> reload, but 3 rs6000_emit_le_vsx_permute after reload, so the length is
>> 12, then it seems also not reasonable to change it from 12 to 8?  And I am
>> not sure when the alternative 1 will be chosen?
> 
> This is the instruction *length*, not the cost directly.  The length
> has to be correct, not lower than it will turn out to be that is, or on
> some big testcases you will get branches that cannot reach their target,
> and the resulting ICEs.
> 
> Alternatives are chosen by register allocation.  Before register
> allocation attributes are taken as if alternative 0 is selected (well,
> the first enabled alternative is selected, same thing here).
> 
> Which alternative is the expected (or wanted) one?  Either put that one
> first, or if it is the longer one, give it an explicit cost.
> 
>> ;; The post-reload split requires that we re-permute the source
>> ;; register in case it is still live.
>> (define_split
>>    [(set (match_operand:VSX_LE_128 0 "memory_operand")
>>          (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>>    "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>>     && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>>    [(const_int 0)]
>> {
>>    rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>>    rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>>    rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>>    DONE;
>> })
> 
> So it seems like it is only 3 insns in the very unlucky case?  Normally
> it will end up as just one simple store?

I am afraid there is not "simple store" for *TImode on P8LE*.  There is only
stxvd2x that rotates the element(stvx requires memory to be aligned, not
suitable pattern), so every vsx_le_perm_store_v1ti must be split to 3
instructions for alternative 0, it seems incorrect to force the cost to be 4.


  So you want an explicit cost
> here then:
> 
>    ;; What is the insn_cost for this insn?  The target hook can still override
>    ;; this.  For optimizing for size the "length" attribute is used instead.
>    (define_attr "cost" "" (const_int 0))
> 
> So you would use something like
> 
>       (set_attr "cost" "4,*")
> 
> here (if I got that right, please check :-) )
> 
> HtH,
> 
> 
> Segher
> 

-- 
Thanks,
Xionghu


More information about the Gcc-patches mailing list