[PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

Segher Boessenkool segher@kernel.crashing.org
Wed Jun 9 16:24:20 GMT 2021


On Wed, Jun 09, 2021 at 11:20:20AM +0800, Xionghu Luo wrote:
> On 2021/6/9 04:11, Segher Boessenkool wrote:
> > On Fri, Jun 04, 2021 at 09:40:58AM +0800, Xionghu Luo wrote:
> >>>> rejecting combination of insns 6 and 7
> >>>> original costs 4 + 4 = 8
> >>>> replacement cost 12
> >>>
> >>> So what instructions were these?  Why did the store cost 4 but the new
> >>> one costs 12?
> > 
> > The *vsx_le_perm_store_<mode> instruction has the *preferred*
> > alternative with cost 12, while the other alternative has cost 8.  Why
> > is that?  That looks like a bug.
> >     (set_attr "length" "12,8")
> 
> 12 was introduced by Mike's commit c477a6674364(r6-2577), and all the 5
> vsx_le_perm_store_<mode> are set to 12 for modes VSX_D/VSX_W/V8HI/V16QI
> /VSX_LE_128, I guess it is split to two rs6000_emit_le_vsx_permute before
> reload, but 3 rs6000_emit_le_vsx_permute after reload, so the length is
> 12, then it seems also not reasonable to change it from 12 to 8?  And I am
> not sure when the alternative 1 will be chosen?

This is the instruction *length*, not the cost directly.  The length
has to be correct, not lower than it will turn out to be that is, or on
some big testcases you will get branches that cannot reach their target,
and the resulting ICEs.

Alternatives are chosen by register allocation.  Before register
allocation attributes are taken as if alternative 0 is selected (well,
the first enabled alternative is selected, same thing here).

Which alternative is the expected (or wanted) one?  Either put that one
first, or if it is the longer one, give it an explicit cost.

> ;; The post-reload split requires that we re-permute the source
> ;; register in case it is still live.
> (define_split
>   [(set (match_operand:VSX_LE_128 0 "memory_operand")
>         (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>   [(const_int 0)]
> {
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>   rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>   DONE;
> }) 

So it seems like it is only 3 insns in the very unlucky case?  Normally
it will end up as just one simple store?  So you want an explicit cost
here then:

  ;; What is the insn_cost for this insn?  The target hook can still override
  ;; this.  For optimizing for size the "length" attribute is used instead.
  (define_attr "cost" "" (const_int 0))

So you would use something like

     (set_attr "cost" "4,*")

here (if I got that right, please check :-) )

HtH,


Segher


More information about the Gcc-patches mailing list