Ping: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

Xionghu Luo luoxhu@linux.ibm.com
Wed Jun 30 01:42:49 GMT 2021


Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html


On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:
> Hi,
> 
> On 2021/5/13 18:49, Segher Boessenkool wrote:
>> Hi!
>>
>> On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:
>>> The vsel instruction is a bit-wise select instruction.  Using an
>>> IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
>>> being generated in the combine pass.  Per element selection is a
>>> subset of per bit-wise selection,with the patch the pattern is
>>> written using bit operations.  But there are 8 different patterns
>>> to define "op0 := (op1 & ~op3) | (op2 & op3)":
>>>
>>> (~op3&op1) | (op3&op2),
>>> (~op3&op1) | (op2&op3),
>>> (op3&op2) | (~op3&op1),
>>> (op2&op3) | (~op3&op1),
>>> (op1&~op3) | (op3&op2),
>>> (op1&~op3) | (op2&op3),
>>> (op3&op2) | (op1&~op3),
>>> (op2&op3) | (op1&~op3),
>>>
>>> Combine pass will swap (op1&~op3) to (~op3&op1) due to commutative
>>> canonical, which could reduce it to the FIRST 4 patterns, but it won't
>>> swap (op2&op3) | (~op3&op1) to (~op3&op1) | (op2&op3), so this patch
>>> handles it with two patterns with different NOT op3 position and check
>>> equality inside it.
>>
>> Yup, that latter case does not have canonicalisation rules.  Btw, not
>> only combine does this canonicalisation: everything should,
>> non-canonical RTL is invalid RTL (in the instruction stream, you can do
>> everything in temporary code of course, as long as the RTL isn't
>> malformed).
>>
>>> -(define_insn "*altivec_vsel<mode>"
>>> +(define_insn "altivec_vsel<mode>"
>>>     [(set (match_operand:VM 0 "altivec_register_operand" "=v")
>>> -    (if_then_else:VM
>>> -     (ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
>>> -        (match_operand:VM 4 "zero_constant" ""))
>>> -     (match_operand:VM 2 "altivec_register_operand" "v")
>>> -     (match_operand:VM 3 "altivec_register_operand" "v")))]
>>> -  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
>>> -  "vsel %0,%3,%2,%1"
>>> +    (ior:VM
>>> +     (and:VM
>>> +      (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
>>> +      (match_operand:VM 1 "altivec_register_operand" "v"))
>>> +     (and:VM
>>> +      (match_operand:VM 2 "altivec_register_operand" "v")
>>> +      (match_operand:VM 4 "altivec_register_operand" "v"))))]
>>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)
>>> +  && (rtx_equal_p (operands[2], operands[3])
>>> +  || rtx_equal_p (operands[4], operands[3]))"
>>> +  {
>>> +    if (rtx_equal_p (operands[2], operands[3]))
>>> +      return "vsel %0,%1,%4,%3";
>>> +    else
>>> +      return "vsel %0,%1,%2,%3";
>>> +  }
>>>     [(set_attr "type" "vecmove")])
>>
>> That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I
>> think.  So please write this as two patterns (and keep the expand if
>> that helps).
> 
> I was a bit concerned that there would be a lot of duplicate code if we
> write two patterns for each vsel, totally 4 similar patterns in
> altivec.md and another 4 in vsx.md make it difficult to maintain, however
> I updated it since you prefer this way, as you pointed out the xxsel in
> vsx.md could be folded by later patch.
> 
>>
>>> +(define_insn "altivec_vsel<mode>2"
>>
>> (same here of course).
>>
>>>   ;; Fused multiply add.
>>> diff --git a/gcc/config/rs6000/rs6000-call.c 
>>> b/gcc/config/rs6000/rs6000-call.c
>>> index f5676255387..d65bdc01055 100644
>>> --- a/gcc/config/rs6000/rs6000-call.c
>>> +++ b/gcc/config/rs6000/rs6000-call.c
>>> @@ -3362,11 +3362,11 @@ const struct altivec_builtin_types 
>>> altivec_overloaded_builtins[] = {
>>>       RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 
>>> RS6000_BTI_unsigned_V2DI },
>>>     { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
>>>       RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 
>>> RS6000_BTI_V2DI },
>>> -  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
>>> +  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS,
>>
>> Are the _uns things still used for anything?  But, let's not change
>> this until Bill's stuff is in :-)
>>
>> Why do you want to change this here, btw?  I don't understand.
> 
> OK, they are actually "unsigned type" overload builtin functions, change
> it or not so far won't cause functionality issue, I will revert this change
> in the updated patch.
> 
>>
>>> +  if (target == 0
>>> +      || GET_MODE (target) != tmode
>>> +      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
>>
>> No space after ! and other unary operators (except for casts and other
>> operators you write with alphanumerics, like "sizeof").  I know you
>> copied this code, but :-)
> 
> OK, thanks.
> 
>>
>>> @@ -15608,8 +15606,6 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx 
>>> op_true, rtx op_false,
>>>       case GEU:
>>>       case LTU:
>>>       case LEU:
>>> -      /* Mark unsigned tests with CCUNSmode.  */
>>> -      cc_mode = CCUNSmode;
>>>         /* Invert condition to avoid compound test if necessary.  */
>>>         if (rcode == GEU || rcode == LEU)
>>
>> So this is related to the _uns thing.  Could you split off that change?
>> Probably as an earlier patch (but either works for me).
> 
> Not related to the ALTIVEC_BUILTIN_VSEL_2DI_UNS things, previously cc_mode
> is a parameter to generate the condition for IF_THEN_ELSE instruction, now
> we don't need it again as we use IOR (AND... AND...) style, remove it to 
> avoid
> build error.
> 
> 
> -  cond2 = gen_rtx_fmt_ee (NE, cc_mode, gen_lowpart (dest_mode, mask),
> -                         CONST0_RTX (dest_mode));
> -  emit_insn (gen_rtx_SET (dest,
> -                         gen_rtx_IF_THEN_ELSE (dest_mode,
> -                                               cond2,
> -                                               op_true,
> -                                               op_false)));
> +  rtx tmp = gen_rtx_IOR (dest_mode,
> +                        gen_rtx_AND (dest_mode, gen_rtx_NOT (dest_mode, 
> mask),
> +                                     op_false),
> +                        gen_rtx_AND (dest_mode, mask, op_true));
> +  emit_insn (gen_rtx_SET (dest, tmp));
> 
> 
>>
>>> @@ -15629,6 +15625,9 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx 
>>> op_true, rtx op_false,
>>>     if (!mask)
>>>       return 0;
>>> +  if (mask_mode != dest_mode)
>>> +      mask = simplify_gen_subreg (dest_mode, mask, mask_mode, 0);
>>
>> Indent just two characters please: line continuations (usually) align,
>> but indents do not.>
>> Can you fold vsel and xxsel together completely?  They have exactly the
>> same semantics!  This does not have to be in this patch of course.
> 
> I noticed that vperm/xxperm are folded together, do you mean fold 
> vsel/xxsel
> like them?  It's attached as:
> 0002-rs6000-Fold-xxsel-to-vsel-since-they-have-same-seman.patch
> 
> 
> Thanks,
> Xionghu

-- 
Thanks,
Xionghu


More information about the Gcc-patches mailing list