[PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector
Tue Nov 3 14:05:40 GMT 2020
Dennis Zhang <firstname.lastname@example.org> writes:
> Hi Richard,
> On 10/30/20 2:07 PM, Richard Sandiford wrote:
>> Dennis Zhang <Dennis.Zhang@arm.com> writes:
>>> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
>>> index 332a0b6b1ea..39ebb776d1d 100644
>>> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
>>> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
>>> @@ -719,6 +719,9 @@
>>> VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
>>> VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
>>> + /* Implemented by aarch64_vget_halfv8bf. */
>>> + VAR1 (GETREG, vget_half, 0, ALL, v8bf)
>> This should be AUTO_FP, since it doesn't have any side-effects.
>> (As before, we should probably rename the flag, but that's separate work.)
>>> /* Implemented by aarch64_simd_<sur>mmlav16qi. */
>>> VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
>>> VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
>>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>>> index 9f0e2bd1e6f..f62c52ca327 100644
>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>> @@ -7159,6 +7159,19 @@
>>> [(set_attr "type" "neon_dot<VDQSF:q>")]
>>> +;; vget_low/high_bf16
>>> +(define_expand "aarch64_vget_halfv8bf"
>>> + [(match_operand:V4BF 0 "register_operand")
>>> + (match_operand:V8BF 1 "register_operand")
>>> + (match_operand:SI 2 "aarch64_zero_or_1")]
>>> + "TARGET_BF16_SIMD"
>>> +  int hbase = INTVAL (operands[2]);
>>> + rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);
>> I think this needs to use aarch64_simd_vect_par_cnst_half
>> instead. The issue is that on big-endian targets, GCC assumes vector
>> lane 0 is in the high part of the register, whereas for AArch64 it's
>> always in the low part of the register. So we convert from AArch64
>> numbering to GCC numbering when generating the rtx and then take
>> endianness into account when matching the rtx later.
>> It would be good to have -mbig-endian tests that make sure we generate
>> the right instruction for each function (i.e. we get them the right way
>> round). I guess it would be good to test that for little-endian too.
> I've updated the expander to use aarch64_simd_vect_par_cnst_half,
> and split it into two expanders, one for the low half and one for
> the high half. It's tested for the aarch64-none-linux-gnu and
> aarch64_be-none-linux-gnu targets, with new tests that include the
> -mbig-endian option.
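For readers following along, the split described above might take roughly the following shape (a sketch only, not the committed patch; it assumes the existing helper aarch64_simd_vect_par_cnst_half, which builds the selection PARALLEL and accounts for endianness itself):

```
;; Sketch only -- not the committed patch.
(define_expand "aarch64_vget_lo_halfv8bf"
  [(match_operand:V4BF 0 "register_operand")
   (match_operand:V8BF 1 "register_operand")]
  "TARGET_BF16_SIMD"
{
  rtx sel = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
  DONE;
})

(define_expand "aarch64_vget_hi_halfv8bf"
  [(match_operand:V4BF 0 "register_operand")
   (match_operand:V8BF 1 "register_operand")]
  "TARGET_BF16_SIMD"
{
  rtx sel = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, true);
  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
  DONE;
})
```

With the half selection baked into each expander, the SI immediate operand (and its predicate) is no longer needed.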
>>> +  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
>>> + DONE;
>>> ;; bfmmla
>>> (define_insn "aarch64_bfmmlaqv4sf"
>>> [(set (match_operand:V4SF 0 "register_operand" "=w")
>>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>>> index 215fcec5955..0c8bc2b0c73 100644
>>> --- a/gcc/config/aarch64/predicates.md
>>> +++ b/gcc/config/aarch64/predicates.md
>>> @@ -84,6 +84,10 @@
>>> (ior (match_test "op == constm1_rtx")
>>> (match_test "op == const1_rtx"))))))
>>> +(define_predicate "aarch64_zero_or_1"
>>> + (and (match_code "const_int")
>>> + (match_test "op == const0_rtx || op == const1_rtx")))
>> zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
>> But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
>> so let's keep it as-is.
> This predicate has been removed, since the new expanders no longer
> need the immediate operand.
> Thanks for the reviews.
> Is it OK for trunk now?
Looks good. OK for trunk and branches, thanks.