ACLE intrinsics: BFloat16 store (vst<n>{q}_bf16) intrinsics for AArch32
Delia Burduv
delia.burduv@arm.com
Thu Mar 5 15:51:00 GMT 2020
Hi,
This is the latest version of the patch. I am forcing -mfloat-abi=hard
because the register allocator behaves differently depending on which
float-abi is used.
Thanks,
Delia
On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
> Hi Delia,
>
> On 3/3/20 5:23 PM, Delia Burduv wrote:
>> Hi,
>>
>> I noticed that the patch doesn't apply cleanly. I fixed it and this is
>> the latest version.
>>
>> Thanks,
>> Delia
>>
>> On 3/3/20 4:23 PM, Delia Burduv wrote:
>>> Sorry, I forgot the attachment.
>>>
>>> On 3/3/20 4:20 PM, Delia Burduv wrote:
>>>> Hi,
>>>>
>>>> I made a mistake in the previous patch. This is the latest version.
>>>> Please let me know if it is ok.
>>>>
>>>> Thanks,
>>>> Delia
>>>>
>>>> On 2/21/20 3:18 PM, Delia Burduv wrote:
>>>>> Hi Kyrill,
>>>>>
>>>>> The arm_bf16.h is only used for scalar operations. That is how the
>>>>> aarch64 versions are implemented too.
>>>>>
>>>>> Thanks,
>>>>> Delia
>>>>>
>>>>> On 2/21/20 2:06 PM, Kyrill Tkachov wrote:
>>>>>> Hi Delia,
>>>>>>
>>>>>> On 2/19/20 5:25 PM, Delia Burduv wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Here is the latest version of the patch. It just has some minor
>>>>>>> formatting changes that were brought up by Richard Sandiford in the
>>>>>>> AArch64 patches
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Delia
>>>>>>>
>>>>>>> On 1/22/20 5:29 PM, Delia Burduv wrote:
>>>>>>> > Ping.
>>>>>>> >
>>>>>>> > I will change the tests to use the exact input and output
>>>>>>> registers as
>>>>>>> > Richard Sandiford suggested for the AArch64 patches.
>>>>>>> >
>>>>>>> > On 12/20/19 6:46 PM, Delia Burduv wrote:
>>>>>>> >> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>>>>>>> >> vst<n>{q}_bf16 as part of the BFloat16 extension.
>>>>>>> >>
>>>>>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>>>>>>
>>>>>>> >>
>>>>>>> >> The intrinsics are declared in arm_neon.h .
>>>>>>> >> A new test is added to check assembler output.
>>>>>>> >>
>>>>>>> >> This patch depends on the Arm back-end patche.
>>>>>>> >> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>>>>> >>
>>>>>>> >> Tested for regression on arm-none-eabi and armeb-none-eabi. I
>>>>>>> don't
>>>>>>> >> have commit rights, so if this is ok can someone please commit
>>>>>>> it for me?
>>>>>>> >>
>>>>>>> >> gcc/ChangeLog:
>>>>>>> >>
>>>>>>> >> 2019-11-14 Delia Burduv <delia.burduv@arm.com>
>>>>>>> >>
>>>>>>> >> * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>>>>> >> (bfloat16x4x2_t): New typedef.
>>>>>>> >> (bfloat16x8x2_t): New typedef.
>>>>>>> >> (bfloat16x4x3_t): New typedef.
>>>>>>> >> (bfloat16x8x3_t): New typedef.
>>>>>>> >> (bfloat16x4x4_t): New typedef.
>>>>>>> >> (bfloat16x8x4_t): New typedef.
>>>>>>> >> (vst2_bf16): New.
>>>>>>> >> (vst2q_bf16): New.
>>>>>>> >> (vst3_bf16): New.
>>>>>>> >> (vst3q_bf16): New.
>>>>>>> >> (vst4_bf16): New.
>>>>>>> >> (vst4q_bf16): New.
>>>>>>> >> * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>>>>>> >> (VAR13): New.
>>>>>>> >> (arm_simd_types[Bfloat16x2_t]):New type.
>>>>>>> >> * config/arm/arm-modes.def (V2BF): New mode.
>>>>>>> >> * config/arm/arm-simd-builtin-types.def
>>>>>>> >> (Bfloat16x2_t): New entry.
>>>>>>> >> * config/arm/arm_neon_builtins.def
>>>>>>> >> (vst2): Changed to VAR13 and added v4bf, v8bf
>>>>>>> >> (vst3): Changed to VAR13 and added v4bf, v8bf
>>>>>>> >> (vst4): Changed to VAR13 and added v4bf, v8bf
>>>>>>> >> * config/arm/iterators.md (VDXBF): New iterator.
>>>>>>> >> (VQ2BF): New iterator.
>>>>>>> >> (V_elem): Added V4BF, V8BF.
>>>>>>> >> (V_sz_elem): Added V4BF, V8BF.
>>>>>>> >> (V_mode_nunits): Added V4BF, V8BF.
>>>>>>> >> (q): Added V4BF, V8BF.
>>>>>>> >> *config/arm/neon.md (vst2): Used new iterators.
>>>>>>> >> (vst3): Used new iterators.
>>>>>>> >> (vst3qa): Used new iterators.
>>>>>>> >> (vst3qb): Used new iterators.
>>>>>>> >> (vst4): Used new iterators.
>>>>>>> >> (vst4qa): Used new iterators.
>>>>>>> >> (vst4qb): Used new iterators.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> gcc/testsuite/ChangeLog:
>>>>>>> >>
>>>>>>> >> 2019-11-14 Delia Burduv <delia.burduv@arm.com>
>>>>>>> >>
>>>>>>> >> * gcc.target/arm/simd/bf16_vstn_1.c: New test.
>>>>>>
>>>>>> One thing I just noticed in this and the other arm bfloat16
>>>>>> patches...
>>>>>>
>>>>>> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
>>>>>> index
>>>>>> 3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a
>>>>>> 100644
>>>>>> --- a/gcc/config/arm/arm_neon.h
>>>>>> +++ b/gcc/config/arm/arm_neon.h
>>>>>> @@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r,
>>>>>> float32x4_t __a, float32x4_t __b,
>>>>>> return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
>>>>>> }
>>>>>>
>>>>>> +#pragma GCC push_options
>>>>>> +#pragma GCC target ("arch=armv8.2-a+bf16")
>>>>>> +
>>>>>> +typedef struct bfloat16x4x2_t
>>>>>> +{
>>>>>> + bfloat16x4_t val[2];
>>>>>> +} bfloat16x4x2_t;
>>>>>>
>>>>>>
>>>>>> These should be in a new arm_bf16.h file that gets included in the
>>>>>> main arm_neon.h file, right?
>>>>>> I believe the aarch64 versions are implemented that way.
>>>>>>
>>>>>> Otherwise the patch looks good to me.
>>>>>> Thanks!
>>>>>> Kyrill
>>>>>>
>>>>>>
>>>>>> +
>>>>>> +typedef struct bfloat16x8x2_t
>>>>>> +{
>>>>>> + bfloat16x8_t val[2];
>>>>>> +} bfloat16x8x2_t;
>>>>>> +
>>>>>>
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
> b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..b52ecfb959776fd04c7c33908cb7f8898ec3fe0b
>
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
> @@ -0,0 +1,84 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> +/* { dg-additional-options "-save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
> +
>
>
> I don't see the check-function-bodies checks being performed in my
> testing. Changing the directives order to:
> /* { dg-do assemble } */
> /* { dg-options "-save-temps" } */
> /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> /* { dg-add-options arm_v8_2a_bf16_neon } */
> /* { dg-final { check-function-bodies "**" "" } } */
>
> makes them run but they fail, I think because this test also needs an
> -O2 option, same as the load intrinsics patch. Can you please adjust the
> order of the dg-* directives in the test and the function body scan
> tests to match the codegen?
> With this, it will be ready to go :)
> Thanks,
> Kyrill
>
>
>
> +#include "arm_neon.h"
> +
> +/*
> +**test_vst2_bf16:
> +** ...
> +** vst2.16 {d16-d17}, \[r0\]
> +** ...
> +*/
> +void
> +test_vst2_bf16 (bfloat16_t *ptr, bfloat16x4x2_t val)
> +{
> + vst2_bf16 (ptr, val);
> +}
> +
>
More information about the Gcc-patches
mailing list