ACLE intrinsics: BFloat16 store (vst<n>{q}_bf16) intrinsics for AArch32

Tue Mar 3 17:24:00 GMT 2020

Hi,

I noticed that the patch doesn't apply cleanly. I fixed it and this is 
the latest version.

Thanks,
Delia

On 3/3/20 4:23 PM, Delia Burduv wrote:
> Sorry, I forgot the attachment.
> 
> On 3/3/20 4:20 PM, Delia Burduv wrote:
>> Hi,
>>
>> I made a mistake in the previous patch. This is the latest version. 
>> Please let me know if it is ok.
>>
>> Thanks,
>> Delia
>>
>> On 2/21/20 3:18 PM, Delia Burduv wrote:
>>> Hi Kyrill,
>>>
>>> The arm_bf16.h is only used for scalar operations. That is how the 
>>> aarch64 versions are implemented too.
>>>
>>> Thanks,
>>> Delia
>>>
>>> On 2/21/20 2:06 PM, Kyrill Tkachov wrote:
>>>> Hi Delia,
>>>>
>>>> On 2/19/20 5:25 PM, Delia Burduv wrote:
>>>>> Hi,
>>>>>
>>>>> Here is the latest version of the patch. It just has some minor
>>>>> formatting changes that were brought up by Richard Sandiford in the
>>>>> AArch64 patches
>>>>>
>>>>> Thanks,
>>>>> Delia
>>>>>
>>>>> On 1/22/20 5:29 PM, Delia Burduv wrote:
>>>>> > Ping.
>>>>> >
>>>>> > I will change the tests to use the exact input and output 
>>>>> registers as
>>>>> > Richard Sandiford suggested for the AArch64 patches.
>>>>> >
>>>>> > On 12/20/19 6:46 PM, Delia Burduv wrote:
>>>>> >> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>>>>> >> vst<n>{q}_bf16 as part of the BFloat16 extension.
>>>>> >> 
>>>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 
>>>>>
>>>>> >>
>>>>> >> The intrinsics are declared in arm_neon.h .
>>>>> >> A new test is added to check assembler output.
>>>>> >>
>>>>> >> This patch depends on the Arm back-end patche.
>>>>> >> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>>> >>
>>>>> >> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>>>> >> have commit rights, so if this is ok can someone please commit 
>>>>> it for me?
>>>>> >>
>>>>> >> gcc/ChangeLog:
>>>>> >>
>>>>> >> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
>>>>> >>
>>>>> >>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>>> >>          (bfloat16x4x2_t): New typedef.
>>>>> >>          (bfloat16x8x2_t): New typedef.
>>>>> >>          (bfloat16x4x3_t): New typedef.
>>>>> >>          (bfloat16x8x3_t): New typedef.
>>>>> >>          (bfloat16x4x4_t): New typedef.
>>>>> >>          (bfloat16x8x4_t): New typedef.
>>>>> >>          (vst2_bf16): New.
>>>>> >>      (vst2q_bf16): New.
>>>>> >>      (vst3_bf16): New.
>>>>> >>      (vst3q_bf16): New.
>>>>> >>      (vst4_bf16): New.
>>>>> >>      (vst4q_bf16): New.
>>>>> >>          * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>>>> >>          (VAR13): New.
>>>>> >>          (arm_simd_types[Bfloat16x2_t]):New type.
>>>>> >>          * config/arm/arm-modes.def (V2BF): New mode.
>>>>> >>          * config/arm/arm-simd-builtin-types.def
>>>>> >>          (Bfloat16x2_t): New entry.
>>>>> >>          * config/arm/arm_neon_builtins.def
>>>>> >>          (vst2): Changed to VAR13 and added v4bf, v8bf
>>>>> >>          (vst3): Changed to VAR13 and added v4bf, v8bf
>>>>> >>          (vst4): Changed to VAR13 and added v4bf, v8bf
>>>>> >>          * config/arm/iterators.md (VDXBF): New iterator.
>>>>> >>          (VQ2BF): New iterator.
>>>>> >>          (V_elem): Added V4BF, V8BF.
>>>>> >>          (V_sz_elem): Added V4BF, V8BF.
>>>>> >>          (V_mode_nunits): Added V4BF, V8BF.
>>>>> >>          (q): Added V4BF, V8BF.
>>>>> >>          *config/arm/neon.md (vst2): Used new iterators.
>>>>> >>          (vst3): Used new iterators.
>>>>> >>          (vst3qa): Used new iterators.
>>>>> >>          (vst3qb): Used new iterators.
>>>>> >>          (vst4): Used new iterators.
>>>>> >>          (vst4qa): Used new iterators.
>>>>> >>          (vst4qb): Used new iterators.
>>>>> >>
>>>>> >>
>>>>> >> gcc/testsuite/ChangeLog:
>>>>> >>
>>>>> >> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
>>>>> >>
>>>>> >>      * gcc.target/arm/simd/bf16_vstn_1.c: New test.
>>>>
>>>> One thing I just noticed in this and the other arm bfloat16 patches...
>>>>
>>>> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
>>>> index 
>>>> 3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 
>>>> 100644
>>>> --- a/gcc/config/arm/arm_neon.h
>>>> +++ b/gcc/config/arm/arm_neon.h
>>>> @@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, 
>>>> float32x4_t __a, float32x4_t __b,
>>>>     return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
>>>>   }
>>>>
>>>> +#pragma GCC push_options
>>>> +#pragma GCC target ("arch=armv8.2-a+bf16")
>>>> +
>>>> +typedef struct bfloat16x4x2_t
>>>> +{
>>>> +  bfloat16x4_t val[2];
>>>> +} bfloat16x4x2_t;
>>>>
>>>>
>>>> These should be in a new arm_bf16.h file that gets included in the 
>>>> main arm_neon.h file, right?
>>>> I believe the aarch64 versions are implemented that way.
>>>>
>>>> Otherwise the patch looks good to me.
>>>> Thanks!
>>>> Kyrill
>>>>
>>>>
>>>>   +
>>>> +typedef struct bfloat16x8x2_t
>>>> +{
>>>> +  bfloat16x8_t val[2];
>>>> +} bfloat16x8x2_t;
>>>> +
>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rb12278.patch
Type: text/x-patch
Size: 13967 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20200303/6774c6d3/attachment.bin>