ACLE intrinsics: BFloat16 store (vst<n>{q}_bf16) intrinsics for AArch32

Tue Mar 3 16:23:00 GMT 2020

Sorry, I forgot the attachment.

On 3/3/20 4:20 PM, Delia Burduv wrote:
> Hi,
> 
> I made a mistake in the previous patch. This is the latest version. 
> Please let me know if it is ok.
> 
> Thanks,
> Delia
> 
> On 2/21/20 3:18 PM, Delia Burduv wrote:
>> Hi Kyrill,
>>
>> The arm_bf16.h is only used for scalar operations. That is how the 
>> aarch64 versions are implemented too.
>>
>> Thanks,
>> Delia
>>
>> On 2/21/20 2:06 PM, Kyrill Tkachov wrote:
>>> Hi Delia,
>>>
>>> On 2/19/20 5:25 PM, Delia Burduv wrote:
>>>> Hi,
>>>>
>>>> Here is the latest version of the patch. It just has some minor
>>>> formatting changes that were brought up by Richard Sandiford in the
>>>> AArch64 patches
>>>>
>>>> Thanks,
>>>> Delia
>>>>
>>>> On 1/22/20 5:29 PM, Delia Burduv wrote:
>>>> > Ping.
>>>> >
>>>> > I will change the tests to use the exact input and output 
>>>> registers as
>>>> > Richard Sandiford suggested for the AArch64 patches.
>>>> >
>>>> > On 12/20/19 6:46 PM, Delia Burduv wrote:
>>>> >> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>>>> >> vst<n>{q}_bf16 as part of the BFloat16 extension.
>>>> >> 
>>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 
>>>>
>>>> >>
>>>> >> The intrinsics are declared in arm_neon.h .
>>>> >> A new test is added to check assembler output.
>>>> >>
>>>> >> This patch depends on the Arm back-end patche.
>>>> >> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>> >>
>>>> >> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>>> >> have commit rights, so if this is ok can someone please commit it 
>>>> for me?
>>>> >>
>>>> >> gcc/ChangeLog:
>>>> >>
>>>> >> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
>>>> >>
>>>> >>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>> >>          (bfloat16x4x2_t): New typedef.
>>>> >>          (bfloat16x8x2_t): New typedef.
>>>> >>          (bfloat16x4x3_t): New typedef.
>>>> >>          (bfloat16x8x3_t): New typedef.
>>>> >>          (bfloat16x4x4_t): New typedef.
>>>> >>          (bfloat16x8x4_t): New typedef.
>>>> >>          (vst2_bf16): New.
>>>> >>      (vst2q_bf16): New.
>>>> >>      (vst3_bf16): New.
>>>> >>      (vst3q_bf16): New.
>>>> >>      (vst4_bf16): New.
>>>> >>      (vst4q_bf16): New.
>>>> >>          * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>>> >>          (VAR13): New.
>>>> >>          (arm_simd_types[Bfloat16x2_t]):New type.
>>>> >>          * config/arm/arm-modes.def (V2BF): New mode.
>>>> >>          * config/arm/arm-simd-builtin-types.def
>>>> >>          (Bfloat16x2_t): New entry.
>>>> >>          * config/arm/arm_neon_builtins.def
>>>> >>          (vst2): Changed to VAR13 and added v4bf, v8bf
>>>> >>          (vst3): Changed to VAR13 and added v4bf, v8bf
>>>> >>          (vst4): Changed to VAR13 and added v4bf, v8bf
>>>> >>          * config/arm/iterators.md (VDXBF): New iterator.
>>>> >>          (VQ2BF): New iterator.
>>>> >>          (V_elem): Added V4BF, V8BF.
>>>> >>          (V_sz_elem): Added V4BF, V8BF.
>>>> >>          (V_mode_nunits): Added V4BF, V8BF.
>>>> >>          (q): Added V4BF, V8BF.
>>>> >>          *config/arm/neon.md (vst2): Used new iterators.
>>>> >>          (vst3): Used new iterators.
>>>> >>          (vst3qa): Used new iterators.
>>>> >>          (vst3qb): Used new iterators.
>>>> >>          (vst4): Used new iterators.
>>>> >>          (vst4qa): Used new iterators.
>>>> >>          (vst4qb): Used new iterators.
>>>> >>
>>>> >>
>>>> >> gcc/testsuite/ChangeLog:
>>>> >>
>>>> >> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
>>>> >>
>>>> >>      * gcc.target/arm/simd/bf16_vstn_1.c: New test.
>>>
>>> One thing I just noticed in this and the other arm bfloat16 patches...
>>>
>>> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
>>> index 
>>> 3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 
>>> 100644
>>> --- a/gcc/config/arm/arm_neon.h
>>> +++ b/gcc/config/arm/arm_neon.h
>>> @@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, 
>>> float32x4_t __a, float32x4_t __b,
>>>     return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
>>>   }
>>>
>>> +#pragma GCC push_options
>>> +#pragma GCC target ("arch=armv8.2-a+bf16")
>>> +
>>> +typedef struct bfloat16x4x2_t
>>> +{
>>> +  bfloat16x4_t val[2];
>>> +} bfloat16x4x2_t;
>>>
>>>
>>> These should be in a new arm_bf16.h file that gets included in the 
>>> main arm_neon.h file, right?
>>> I believe the aarch64 versions are implemented that way.
>>>
>>> Otherwise the patch looks good to me.
>>> Thanks!
>>> Kyrill
>>>
>>>
>>>   +
>>> +typedef struct bfloat16x8x2_t
>>> +{
>>> +  bfloat16x8_t val[2];
>>> +} bfloat16x8x2_t;
>>> +
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rb12278.patch
Type: text/x-patch
Size: 15003 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20200303/62d00afb/attachment.bin>