[GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

Thu Jan 9 15:12:00 GMT 2020

On 1/7/20 5:14 PM, Richard Sandiford wrote:
> Thanks for the update.  The new patch looks really good, just some
> minor comments.
> 
> Stam Markianos-Wright <Stam.Markianos-Wright@arm.com> writes:
>> [...]
>> Also I've update the filenames of all our tests to make them a bit clearer:
>>
>> C tests:
>>
>> __ bfloat16_scalar_compile_1.c to bfloat16_scalar_compile_3.c: Compilation of
>> scalar moves/loads/stores with "-march8.2-a+bf16", "-march8.2-a and +bf16 target
>> pragma", "-march8.2-a" (now does not error out at all). There now include
>> register asms to check more MOV alternatives.
>>
>> __ bfloat16_scalar_compile_4.c: The _Complex error test.
>>
>> __ bfloat16_simd_compile_1.c to bfloat16_simd_compile_3.c: Likewise to
>> x_scalar_x, but also include (vector) 0x1234.. compilation (no assembler scan).
> 
> Sounds good to me, although TBH the "_compile" feels a bit redundant.

Yes, true that! Removed it.

> 
>> I had also done a small c++ test, but have chosen to shift that to the [2/2]
>> patch because it is currently being blocked by target_invalid_conversion.
> 
> OK.  Does that include the mangling test?

Aaah no, this is the test checking for bfloat16_t(), bfloat16_t (0x1234), 
bfloat16_t(0.25), etc. (which are more of language-level checks)

Oh! I had forgotten about the mangling, so I've added it in this revision.

> 
>> [...]
>>>>> - a test that involves moving constants, for both scalars and vectors.
>>>>>      You can create zero scalar constants in C++ using bfloat16_t() etc.
>>>>>      For vectors it's possible to do things like:
>>>>>
>>>>>        typedef short v2bf __attribute__((vector_size(4)));
>>>>>        v2hi foo (void) { return (v2hi) 0x12345678; }
>>>>>
>>>>>      The same sort of things should work for bfloat16x4_t and bfloat16x8_t.
>>>>
>>>> Leaving this as an open issue for now because I'm not 100% sure what we
>>>> should/shouldn't be allowing past the tree-level target hooks.
>>>>
>>>> If we do want to block this we would do this in the [2/2] patch.
>>>> I will come back to it and create a scan-assembler test when I'm more clear on
>>>> what we should and shouldn't allow at the higher level :)
>>>
>>> FWIW, I'm not sure we should go out of our way to disallow this.
>>> Preventing bfloat16_t() in C++ would IMO be unnatural.  And the
>>> "(vector) vector-sized-integer" syntax specifically treats the vector
>>> as a bundle of bits without really caring what the element type is.
>>> Even if we did manage to forbid the conversion in that context,
>>> it would still be possible to achieve the same thing using:
>>>
>>>      v2hi
>>>      foo (void)
>>>      {
>>>        union { v2hi v; unsigned int i; } u;
>>>        u.i = 0x12345678;
>>>        return u.v;
>>>      }
>>>
>> Added the compilation of "(vector) vector-sized-integer" in the vector tests.
>>
>> But target_invalid_conversion in the [2/2] patch is a complication to this (as
>> with bfloat_16t() in c++.
>>
>> I was under the impression that the original intent of bfloat was for it to be
>> storage only, with any initialisation happening through the float32 convert
>> intrinsic.
>>
>> Either I'd be happy to allow it, but it does feel like we'd slightly be going
>> against what's the ACLE currently.
>> However, looking back at it now, it only mentions using ACLE intrinsics over C
>> operators, so I'd be happy to allow this for vectors.
>>
>> For scalars though, if we e.g. were to allow:
>>
>> bfloat16_t (0x1234);
>>
>> on a single bfloat, I don't see how we could still block conversions like:
>>
>> bfloat16_t scalar1 = 0.1;
>> bfloat16_t scalar2 = 0;
>> bfloat16_t scalar3 = is_a_float;
>>
>> Agreed that the union {} would still always slip through, though.
> 
> It wasn't clear sorry, but I meant literally "bfloat16_t()", i.e.
> construction with zero initialisation.  I agree we don't want to
> support "bfloat16_t(0.25)" etc.

Added to [2/2] as mentioned above.

> 
>> [...]
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>>> new file mode 100644
>>>> index 00000000000..f2bef671deb
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>>> @@ -0,0 +1,51 @@
>>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>>>> +/* { dg-additional-options "-O3 --save-temps" } */
>>>> +/* { dg-final { check-function-bodies "**" "" } } */
>>>> +
>>>> +#include <arm_neon.h>
>>>> +
>>>> +/*
>>>> +**stacktest1:
>>>> +**	...
>>>> +**	str	h0, \[sp, [0-9]+\]
>>>> +**	ldr	h0, \[sp, [0-9]+\]
>>>> +**	...
>>>> +**	ret
>>>> +*/
>>>> +bfloat16_t stacktest1 (bfloat16_t __a)
>>>> +{
>>>> +  volatile bfloat16_t b = __a;
>>>> +  return b;
>>>> +}
>>>> +
>>>> +/*
>>>> +**stacktest2:
>>>> +**	...
>>>> +**	str	d0, \[sp, [0-9]+\]
>>>> +**	ldr	d0, \[sp, [0-9]+\]
>>>> +**	...
>>>> +**	ret
>>>> +*/
>>>> +bfloat16x4_t stacktest2 (bfloat16x4_t __a)
>>>> +{
>>>> +  volatile bfloat16x4_t b = __a;
>>>> +  return b;
>>>> +}
>>>> +
>>>> +/*
>>>> +**stacktest3:
>>>> +**	...
>>>> +**	str	q0, \[sp\]
>>>> +**	ldr	q0, \[sp\]
>>>> +**	...
>>>> +**	ret
>>>> +*/
>>>> +bfloat16x8_t stacktest3 (bfloat16x8_t __a)
>>>> +{
>>>> +  volatile bfloat16x8_t b = __a;
>>>> +  return b;
>>>> +}
>>>
>>> Might be a daft question, but why do we have an offset for the first
>>> two and not for the last one?  Might be worth hard-coding whatever
>>> offset we use.
> 
> I should have realised first time, but it's because we allocate the
> local variable area downwards from the soft frame pointer.  So the
> area gets padded downwards rather than upwards.

Ahh ok thank you!

Also in terms of these I removed the #foo markers because they were tripping up 
check-function-bodies (I realised that they weren't being ignored like other 
comments, so after removing them I no longer need to have the "+**	..." 
before/after the MOVs).

> 
>> [...]
>> @@ -97,6 +107,12 @@
>>   ;; Copy of the above.
>>   (define_mode_iterator VQ2 [V16QI V8HI V4SI V2DI V8HF V4SF V2DF])
>>   
>> +;; Quad vector modes suitable for moving.  Includes BFmode.
>> +(define_mode_iterator VQMOV [V16QI V8HI V4SI V2DI V8HF V8BF V4SF V2DF])
>> +
>> +;; Quad vector modes suitable for moving.  Includes BFmode.
>> +(define_mode_iterator VQMOV_NO2E [V16QI V8HI V4SI V8HF V8BF V4SF])
> 
> Comment pasto for VQMOV_NO2E.  Think it should be:
> 
> ;; VQMOV without 2-element modes.

Yes, correct!

> 
>>   ;; Quad integer vector modes.
>>   (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI])
>>   
>> @@ -160,6 +176,11 @@
>>   (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
>>   				V4HF V8HF V2SF V4SF V2DF])
>>   
>> +;; All Advanced SIMD modes suitable for moving, loading, and storing,
>> +;; including special Bfloat vector types.
>> +(define_mode_iterator VALL_F16MOV [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
>> +				V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
> 
> Nit: line should be indented below "V8QI".

Done!

> 
>> @@ -226,6 +247,9 @@
>>   ;; Advanced SIMD modes for Q and H types.
>>   (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
>>   
>> +;; Advanced SIMD modes for BF vector types.
>> +(define_mode_iterator VBF [V4BF V8BF])
> 
> Nothing in this patch uses VBF, so probably best to leave it until later.

Yep, removed it.

> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c
>> new file mode 100644
>> index 00000000000..5186d0e3d24
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c
>> @@ -0,0 +1,118 @@
>> [...]
>> +/*
>> +**bfloat_mov_rm:
>> +**	...
>> +**	strh	w2, \[sp, 14\]
>> +**	...
>> +**	ret
>> +*/
>> +void bfloat_mov_rm (void)
>> +{
>> +  register bfloat16_t x asm ("w2");
>> +  volatile bfloat16_t y;
>> +  asm volatile ("#foo" : "=r" (x));
>> +  y = x;
>> +  asm volatile ("#foo" : : : "memory");
>> +}
> 
> Probably simpler as:
> 
> /*
> **bfloat_mov_rm:
> **	strh	w2, \[x0\]
> **	ret
> */
> void bfloat_mov_rm (bfloat16_t *ptr)
> {
>    register bfloat16_t x asm ("w2");
>    asm volatile ("#foo" : "=r" (x));
>    *ptr = x;
> }

Done

> 
>> +/*
>> +**bfloat_mov_mr:
>> +**	...
>> +**	ldrh	w2, \[sp, 14\]
>> +**	...
>> +**	ret
>> +*/
>> +void bfloat_mov_mr (void)
>> +{
>> +  volatile bfloat16_t x;
>> +  register bfloat16_t y asm ("w2");
>> +  asm volatile ("#foo" : : : "memory");
>> +  y = x;
>> +  asm volatile ("#foo" :: "r" (y));
>> +}
> 
> Similarly here:
> 
> /*
> **bfloat_mov_mr:
> **	ldrh	w2, \[x0\]
> **	ret
> */
> void bfloat_mov_mr (bfloat16_t *ptr)
> {
>    register bfloat16_t y asm ("w2");
>    y = *ptr;
>    asm volatile ("#foo" :: "r" (y));
> }
> 
> Same for _2.d and _3.c

Done

> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c
>> new file mode 100644
>> index 00000000000..02656d32f14
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c
>> @@ -0,0 +1,122 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-additional-options "-march=armv8.2-a -O3 --save-temps -std=gnu90" } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +
>> +#pragma GCC push_options
>> +#pragma GCC target ("+bf16")
>> +
>> +#include <arm_bf16.h>
> 
> This effectively tests the same thing as bfloat16_scalar_compile_1.c.
> IMO the more interesting way round is:
> 
> #include <arm_bf16.h>

Yes, I changed it in the simd test but not here. Good catch!

> 
> #pragma GCC push_options
> #pragma GCC target ("+bf16")
> 
> like for the simd tests.  So _1.c is the normal "enable before include"
> case, _2.c is "enable after include" and _3.c is "don't enable at all".
> 
> Thanks,
> Richard
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BFmode1of2-rev4.patch
Type: text/x-patch
Size: 39277 bytes
Desc: BFmode1of2-rev4.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20200109/85bdfbb0/attachment.bin>