[GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]
Stam Markianos-Wright
Stam.Markianos-Wright@arm.com
Thu Jan 9 15:12:00 GMT 2020
On 1/7/20 5:14 PM, Richard Sandiford wrote:
> Thanks for the update. The new patch looks really good, just some
> minor comments.
>
> Stam Markianos-Wright <Stam.Markianos-Wright@arm.com> writes:
>> [...]
>> Also I've update the filenames of all our tests to make them a bit clearer:
>>
>> C tests:
>>
>> __ bfloat16_scalar_compile_1.c to bfloat16_scalar_compile_3.c: Compilation of
>> scalar moves/loads/stores with "-march8.2-a+bf16", "-march8.2-a and +bf16 target
>> pragma", "-march8.2-a" (now does not error out at all). There now include
>> register asms to check more MOV alternatives.
>>
>> __ bfloat16_scalar_compile_4.c: The _Complex error test.
>>
>> __ bfloat16_simd_compile_1.c to bfloat16_simd_compile_3.c: Likewise to
>> x_scalar_x, but also include (vector) 0x1234.. compilation (no assembler scan).
>
> Sounds good to me, although TBH the "_compile" feels a bit redundant.
Yes, true that! Removed it.
>
>> I had also done a small c++ test, but have chosen to shift that to the [2/2]
>> patch because it is currently being blocked by target_invalid_conversion.
>
> OK. Does that include the mangling test?
Aaah no, this is the test checking for bfloat16_t(), bfloat16_t (0x1234),
bfloat16_t(0.25), etc. (which are more of language-level checks)
Oh! I had forgotten about the mangling, so I've added it in this revision.
>
>> [...]
>>>>> - a test that involves moving constants, for both scalars and vectors.
>>>>> You can create zero scalar constants in C++ using bfloat16_t() etc.
>>>>> For vectors it's possible to do things like:
>>>>>
>>>>> typedef short v2bf __attribute__((vector_size(4)));
>>>>> v2hi foo (void) { return (v2hi) 0x12345678; }
>>>>>
>>>>> The same sort of things should work for bfloat16x4_t and bfloat16x8_t.
>>>>
>>>> Leaving this as an open issue for now because I'm not 100% sure what we
>>>> should/shouldn't be allowing past the tree-level target hooks.
>>>>
>>>> If we do want to block this we would do this in the [2/2] patch.
>>>> I will come back to it and create a scan-assembler test when I'm more clear on
>>>> what we should and shouldn't allow at the higher level :)
>>>
>>> FWIW, I'm not sure we should go out of our way to disallow this.
>>> Preventing bfloat16_t() in C++ would IMO be unnatural. And the
>>> "(vector) vector-sized-integer" syntax specifically treats the vector
>>> as a bundle of bits without really caring what the element type is.
>>> Even if we did manage to forbid the conversion in that context,
>>> it would still be possible to achieve the same thing using:
>>>
>>> v2hi
>>> foo (void)
>>> {
>>> union { v2hi v; unsigned int i; } u;
>>> u.i = 0x12345678;
>>> return u.v;
>>> }
>>>
>> Added the compilation of "(vector) vector-sized-integer" in the vector tests.
>>
>> But target_invalid_conversion in the [2/2] patch is a complication to this (as
>> with bfloat_16t() in c++.
>>
>> I was under the impression that the original intent of bfloat was for it to be
>> storage only, with any initialisation happening through the float32 convert
>> intrinsic.
>>
>> Either I'd be happy to allow it, but it does feel like we'd slightly be going
>> against what's the ACLE currently.
>> However, looking back at it now, it only mentions using ACLE intrinsics over C
>> operators, so I'd be happy to allow this for vectors.
>>
>> For scalars though, if we e.g. were to allow:
>>
>> bfloat16_t (0x1234);
>>
>> on a single bfloat, I don't see how we could still block conversions like:
>>
>> bfloat16_t scalar1 = 0.1;
>> bfloat16_t scalar2 = 0;
>> bfloat16_t scalar3 = is_a_float;
>>
>> Agreed that the union {} would still always slip through, though.
>
> It wasn't clear sorry, but I meant literally "bfloat16_t()", i.e.
> construction with zero initialisation. I agree we don't want to
> support "bfloat16_t(0.25)" etc.
Added to [2/2] as mentioned above.
>
>> [...]
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>>> new file mode 100644
>>>> index 00000000000..f2bef671deb
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>>> @@ -0,0 +1,51 @@
>>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>>>> +/* { dg-additional-options "-O3 --save-temps" } */
>>>> +/* { dg-final { check-function-bodies "**" "" } } */
>>>> +
>>>> +#include <arm_neon.h>
>>>> +
>>>> +/*
>>>> +**stacktest1:
>>>> +** ...
>>>> +** str h0, \[sp, [0-9]+\]
>>>> +** ldr h0, \[sp, [0-9]+\]
>>>> +** ...
>>>> +** ret
>>>> +*/
>>>> +bfloat16_t stacktest1 (bfloat16_t __a)
>>>> +{
>>>> + volatile bfloat16_t b = __a;
>>>> + return b;
>>>> +}
>>>> +
>>>> +/*
>>>> +**stacktest2:
>>>> +** ...
>>>> +** str d0, \[sp, [0-9]+\]
>>>> +** ldr d0, \[sp, [0-9]+\]
>>>> +** ...
>>>> +** ret
>>>> +*/
>>>> +bfloat16x4_t stacktest2 (bfloat16x4_t __a)
>>>> +{
>>>> + volatile bfloat16x4_t b = __a;
>>>> + return b;
>>>> +}
>>>> +
>>>> +/*
>>>> +**stacktest3:
>>>> +** ...
>>>> +** str q0, \[sp\]
>>>> +** ldr q0, \[sp\]
>>>> +** ...
>>>> +** ret
>>>> +*/
>>>> +bfloat16x8_t stacktest3 (bfloat16x8_t __a)
>>>> +{
>>>> + volatile bfloat16x8_t b = __a;
>>>> + return b;
>>>> +}
>>>
>>> Might be a daft question, but why do we have an offset for the first
>>> two and not for the last one? Might be worth hard-coding whatever
>>> offset we use.
>
> I should have realised first time, but it's because we allocate the
> local variable area downwards from the soft frame pointer. So the
> area gets padded downwards rather than upwards.
Ahh ok thank you!
Also in terms of these I removed the #foo markers because they were tripping up
check-function-bodies (I realised that they weren't being ignored like other
comments, so after removing them I no longer need to have the "+** ..."
before/after the MOVs).
>
>> [...]
>> @@ -97,6 +107,12 @@
>> ;; Copy of the above.
>> (define_mode_iterator VQ2 [V16QI V8HI V4SI V2DI V8HF V4SF V2DF])
>>
>> +;; Quad vector modes suitable for moving. Includes BFmode.
>> +(define_mode_iterator VQMOV [V16QI V8HI V4SI V2DI V8HF V8BF V4SF V2DF])
>> +
>> +;; Quad vector modes suitable for moving. Includes BFmode.
>> +(define_mode_iterator VQMOV_NO2E [V16QI V8HI V4SI V8HF V8BF V4SF])
>
> Comment pasto for VQMOV_NO2E. Think it should be:
>
> ;; VQMOV without 2-element modes.
Yes, correct!
>
>> ;; Quad integer vector modes.
>> (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI])
>>
>> @@ -160,6 +176,11 @@
>> (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
>> V4HF V8HF V2SF V4SF V2DF])
>>
>> +;; All Advanced SIMD modes suitable for moving, loading, and storing,
>> +;; including special Bfloat vector types.
>> +(define_mode_iterator VALL_F16MOV [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
>> + V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
>
> Nit: line should be indented below "V8QI".
Done!
>
>> @@ -226,6 +247,9 @@
>> ;; Advanced SIMD modes for Q and H types.
>> (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
>>
>> +;; Advanced SIMD modes for BF vector types.
>> +(define_mode_iterator VBF [V4BF V8BF])
>
> Nothing in this patch uses VBF, so probably best to leave it until later.
Yep, removed it.
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c
>> new file mode 100644
>> index 00000000000..5186d0e3d24
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c
>> @@ -0,0 +1,118 @@
>> [...]
>> +/*
>> +**bfloat_mov_rm:
>> +** ...
>> +** strh w2, \[sp, 14\]
>> +** ...
>> +** ret
>> +*/
>> +void bfloat_mov_rm (void)
>> +{
>> + register bfloat16_t x asm ("w2");
>> + volatile bfloat16_t y;
>> + asm volatile ("#foo" : "=r" (x));
>> + y = x;
>> + asm volatile ("#foo" : : : "memory");
>> +}
>
> Probably simpler as:
>
> /*
> **bfloat_mov_rm:
> ** strh w2, \[x0\]
> ** ret
> */
> void bfloat_mov_rm (bfloat16_t *ptr)
> {
> register bfloat16_t x asm ("w2");
> asm volatile ("#foo" : "=r" (x));
> *ptr = x;
> }
Done
>
>> +/*
>> +**bfloat_mov_mr:
>> +** ...
>> +** ldrh w2, \[sp, 14\]
>> +** ...
>> +** ret
>> +*/
>> +void bfloat_mov_mr (void)
>> +{
>> + volatile bfloat16_t x;
>> + register bfloat16_t y asm ("w2");
>> + asm volatile ("#foo" : : : "memory");
>> + y = x;
>> + asm volatile ("#foo" :: "r" (y));
>> +}
>
> Similarly here:
>
> /*
> **bfloat_mov_mr:
> ** ldrh w2, \[x0\]
> ** ret
> */
> void bfloat_mov_mr (bfloat16_t *ptr)
> {
> register bfloat16_t y asm ("w2");
> y = *ptr;
> asm volatile ("#foo" :: "r" (y));
> }
>
> Same for _2.d and _3.c
Done
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c
>> new file mode 100644
>> index 00000000000..02656d32f14
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c
>> @@ -0,0 +1,122 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-additional-options "-march=armv8.2-a -O3 --save-temps -std=gnu90" } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +
>> +#pragma GCC push_options
>> +#pragma GCC target ("+bf16")
>> +
>> +#include <arm_bf16.h>
>
> This effectively tests the same thing as bfloat16_scalar_compile_1.c.
> IMO the more interesting way round is:
>
> #include <arm_bf16.h>
Yes, I changed it in the simd test but not here. Good catch!
>
> #pragma GCC push_options
> #pragma GCC target ("+bf16")
>
> like for the simd tests. So _1.c is the normal "enable before include"
> case, _2.c is "enable after include" and _3.c is "don't enable at all".
>
> Thanks,
> Richard
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BFmode1of2-rev4.patch
Type: text/x-patch
Size: 39277 bytes
Desc: BFmode1of2-rev4.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20200109/85bdfbb0/attachment.bin>
More information about the Gcc-patches
mailing list