[PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex multiplication and addition

Fri Jan 11 10:37:00 GMT 2019

Hi Christophe,

The arm one is a testism; I have a validated patch that I will commit soon.
The aarch64 one is a big-endian lane ordering issue: I had completely forgotten to test big-endian.
The patch for that is going through validation now.

I will submit the aarch64 one soon. Sorry for the mess: splitting these patches from the remainder of the series
had some casualties. These should be the last.

Thanks and happy new year to you too!

Kind Regards,
Tamar

-----Original Message-----
From: Christophe Lyon <christophe.lyon@linaro.org> 
Sent: Friday, January 11, 2019 10:02 AM
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; nickc@redhat.com
Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex multiplication and addition

Hi Tamar,

On Thu, 10 Jan 2019 at 16:41, Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Christophe,
>
> It was introduced in a small refactoring, after which I only retested the testcases I added, which don't trigger the issue.
>
> In any case it's a trivial fix and I'll submit a patch in a bit.
>
> Tamar
>
> ________________________________________
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, January 10, 2019 3:35:18 PM
> To: Tamar Christina
> Cc: Kyrill Tkachov; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; 
> Richard Earnshaw; nickc@redhat.com
> Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
> multiplication and addition
>
> Hi Tamar,
>
>
> On Thu, 10 Jan 2019 at 04:44, Tamar Christina <Tamar.Christina@arm.com> wrote:
> >
> > Hi Kyrill,
> >
> > Committed with the addition of a few trivial defines and iterators 
> > that were missing due to the patch being split.
> >
> > Thanks,
> > Tamar
> >
> > -----Original Message-----
> > From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
> > Sent: Friday, December 21, 2018 11:40 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>; 
> > gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; Ramana Radhakrishnan 
> > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw 
> > <Richard.Earnshaw@arm.com>; nickc@redhat.com
> > Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
> > multiplication and addition
> >
> > Hi Tamar,
> >
> > On 11/12/18 15:46, Tamar Christina wrote:
> > > Hi All,
> > >
> > > > This patch adds NEON intrinsics and tests for the Armv8.3-a 
> > > > complex multiplication and addition instructions with a rotation in the Argand plane.
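To make the rotation concrete, here is a minimal scalar sketch of what one complex lane of VCMLA and VCADD computes, written from my reading of the instruction descriptions in the Arm ARM. The names `cplx`, `model_vcmla` and `model_vcadd` are made up for this illustration and are not part of the patch or of arm_neon.h.

```c
#include <assert.h>

/* One complex number, stored as an (even lane, odd lane) pair.  */
typedef struct { float re, im; } cplx;

/* Hypothetical scalar model of one complex lane of VCMLA.
   Each rotation accumulates one half of the product A * B into ACC.  */
static cplx
model_vcmla (cplx acc, cplx a, cplx b, int rot)
{
  assert (rot == 0 || rot == 90 || rot == 180 || rot == 270);
  switch (rot)
    {
    case 0:   acc.re += a.re * b.re; acc.im += a.re * b.im; break;
    case 90:  acc.re -= a.im * b.im; acc.im += a.im * b.re; break;
    case 180: acc.re -= a.re * b.re; acc.im -= a.re * b.im; break;
    default:  acc.re += a.im * b.im; acc.im -= a.im * b.re; break;
    }
  return acc;
}

/* Hypothetical scalar model of one complex lane of VCADD:
   A + i*B for a rotation of 90, A - i*B for a rotation of 270.  */
static cplx
model_vcadd (cplx a, cplx b, int rot)
{
  assert (rot == 90 || rot == 270);
  cplx r;
  if (rot == 90)
    {
      r.re = a.re - b.im;
      r.im = a.im + b.re;
    }
  else
    {
      r.re = a.re + b.im;
      r.im = a.im - b.re;
    }
  return r;
}
```

Under this model, accumulating rotation 0 and then rotation 90 into a zero accumulator gives the full complex product, e.g. (1 + 2i) * (3 + 4i) = -5 + 10i.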
> > >
> > > The instructions are documented in the ArmARM[1] and the 
> > > intrinsics specification will be published on the Arm website [2].
> > >
> > > > The lane versions of these instructions are special in that they always select a pair:
> > > > using index 0 means selecting lanes 0 and 1.  Because of this, the 
> > > > range check for the intrinsics requires special handling.
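As a rough illustration of the pair indexing described above, the check could look like the sketch below. It only assumes that an index names a whole (real, imaginary) pair, so the bound is half the number of scalar lanes; `lane_pair_index_ok` and `first_lane_of_pair` are hypothetical helpers, not the code in arm-builtins.c.

```c
#include <assert.h>
#include <stdbool.h>

/* NELTS is the number of scalar lanes in the indexed vector, e.g. 4 for
   a vector of four floats.  A pair index is valid when it names a whole
   (real, imaginary) pair, so the bound is NELTS / 2 and not NELTS.  */
static bool
lane_pair_index_ok (int nelts, int pair_index)
{
  assert (nelts > 0 && nelts % 2 == 0);
  return pair_index >= 0 && pair_index < nelts / 2;
}

/* Pair index P covers scalar lanes 2 * P and 2 * P + 1.  */
static int
first_lane_of_pair (int pair_index)
{
  return 2 * pair_index;
}
```

With this, a two-lane float vector accepts only index 0, a four-lane one accepts 0 and 1, and an eight-lane half-float vector accepts 0 to 3.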
> > >
> > > > On Arm, in order to implement some of the lane intrinsics we're 
> > > > using the structure of the register file.  The lane variant of 
> > > > these instructions always selects a D register, but the data itself 
> > > > can be stored in Q registers.  This means that for single 
> > > > precision complex numbers you are only allowed to select D[0], but
> > > > using the register file layout you can get the range 0-1 for lane
> > > > indices by selecting between Dn[0] and Dn+1[0].
> > >
> > > > The same reasoning applies to half-float complex numbers, except 
> > > > that there your D register indexes can be 0 or 1, so you have a total range of 4 lane indices (for a V8HF).
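The register-file trick in the two paragraphs above can be sketched as a small mapping from a pair index within a Q register to a D register offset and an index within that D register. This is only my illustration of the description; `dreg_sel` and `select_dreg_pair` are invented names and this is not the body of neon_vcmla_lane_prepare_operands.

```c
#include <assert.h>

/* Which D register of the Q register to use (0 for Dn, 1 for Dn+1) and
   which pair index to encode for that D register.  */
typedef struct { int dreg_offset; int dreg_index; } dreg_sel;

/* ELT_BITS is 16 for half floats and 32 for single floats.  A D register
   is 64 bits wide, so it holds 64 / (2 * ELT_BITS) complex pairs, and a
   Q register holds twice that many.  */
static dreg_sel
select_dreg_pair (int elt_bits, int pair_index)
{
  assert (elt_bits == 16 || elt_bits == 32);
  int pairs_per_dreg = 64 / (2 * elt_bits);
  assert (pair_index >= 0 && pair_index < 2 * pairs_per_dreg);
  dreg_sel s = { pair_index / pairs_per_dreg, pair_index % pairs_per_dreg };
  return s;
}
```

For single precision this maps index 0 to Dn[0] and index 1 to Dn+1[0]. For half precision, indices 0 to 3 map to Dn[0], Dn[1], Dn+1[0] and Dn+1[1].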
> > >
> > >
> > > [1]
> > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > > [2] https://developer.arm.com/docs/101028/latest
> > >
> > > > Bootstrapped and regtested on arm-none-gnueabihf with no issues.
> > >
> > > Ok for trunk?
> > >
> >
> > Ok.
> > Thanks,
> > Kyrill
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 2018-12-11  Tamar Christina  <tamar.christina@arm.com>
> > >
> > >         * config/arm/arm-builtins.c
> > >         (enum arm_type_qualifiers): Add qualifier_lane_pair_index.
> > >         (MAC_LANE_PAIR_QUALIFIERS): New.
> > >         (arm_expand_builtin_args): Use it.
> > >         (arm_expand_builtin_1): Likewise.
> > >         * config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
> > >         * config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
> > >         * config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
> > >         * config/arm/arm_neon.h:
> > >         (vcadd_rot90_f16): New.
> > >         (vcaddq_rot90_f16): New.
> > >         (vcadd_rot270_f16): New.
> > >         (vcaddq_rot270_f16): New.
> > >         (vcmla_f16): New.
> > >         (vcmlaq_f16): New.
> > >         (vcmla_lane_f16): New.
> > >         (vcmla_laneq_f16): New.
> > >         (vcmlaq_lane_f16): New.
> > >         (vcmlaq_laneq_f16): New.
> > >         (vcmla_rot90_f16): New.
> > >         (vcmlaq_rot90_f16): New.
> > >         (vcmla_rot90_lane_f16): New.
> > >         (vcmla_rot90_laneq_f16): New.
> > >         (vcmlaq_rot90_lane_f16): New.
> > >         (vcmlaq_rot90_laneq_f16): New.
> > >         (vcmla_rot180_f16): New.
> > >         (vcmlaq_rot180_f16): New.
> > >         (vcmla_rot180_lane_f16): New.
> > >         (vcmla_rot180_laneq_f16): New.
> > >         (vcmlaq_rot180_lane_f16): New.
> > >         (vcmlaq_rot180_laneq_f16): New.
> > >         (vcmla_rot270_f16): New.
> > >         (vcmlaq_rot270_f16): New.
> > >         (vcmla_rot270_lane_f16): New.
> > >         (vcmla_rot270_laneq_f16): New.
> > >         (vcmlaq_rot270_lane_f16): New.
> > >         (vcmlaq_rot270_laneq_f16): New.
> > >         (vcadd_rot90_f32): New.
> > >         (vcaddq_rot90_f32): New.
> > >         (vcadd_rot270_f32): New.
> > >         (vcaddq_rot270_f32): New.
> > >         (vcmla_f32): New.
> > >         (vcmlaq_f32): New.
> > >         (vcmla_lane_f32): New.
> > >         (vcmla_laneq_f32): New.
> > >         (vcmlaq_lane_f32): New.
> > >         (vcmlaq_laneq_f32): New.
> > >         (vcmla_rot90_f32): New.
> > >         (vcmlaq_rot90_f32): New.
> > >         (vcmla_rot90_lane_f32): New.
> > >         (vcmla_rot90_laneq_f32): New.
> > >         (vcmlaq_rot90_lane_f32): New.
> > >         (vcmlaq_rot90_laneq_f32): New.
> > >         (vcmla_rot180_f32): New.
> > >         (vcmlaq_rot180_f32): New.
> > >         (vcmla_rot180_lane_f32): New.
> > >         (vcmla_rot180_laneq_f32): New.
> > >         (vcmlaq_rot180_lane_f32): New.
> > >         (vcmlaq_rot180_laneq_f32): New.
> > >         (vcmla_rot270_f32): New.
> > >         (vcmlaq_rot270_f32): New.
> > >         (vcmla_rot270_lane_f32): New.
> > >         (vcmla_rot270_laneq_f32): New.
> > >         (vcmlaq_rot270_lane_f32): New.
> > >         (vcmlaq_rot270_laneq_f32): New.
> > >         * config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, vcmla90,
> > >         vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180, vcmla_lane270,
> > >         vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
> > >         vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
> > >         * config/arm/neon.md (neon_vcmla_lane<rot><mode>,
> > >         neon_vcmla_laneq<rot><mode>, neon_vcmlaq_lane<rot><mode>): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2018-12-11  Tamar Christina  <tamar.christina@arm.com>
> > >
> > >         * gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add AArch32 regexpr.
> > >         * gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Likewise.
> > >
> > > --
> >
>
> Since r267796, I've noticed a regression on aarch64:
> FAIL: gcc.target/aarch64/pr68674.c (test for excess errors)
> Excess errors:
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33361:10:
> error: incompatible types when returning type 'int' but 'float16x4_t' was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33385:10:
> error: incompatible types when returning type 'int' but 'float16x4_t' was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33423:10:
> error: incompatible types when returning type 'int' but 'float16x4_t' was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33477:10:
> error: incompatible types when returning type 'int' but 'float16x4_t' was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33595:10:
> error: incompatible types when returning type 'int' but 'float32x2_t' was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33648:10:
> error: incompatible types when returning type 'int' but 'float32x2_t' was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33701:10:
> error: incompatible types when returning type 'int' but 'float32x2_t' was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33754:10:
> error: incompatible types when returning type 'int' but 'float32x2_t' was expected
>
> I'm surprised you didn't see this during validations?

I've noticed other problems on arm-none-linux-gnueabihf:
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0  (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:18323:10:
error: this builtin is not supported for this target [....]
The testcase is compiled with -mfp16-format=ieee -march=armv8.3-a -O2 -march=armv8.3-a+fp16

In addition, guess what: some scan-assembler-times directives fail on big-endian,
on armeb-none-linux-gnueabihf:
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #0 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #0
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #180 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#180 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #270 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#270 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #90 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #90
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #0 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #0
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #180 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#180 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #270 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#270 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #90 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #90
2
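For anyone reading these results: each scan-assembler-times directive counts how often a regexp matches in the generated assembly and fails when the count differs from the expected number (2 in every case above). A rough C sketch of that counting follows, using POSIX regcomp/regexec instead of the Tcl regexps the testsuite really uses; `count_matches` is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count the non-overlapping matches of the POSIX extended regexp PATTERN
   in TEXT.  This only approximates what the testsuite does in Tcl.  */
static int
count_matches (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int count = 0;
  int rc = regcomp (&re, pattern, REG_EXTENDED);
  assert (rc == 0);
  while (regexec (&re, text, 1, &m, 0) == 0)
    {
      count++;
      if (m.rm_eo == 0)
        {
          /* Empty match at the current position: step over one
             character, or stop at the end of the text.  */
          if (*text == '\0')
            break;
          text++;
        }
      else
        text += m.rm_eo;
    }
  regfree (&re);
  return count;
}
```

On assembly containing one D-register form and one Q-register form of the instruction, the D-register pattern is found once, which is how a "found 1 times" result fails against an expected count of 2.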

On aarch64_be, I see ICEs:
/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c:
In function 'test_vcmla_laneq_f32':
/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c:78:1:
internal compiler error: Segmentation fault
0xc3967f crash_signal
        /gcc/toplev.c:326
0xa70718 mark_jump_label_1
        /gcc/jump.c:1087
0xa707fb mark_jump_label_1
        /gcc/jump.c:1212
0xa707fb mark_jump_label_1
        /gcc/jump.c:1212
0xa70c62 mark_all_labels
        /gcc/jump.c:332
0xa70c62 rebuild_jump_labels_1
        /gcc/jump.c:74
0x78c6af execute
        /gcc/cfgexpand.c:6549
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

and similar for vector-complex_f16.c

Maybe you've already fixed this later in the series?

Happy new year :)

Christophe