This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PING^3] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Jiangjiji <jiangjiji at huawei dot com>
- Cc: Kyrylo Tkachov <Kyrylo dot Tkachov at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, "Yangfei (Felix)" <felix dot yang at huawei dot com>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>
- Date: Tue, 16 Jun 2015 10:05:29 +0100
- Subject: Re: [PING^3] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
- Authentication-results: sourceware.org; auth=none
- References: <B34C25384B9D7A428FF276D3AE7D6BDC7B4FE913 at nkgeml511-mbx dot china dot huawei dot com> <55017C2F dot 6080302 at arm dot com> <B34C25384B9D7A428FF276D3AE7D6BDC7B50CDC6 at nkgeml511-mbx dot china dot huawei dot com> <20150505063715 dot GA31920 at arm dot com> <B34C25384B9D7A428FF276D3AE7D6BDC7B510C98 at nkgeml511-mbx dot china dot huawei dot com>
On Tue, May 05, 2015 at 02:14:16PM +0100, Jiangjiji wrote:
> Hi Jamesï
>
> Thanks for your comment.
>
> Seems we need a 'dup' before 'fmul' if we use the GCC vector extension syntax way.
>
> Example:
> dup v1.2s, v1.s[0]
> fmul v0.2s, v1.2s, v0.2s
>
> And we need another pattern to combine this two insns into 'fmul
> %0.2s,%1.2s,%2.s[0]', which is kind of complex.
This would be the best way to handle the intrinsic - then we can
potentially make a nice optimization of lots of additional code. I
think it would be best if we add that extra pattern to combine the two
instructions, then write the intrinsic as __a * __b.
Looking at the pattern you propose to add for vmul_n_f32, I'm not sure
why this does not currently work, maybe it is just as simple as swapping
the operand order in:
+(define_insn "aarch64_mul_n<mode>"
+ [(set (match_operand:VMUL 0 "register_operand" "=w")
+ (mult:VMUL
+ (match_operand:VMUL 1 "register_operand" "w")
+ (vec_duplicate:VMUL
+ (match_operand:<VEL> 2 "register_operand" "<h_con>"))))]
+ "TARGET_SIMD"
+ "<f>mul\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]"
+ [(set_attr "type" "neon_mul_<Vetype>_long")]
+)
+
To look like:
+(define_insn "aarch64_mul_n<mode>"
+ [(set (match_operand:VMUL 0 "register_operand" "=w")
+ (mult:VMUL
+ (vec_duplicate:VMUL
+ (match_operand:<VEL> 2 "register_operand" "<h_con>"))
+ (match_operand:VMUL 1 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "<f>mul\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]"
+ [(set_attr "type" "neon_mul_<Vetype>_long")]
+)
+
Which I think better matches the canonicalization rules for combine?
> BTW: maybe it's better to reconsider this issue after this patch, right?
Maybe, but I'd rather see this done now rather than later. Otherwise,
we have to add and remove the
If you want to split your patch in to two parts, one part for the
vmul_n_* intrinsics, and one part for the other intrinsics, then I'd be
happy to review them as two separate patches.
Thanks,
James
> On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote:
> > Hi,
> > This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html
> > Regtested with aarch64-linux-gnu on QEMU.
> > This patch has no regressions for aarch64_be-linux-gnu big-endian target too.
> > OK for the trunk?
> >
> > Thanks.
> > Jiang jiji
> >
> >
> > ----------
> > Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
> >
> > Hi, Kyrill
> > Thank you for your suggestion.
> > I fixed it and regtested with aarch64-linux-gnu on QEMU.
> > This patch has no regressions for aarch64_be-linux-gnu big-endian target too.
> > OK for the trunk?
>
> Hi Jiang,
>
> I'm sorry that I've taken so long to get to this, I've been out of office for several weeks. I have one comment.
>
> > +__extension__ static __inline float32x2_t __attribute__
> > +((__always_inline__))
> > +vmul_n_f32 (float32x2_t __a, float32_t __b) {
> > + return __builtin_aarch64_mul_nv2sf (__a, __b); }
> > +
>
> For vmul_n_* intrinsics, is there a reason we don't want to use the GCC vector extension syntax to allow us to write these as:
>
> __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
> vmul_n_f32 (float32x2_t __a, float32_t __b)
> {
> return __a * __b;
> }
>
> It would be great if we could make that work.
>
> Thanks,
> James
>