This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PING^3] [PATCH] [AArch64, NEON] Improve vmulX intrinsics


On Tue, May 05, 2015 at 02:14:16PM +0100, Jiangjiji wrote:
> Hi Jamesï
> 
> Thanks for your comment.
> 
> Seems we need a 'dup' before 'fmul' if we use the GCC vector extension syntax way.
> 
> Example:
>         dup     v1.2s, v1.s[0]
>         fmul    v0.2s, v1.2s, v0.2s
> 
> And we need another pattern to combine this two insns into 'fmul
> %0.2s,%1.2s,%2.s[0]', which is kind of complex.

This would be the best way to handle the intrinsic - then we can
potentially make a nice optimization of lots of additional code. I
think it would be best if we add that extra pattern to combine the two
instructions, then write the intrinsic as __a * __b.

Looking at the pattern you propose to add for vmul_n_f32, I'm not sure
why this does not currently work, maybe it is just as simple as swapping
the operand order in:

+(define_insn "aarch64_mul_n<mode>"
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+        (mult:VMUL
+          (match_operand:VMUL 1 "register_operand" "w")
+          (vec_duplicate:VMUL
+            (match_operand:<VEL> 2 "register_operand" "<h_con>"))))]
+  "TARGET_SIMD"
+  "<f>mul\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]"
+  [(set_attr "type" "neon_mul_<Vetype>_long")]
+)
+

To look like:

+(define_insn "aarch64_mul_n<mode>"
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+        (mult:VMUL
+          (vec_duplicate:VMUL
+            (match_operand:<VEL> 2 "register_operand" "<h_con>"))
+          (match_operand:VMUL 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "<f>mul\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]"
+  [(set_attr "type" "neon_mul_<Vetype>_long")]
+)
+

Which I think better matches the canonicalization rules for combine?


> BTW: maybe it's better to reconsider this issue after this patch, right?

Maybe, but I'd rather see this done now rather than later. Otherwise,
we have to add and remove the 

If you want to split your patch in to two parts, one part for the
vmul_n_* intrinsics, and one part for the other intrinsics, then I'd be
happy to review them as two separate patches.

Thanks,
James

> On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote:
> > Hi,
> >   This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html
> >   Regtested with aarch64-linux-gnu on QEMU.
> >   This patch has no regressions for aarch64_be-linux-gnu big-endian target too.
> >   OK for the trunk?
> > 
> > Thanks.
> > Jiang jiji
> > 
> > 
> > ----------
> > Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
> > 
> > Hi, Kyrill
> >   Thank you for your suggestion.
> >   I fixed it and regtested with aarch64-linux-gnu on QEMU.
> >   This patch has no regressions for aarch64_be-linux-gnu big-endian target too.
> >   OK for the trunk?
> 
> Hi Jiang,
> 
> I'm sorry that I've taken so long to get to this, I've been out of office for several weeks. I have one comment.
> 
> > +__extension__ static __inline float32x2_t __attribute__ 
> > +((__always_inline__))
> > +vmul_n_f32 (float32x2_t __a, float32_t __b) {
> > +  return __builtin_aarch64_mul_nv2sf (__a, __b); }
> > +
> 
> For vmul_n_* intrinsics, is there a reason we don't want to use the GCC vector extension syntax to allow us to write these as:
> 
>   __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
>   vmul_n_f32 (float32x2_t __a, float32_t __b)
>   {
>     return __a * __b;
>   }
> 
> It would be great if we could make that work.
> 
> Thanks,
> James
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]