This is the mail archive of the
mailing list for the GCC project.
Re: ARM NEON Intrinsics guide?
- From: Jeffrey Walton <noloader at gmail dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- Cc: "gcc-help at gcc dot gnu dot org" <gcc-help at gcc dot gnu dot org>
- Date: Wed, 11 May 2016 05:38:58 -0400
- Subject: Re: ARM NEON Intrinsics guide?
- Authentication-results: sourceware.org; auth=none
- References: <CAH8yC8mz5pG_TYDCZDtyOdgRW=Ay4fRN-OXduQcAbo12rjRQog at mail dot gmail dot com> <5732F8F5 dot 8040300 at foss dot arm dot com>
- Reply-to: noloader at gmail dot com
> As has been mentioned on this thread already,
> is a list of the intrinsics and how they map down to NEON instructions,
> though it's
> more of a reference rather than a user guide.
> If you can isolate a standalone example where GCC NEON intrinsics perform
> poorly, can you
> please file a bug report with the testcase?
I hope to get something together shortly.
Here's one of the pain points:
int64x2_t c = vcombine_s64(vget_high_s64(a),vget_low_s64(b));
I'm testing alternatives at the moment... It looks like lane
extraction and insertion produce better code under GCC. That seems to
limit GCC's tendency to spill into the core (R) registers.
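
For illustration, here is a minimal sketch of the two forms side by side: the
vcombine_s64 expression above and one possible reading of "lane extraction and
insertion". The function names are hypothetical and not from this thread, and
the scalar branch exists only so the file builds on hosts without NEON. Which
form generates better code likely depends on the GCC version and options, so
inspect the assembly rather than treating this as a recommendation.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Form quoted in the message: high half of a, then low half of b. */
static int64x2_t hi_lo_combine(int64x2_t a, int64x2_t b)
{
    return vcombine_s64(vget_high_s64(a), vget_low_s64(b));
}

/* Lane-based alternative: read one lane from each input and insert it. */
static int64x2_t hi_lo_lanes(int64x2_t a, int64x2_t b)
{
    int64x2_t c = vdupq_n_s64(0);
    c = vsetq_lane_s64(vgetq_lane_s64(a, 1), c, 0);  /* c[0] = a[1] */
    c = vsetq_lane_s64(vgetq_lane_s64(b, 0), c, 1);  /* c[1] = b[0] */
    return c;
}

/* Memory-based wrappers (hypothetical) so both forms can be compared. */
void hi_lo_combine_mem(const int64_t a[2], const int64_t b[2], int64_t out[2])
{
    vst1q_s64(out, hi_lo_combine(vld1q_s64(a), vld1q_s64(b)));
}

void hi_lo_lanes_mem(const int64_t a[2], const int64_t b[2], int64_t out[2])
{
    vst1q_s64(out, hi_lo_lanes(vld1q_s64(a), vld1q_s64(b)));
}

#else

/* Scalar stand-ins so the example still builds on non-NEON hosts. */
void hi_lo_combine_mem(const int64_t a[2], const int64_t b[2], int64_t out[2])
{
    int64_t first = a[1], second = b[0];
    out[0] = first;
    out[1] = second;
}

void hi_lo_lanes_mem(const int64_t a[2], const int64_t b[2], int64_t out[2])
{
    int64_t first = a[1], second = b[0];
    out[0] = first;
    out[1] = second;
}

#endif
```

Both wrappers should produce the same two-element result, so a quick check is
to compile each with -O2 -S and compare how many moves between NEON and core
registers appear in the output.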
> As an aside, I notice your command-line options are sub-optimal.
> If you're targeting a Cortex-A7 you want to use -mfpu=neon-vfpv4 rather
> than just -mfpu=neon.
> This will give you access to the vfma instructions.
> Whereas if you're targeting ARMv8-A on a Cortex-A53 you'll want to use
> to enable the ARMv8 floating-point and NEON instructions.
Thanks, this is the sort of thing I was looking for: higher level prescriptions.
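
As a rough sketch of what the -mfpu=neon-vfpv4 advice changes in source code:
the ACLE feature macro __ARM_FEATURE_FMA is defined when fused multiply-add is
available, so code can select vfmaq_f32 and otherwise fall back to vmlaq_f32.
The function name multiply_accumulate is hypothetical, and the scalar loop is
there only for the tail elements and for hosts without NEON.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Computes acc[i] += x[i] * y[i] for n floats. */
void multiply_accumulate(float *acc, const float *x, const float *y, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats per iteration while a full vector remains. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(acc + i);
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
#if defined(__ARM_FEATURE_FMA)
        va = vfmaq_f32(va, vx, vy);  /* fused: va + vx * vy, one rounding */
#else
        va = vmlaq_f32(va, vx, vy);  /* multiply, then add */
#endif
        vst1q_f32(acc + i, va);
    }
#endif

    /* Scalar tail, and the whole loop on non-NEON hosts. */
    for (; i < n; ++i)
        acc[i] += x[i] * y[i];
}
```

Building the same file once with -mfpu=neon and once with -mfpu=neon-vfpv4 and
comparing the assembly is one way to confirm which path was selected.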
I'm also looking for something on creating new vectors on the fly from
scattered data. vcombine_s64 is a pain point under this data set, and
the suggestions here don't apply: