ARM NEON Intrinsics guide?
Wed May 11 09:39:00 GMT 2016
> As has been mentioned on this thread already,
> is a list of the intrinsics and how they map down to NEON instructions,
> though it's
> more of a reference than a user guide.
> If you can isolate a standalone example where GCC NEON intrinsics perform
> poorly, can you
> please file a bug report with the testcase?
I hope to get something together shortly.
Here's one of the pain points:
int64x2_t c = vcombine_s64(vget_high_s64(a),vget_low_s64(b));
I'm testing alternatives at the moment... It looks like lane
extraction and insertion produce better code under GCC. It seems to
limit GCC's desire to spill out into R registers.
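For reference, here is a rough sketch of the forms being compared. It is not a benchmark, and the helper names (hl_combine, hl_lanes, hl_ext, high_low) are made up for illustration. The lane version is only one possible way to write "lane extraction and insertion", and the vextq_s64 version is an additional candidate not discussed above that yields the same lanes. The scalar fallback is only there so the snippet also builds on non-ARM hosts.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Form quoted in the post: high half of a, then low half of b. */
static inline int64x2_t hl_combine(int64x2_t a, int64x2_t b)
{
    return vcombine_s64(vget_high_s64(a), vget_low_s64(b));
}

/* Lane extraction and insertion (one possible spelling, assumed). */
static inline int64x2_t hl_lanes(int64x2_t a, int64x2_t b)
{
    int64x2_t c = vdupq_n_s64(vgetq_lane_s64(a, 1));   /* { a[1], a[1] } */
    return vsetq_lane_s64(vgetq_lane_s64(b, 0), c, 1); /* { a[1], b[0] } */
}

/* VEXT over the a:b pair, starting at element 1: { a[1], b[0] }. */
static inline int64x2_t hl_ext(int64x2_t a, int64x2_t b)
{
    return vextq_s64(a, b, 1);
}
#endif

/* variant: 0 = vcombine, 1 = lane ops, 2 = vext.
 * Without NEON, a scalar fallback produces the same result. */
void high_low(const int64_t a[2], const int64_t b[2], int64_t out[2], int variant)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int64x2_t va = vld1q_s64(a), vb = vld1q_s64(b), vc;
    if (variant == 0)      vc = hl_combine(va, vb);
    else if (variant == 1) vc = hl_lanes(va, vb);
    else                   vc = hl_ext(va, vb);
    vst1q_s64(out, vc);
#else
    (void)variant;
    out[0] = a[1];
    out[1] = b[0];
#endif
}
```

All three variants should give the same two lanes, so which one is preferable comes down to the assembly GCC emits for a given target and option set. Comparing the output of -S for each is probably the quickest check.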
> As an aside, I notice your command-line options are sub-optimal.
> If you're targeting a Cortex-A7 you want to use -mfpu=neon-vfpv4 rather
> than just -mfpu=neon.
> This will give you access to the vfma instructions.
> Whereas if you're targeting ARMv8-A on a Cortex-A53 you'll want to use
> to enable the ARMv8 floating-point and NEON instructions.
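As a small illustration of what the -mfpu choice changes, the sketch below uses vfmaq_f32 only when the compiler defines the ACLE macro __ARM_FEATURE_FMA, and otherwise falls back to vmlaq_f32. The assumption here is that GCC defines that macro with -mfpu=neon-vfpv4 but not with plain -mfpu=neon. The function name mul_add is hypothetical, and the scalar loop handles the tail as well as non-ARM builds.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* acc[i] += x[i] * y[i] for n floats. */
void mul_add(float *acc, const float *x, const float *y, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(acc + i);
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
#if defined(__ARM_FEATURE_FMA)
        va = vfmaq_f32(va, vx, vy);  /* fused multiply-add (vfma) */
#else
        va = vmlaq_f32(va, vx, vy);  /* separate multiply and accumulate */
#endif
        vst1q_f32(acc + i, va);
    }
#endif
    /* Scalar tail, and the whole loop when NEON is unavailable. */
    for (; i < n; i++)
        acc[i] += x[i] * y[i];
}
```

Building this once with each -mfpu setting and looking at the generated assembly should show whether the vfma path was actually selected.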
Thanks, this is the sort of thing I was looking for: higher-level prescriptions.
I'm also looking for something on creating new vectors on the fly from
scattered data. vcombine_s64 is a pain point under this data set, and
the suggestions here don't apply: