This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug target/43725] Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #5 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-08 14:13:08 UTC ---
(In reply to comment #3)
> On Mon, 4 Oct 2010, siarhei.siamashka at gmail dot com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725
> > 
> > --- Comment #2 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-04 22:59:56 UTC ---
> > (In reply to comment #1)
> > > So the compiler is correct not to be using vld1 for this code.  The memory
> > > format of int32x4_t is defined to be the format of a neon register that has
> > > been filled from an array of int32 values and then stored to memory using VSTM
> > > (or equivalent sequence).  The implication of all this is that int32x4_t does
> > > not (necessarily) have the same memory layout as int32_t[4].
> > 
> > Could you elaborate on this? Specifically about the case when memory format for
> > VSTM and VST1 may differ.
> 
> Big-endian.

OK, I see. Looks like VLDM/VSTM instructions could be replaced with VLD1/VST1
(by artificially forcing element size to 64) in almost all cases except when
SCTLR.A == 1 due to unwanted alignment traps potentially happening in this
case.

But the question is whether it is really necessary to suffer from a performance
penalty on little endian systems?

> I previously explained the issues with big-endian NEON vectors in GCC at 
> length:
> 
> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html

Thanks for the link, something seems to be seriously overengineered. Looks like
you brought a problem upon yourself and now are trying to valiantly solve it.

Does (efficient) support of NEON intrinsics on big endian systems even have any
practical value? Maybe it makes sense to get a reasonable performance at least
on little endian systems first. To me it looks like you are just running after
two hares...


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]