This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug target/43725] Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2010-10-05 07:16:35 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > So the compiler is correct not to be using vld1 for this code.  The memory
> > format of int32x4_t is defined to be the format of a neon register that has
> > been filled from an array of int32 values and then stored to memory using VSTM
> > (or equivalent sequence).  The implication of all this is that int32x4_t does
> > not (necessarily) have the same memory layout as int32_t[4].
> 
> Could you elaborate on this? Specifically about the case when memory format for
> VSTM and VST1 may differ.
> 
> I thought that VST1 instruction could be always used as a replacement for VSTM,
> it is just a little bit less convenient in some cases because it is lacking
> some more advanced addressing modes. Moreover, VSTM is VFP instruction and VST1
> is NEON one. So I guess mixing VSTM with true NEON instructions may be
> additionally a bad idea (for performance reasons on Cortex-A9 or other
> processors?).

The ARM ARM states that VLDM / VSTM and VLDR / VSTR for 64 bit values are
compliant with VFPv2 / VFPv3 and advanced SIMD i.e. they can be executed by
both the units . Thus there should be no performance regressions on the A9
AFAIK for VLDM and VSTM / VLDR and VSTR of 64 bit registers interleaved with
other Neon instructions. 


cheers
Ramana


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]