This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, ARM] 64-bit shifts in NEON


On 20/09/12 16:49, Ulrich Weigand wrote:
> Richard Earnshaw wrote:
> 
>> Hmm, this is going to cause bottlenecks on Cortex-A15: writing a Neon
>> single-precision register and then reading it back as a double-precision
>> value will cause scheduling problems.
> 
> Ok, that is a problem ...
> 
>> The awkward thing here is that the shift only uses the bottom 8 bits of
>> the register, even though the instruction takes a 64-bit register, so we
>> don't want to go to the trouble of sign-extending the value all the way
>> out to 64 bits.
> 
> We don't really care what the upper bits are set to.  Would a
>   vdup.32 Dn, Rm
> (instead of the vmov) help here, or does this likewise have
> performance issues?
>  
>> A solution to this is to have the set of the shifter register done as a
>> lane-set operation rather than as a set of the lower register, but it
>> probably needs some thought as to how to achieve this without creating
>> other overheads.
> 
> What instruction are you referring to here?  Loads from memory?

Yes, if that's the source, or if from another register, something like

	vmov.32 Dd[0], Rt

(it doesn't matter that the other lane remains uninitialized).  This has
the advantage that it doesn't clobber the other half of the register.

If the operand is already known to be in an S-register, then
vdup(scalar) can be used, but of course that needs a full 64-bit target
register.

R.
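
To illustrate why the contents of the other lane do not matter, here is a
small C model of the discussion above.  It is only a sketch, not GCC code or
the patch under review: the function names are hypothetical, and the
treatment of the bottom byte as a signed amount (negative meaning a right
shift) is an assumption based on how the NEON register-controlled VSHL is
documented.  It models the shift, the lane-set move (vmov.32 Dd[0], Rt) and
the duplicate (vdup.32 Dn, Rm), and checks that both ways of setting up the
shifter register give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 64-bit register-controlled shift (VSHL.U64 style).
   Only the bottom byte of the shift register is used; it is read as a
   signed value, and a negative amount shifts right (assumption, see
   lead-in). */
static uint64_t shift_u64_by_reg(uint64_t value, uint64_t shift_reg)
{
    int amount = (int)(shift_reg & 0xff);   /* bottom 8 bits only */
    if (amount >= 128)
        amount -= 256;                      /* interpret as signed byte */
    if (amount >= 64 || amount <= -64)
        return 0;                           /* every bit is shifted out */
    return amount >= 0 ? value << amount : value >> -amount;
}

/* Model of "vmov.32 Dd[0], Rt": write lane 0, leave lane 1 untouched. */
static uint64_t set_lane0(uint64_t d_old, uint32_t rt)
{
    return (d_old & 0xffffffff00000000ull) | rt;
}

/* Model of "vdup.32 Dn, Rm": copy the value into both 32-bit lanes. */
static uint64_t dup_32(uint32_t rm)
{
    return ((uint64_t)rm << 32) | rm;
}

/* Shift VALUE by AMOUNT with the shifter register set up both ways.
   STALE stands for whatever the D register held before the lane-set. */
static uint64_t shift_both_ways(uint64_t value, uint32_t amount,
                                uint64_t stale)
{
    uint64_t via_lane_set = shift_u64_by_reg(value, set_lane0(stale, amount));
    uint64_t via_dup = shift_u64_by_reg(value, dup_32(amount));
    assert(via_lane_set == via_dup);        /* upper bits are ignored */
    return via_lane_set;
}
```

Calling shift_both_ways(1, 4, stale) with any stale value exercises both
paths.  The model says nothing about the Cortex-A15 scheduling cost, which
is the actual concern in this thread; it only shows that the two setups are
interchangeable as far as the shift result goes.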



