This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/82989] [7/8 regression] Inexplicable use of NEON for 64-bit math


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989

--- Comment #14 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Matthijs van Duin from comment #13)
> In case it's of interest, I did a quick benchmark of my testcase executed in
> a loop on a cortex-a8:
> 
> Without neon:
>     12 instructions/iteration
>     14 cycles/iteration
> 
> With neon:
>     14 instructions/iteration
>     35.2-35.3 cycles/iteration
> 
> (This includes 4 instructions for the loop itself.)
> 
> When using neon, the majority of the time is spent in a nasty pipeline stall
> for moving data from neon registers to arm registers, which takes a minimum
> of 20 cycles according to the cortex-a8 TRM.

Yes, on older cores it can be a bad idea to allow accidental use of Neon
instructions. The simplest workaround is to switch off Neon by building with
-mfpu=vfp.
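
To check whether the workaround takes effect, one option is to compile a small
64-bit integer function both ways and compare the generated assembly. The
function below is only an illustrative sketch: it is not the testcase from
comment #13, and the name shift_or is hypothetical. Whether GCC picks Neon for
it depends on the compiler version and options.

```c
#include <stdint.h>

/* Hypothetical 64-bit helper, NOT the testcase from the bug report.
 * On 32-bit Arm a uint64_t occupies a pair of core registers, so the
 * compiler may choose between core-register and Neon D-register code. */
uint64_t shift_or(uint64_t value, unsigned shift, uint64_t mask)
{
    /* Caller must pass shift < 64; larger shifts are undefined in C. */
    return (value << shift) | mask;
}
```

Assuming a cross toolchain named arm-linux-gnueabihf-gcc (the prefix will
differ between installations), build once with "-O2 -mcpu=cortex-a8 -mfpu=neon
-S" and once with "-O2 -mcpu=cortex-a8 -mfpu=vfp -S", then look for Neon
instructions operating on d registers in the first output only.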

We probably also need to stop the register allocator from spilling integer
registers to the FP register file, since that would cause the same stall (it
really seems to insist on doing this for some odd reason). There are AArch64
patches for this that could be ported to Arm.
