[Bug target/59078] New: autoincrement feature of NEON store instructions is not used
tir5c3 at yahoo dot co.uk
gcc-bugzilla@gcc.gnu.org
Mon Nov 11 16:45:00 GMT 2013
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59078
Bug ID: 59078
Summary: autoincrement feature of NEON store instructions is
not used
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tir5c3 at yahoo dot co.uk
The following testcase, when compiled with 'g++ test.cc -O3 -mfpu=neon
--save-temps -c', produces inefficient code: each store address is computed
into its own register, and r4-r7 have to be saved and restored.
=====
#include <arm_neon.h>

uint64_t* foo(uint64_t* x, uint32_t y)
{
    uint64x2_t d = vreinterpretq_u64_u32(vdupq_n_u32(y));
    vst1q_u64(x, d);
    x+=2;
    vst1q_u64(x, d);
    x+=2;
    vst1q_u64(x, d);
    x+=2;
    vst1q_u64(x, d);
    x+=2;
    vst1q_u64(x, d);
    x+=2;
    vst1q_u64(x, d);
    x+=2;
    vst1q_u64(x, d);
    x+=2;
    vst1q_u64(x, d);
    x+=2;
    return x;
}
====
The resulting assembly:
====
_Z3fooPyj:
push {r4, r5, r6, r7}
vdup.32 q8, r1
add r7, r0, #32
add r6, r0, #48
add r5, r0, #64
add r4, r0, #80
add r1, r0, #96
add r2, r0, #112
mov r3, r0
adds r0, r0, #128
vst1.64 {d16-d17}, [r3:64]!
vst1.64 {d16-d17}, [r3:64]
vst1.64 {d16-d17}, [r7:64]
vst1.64 {d16-d17}, [r6:64]
vst1.64 {d16-d17}, [r5:64]
vst1.64 {d16-d17}, [r4:64]
vst1.64 {d16-d17}, [r1:64]
vst1.64 {d16-d17}, [r2:64]
pop {r4, r5, r6, r7}
bx lr
====
The main problem is that the pointer autoincrement feature of the vst1.64
instruction is not fully utilized. GCC apparently figures it out for the first
store, but the remaining stores each use a separately computed address. I would
expect GCC to produce the following output:
====
_Z3fooPyj:
vdup.32 q8, r1
vst1.64 {d16-d17}, [r0:64]!
vst1.64 {d16-d17}, [r0:64]!
vst1.64 {d16-d17}, [r0:64]!
vst1.64 {d16-d17}, [r0:64]!
vst1.64 {d16-d17}, [r0:64]!
vst1.64 {d16-d17}, [r0:64]!
vst1.64 {d16-d17}, [r0:64]!
vst1.64 {d16-d17}, [r0:64]!
bx lr
====
In unrolled loops, GCC spills almost all registers to memory, which
makes the code two to three times slower than the optimal
version.
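For illustration, a loop of the kind meant here might look like the sketch below. The function name fill_blocks and its blocks parameter are hypothetical and not part of the original testcase; the non-NEON branch is only a portable fallback so that the sketch builds on other targets. Whether GCC unrolls this loop, and what it emits for it, depends on the compiler version and flags.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical loop variant of foo(): writes `blocks` 16-byte blocks,
// each filled with the 32-bit pattern y, and returns the advanced pointer.
uint64_t* fill_blocks(uint64_t* x, uint32_t y, std::size_t blocks)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint64x2_t d = vreinterpretq_u64_u32(vdupq_n_u32(y));
    for (std::size_t i = 0; i < blocks; ++i) {
        vst1q_u64(x, d);  // store followed by x += 2: a candidate for
        x += 2;           // a single post-incremented vst1.64
    }
#else
    // Portable fallback (assumption: only here so non-NEON builds work).
    uint64_t v = (static_cast<uint64_t>(y) << 32) | y;
    for (std::size_t i = 0; i < blocks; ++i) {
        x[0] = v;
        x[1] = v;
        x += 2;
    }
#endif
    return x;
}
```

Compiling this with the same flags and --save-temps lets the generated stores be compared against the hand-written version above.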
This bug was observed with GCC 4.8.1. This email [1] suggests that mainline is
also affected.
[1]: http://gcc.gnu.org/ml/gcc-help/2013-11/msg00075.html