[Bug c/49362] New: Arm Neon intrinsic types not correctly interpreted by compiler.
mark.pupilli at dyson dot com
gcc-bugzilla@gcc.gnu.org
Fri Jun 10 11:35:00 GMT 2011
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362
Summary: Arm Neon intrinsic types not correctly interpreted by compiler.
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: mark.pupilli@dyson.com
Created attachment 24485
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24485
C-file with 2 funs that show the bug when compiled.
The Arm Neon intrinsics define the type uint32x4x2_t as
typedef struct uint32x4x2_t { uint32x4_t val[2]; } uint32x4x2_t;
The compiler interprets this literally as a struct. This should not be the
case. The compiler should treat it as a pair of registers, just as it treats
uint32x4_t as a single register and not as an array of 4 x uint32_t.
The attached C file contains two versions of the same function: one that uses
quad-word loads (vld1q) and one that uses double quad-word loads (vld2q). The
function that uses double quad-word loads should take 2 instructions fewer, but
it is actually 44 instructions long compared to 19 for the vld1q version. (Both
functions compute the same result.)
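The attachment itself is not reproduced in this message. As a rough guide, the
two variants might look something like the sketch below. This is an assumption
reconstructed from the mangled symbol name and the disassembly further down,
not the attached source: the function names, the fixed length of eight 32-bit
words, and the helper reduce_popcount are all hypothetical. A scalar reference
is included so the result can be checked on any target; the Neon versions are
only compiled when the compiler reports Neon support.

```c
#include <stdint.h>

/* Portable reference: Hamming distance between two 256-bit blocks,
   each stored as eight 32-bit words (the length is an assumption). */
unsigned hamming_distance_scalar(const uint32_t *a, const uint32_t *b)
{
    unsigned count = 0;
    for (int i = 0; i < 8; ++i) {
        uint32_t x = a[i] ^ b[i];
        while (x) {
            x &= x - 1;   /* clear the lowest set bit */
            ++count;
        }
    }
    return count;
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical shared tail: count the set bits in two XOR results.
   Each per-byte count is at most 8, so adding two of them cannot
   overflow a byte. */
unsigned reduce_popcount(uint32x4_t x0, uint32x4_t x1)
{
    uint8x16_t c = vaddq_u8(vcntq_u8(vreinterpretq_u8_u32(x0)),
                            vcntq_u8(vreinterpretq_u8_u32(x1)));
    uint32x4_t s = vpaddlq_u16(vpaddlq_u8(c));   /* widen 8 -> 16 -> 32 */
    uint32x2_t p = vpadd_u32(vget_low_u32(s), vget_high_u32(s));
    p = vpadd_u32(p, p);
    return vget_lane_u32(p, 0);
}

/* Variant using four quad-word loads. */
unsigned hamming_distance_vld1q(const uint32_t *a, const uint32_t *b)
{
    uint32x4_t a0 = vld1q_u32(a), a1 = vld1q_u32(a + 4);
    uint32x4_t b0 = vld1q_u32(b), b1 = vld1q_u32(b + 4);
    return reduce_popcount(veorq_u32(a0, b0), veorq_u32(a1, b1));
}

/* Variant using two double quad-word loads. The loads de-interleave
   the words, but both inputs are permuted the same way, so the total
   bit count is unchanged. */
unsigned hamming_distance_vld2q(const uint32_t *a, const uint32_t *b)
{
    uint32x4x2_t A = vld2q_u32(a);
    uint32x4x2_t B = vld2q_u32(b);
    return reduce_popcount(veorq_u32(A.val[0], B.val[0]),
                           veorq_u32(A.val[1], B.val[1]));
}
#endif
```

On an ARM target, building both Neon variants at the same optimisation level
and comparing their disassembly is one way to check whether the A.val[n]
accesses are kept in registers or spilled to the stack.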
I believe this bug arises because the compiler treats the following as an array
access instead of a reference into the register file:
uint32x4x2_t A = vld2q_u32 ( a );
A.val[0]; // This statement should be treated as a reference to a register,
          // not an array access!
Assembly for the vld2q version follows. Hopefully I am not mistaken, as I am
new to ARM assembly, but it appears to do the double quad-word loads in the
Neon pipeline, then transfer the registers back to the ARM processor, index
them as arrays, and then reload them into the Neon pipeline again!
vld2q variant, 44 instructions:
00000014 <_ZN4Neon16hamming_distanceEPjS0_>:
14: e92d0070 push {r4, r5, r6}
18: e24dd084 sub sp, sp, #132 ; 0x84
1c: f460c38f vld2.32 {d28-d31}, [r0]
20: e28d6020 add r6, sp, #32
24: ecc6cb08 vstmia r6, {d28-d31}
28: e1a0c001 mov ip, r1
2c: e8b6000f ldm r6!, {r0, r1, r2, r3}
30: e28d4060 add r4, sp, #96 ; 0x60
34: e1a05004 mov r5, r4
38: f46c038f vld2.32 {d16-d19}, [ip]
3c: e8a5000f stmia r5!, {r0, r1, r2, r3}
40: eccd0b08 vstmia sp, {d16-d19}
44: e896000f ldm r6, {r0, r1, r2, r3}
48: e1a0c00d mov ip, sp
4c: e28d4040 add r4, sp, #64 ; 0x40
50: e885000f stm r5, {r0, r1, r2, r3}
54: e1a05004 mov r5, r4
58: e8bc000f ldm ip!, {r0, r1, r2, r3}
5c: e8a5000f stmia r5!, {r0, r1, r2, r3}
60: e89c000f ldm ip, {r0, r1, r2, r3}
64: e885000f stm r5, {r0, r1, r2, r3}
68: eddd4b10 vldr d20, [sp, #64] ; 0x40
6c: eddd5b12 vldr d21, [sp, #72] ; 0x48
70: edddab18 vldr d26, [sp, #96] ; 0x60
74: edddbb1a vldr d27, [sp, #104] ; 0x68
78: f34a61f4 veor q11, q13, q10
7c: eddd8b14 vldr d24, [sp, #80] ; 0x50
80: eddd9b16 vldr d25, [sp, #88] ; 0x58
84: eddd4b1c vldr d20, [sp, #112] ; 0x70
88: eddd5b1e vldr d21, [sp, #120] ; 0x78
8c: f30461f8 veor q3, q10, q12
90: f3f00546 vcnt.8 q8, q3
94: f3b04566 vcnt.8 q2, q11
98: f2042860 vadd.i8 q1, q2, q8
9c: f3f022c2 vpaddl.u8 q9, q1
a0: f3f422e2 vpaddl.u16 q9, q9
a4: f22201b2 vorr d0, d18, d18
a8: f26321b3 vorr d18, d19, d19
ac: f2620b90 vpadd.i32 d16, d18, d0
b0: f2600bb0 vpadd.i32 d16, d16, d16
b4: ee100b90 vmov.32 r0, d16[0]
b8: e28dd084 add sp, sp, #132 ; 0x84
bc: e8bd0070 pop {r4, r5, r6}
c0: e12fff1e bx lr
vld1q variant, only 19 instructions:
00000014 <_ZN4Neon16hamming_distanceEPjS0_>:
14: e2802010 add r2, r0, #16
18: e2813010 add r3, r1, #16
1c: f4606a8f vld1.32 {d22-d23}, [r0]
20: f4624a8f vld1.32 {d20-d21}, [r2]
24: f463aa8f vld1.32 {d26-d27}, [r3]
28: f461ca8f vld1.32 {d28-d29}, [r1]
2c: f34681fc veor q12, q11, q14
30: f30461fa veor q3, q10, q13
34: f3f00546 vcnt.8 q8, q3
38: f3b04568 vcnt.8 q2, q12
3c: f2042860 vadd.i8 q1, q2, q8
40: f3f022c2 vpaddl.u8 q9, q1
44: f3f422e2 vpaddl.u16 q9, q9
48: f22201b2 vorr d0, d18, d18
4c: f26321b3 vorr d18, d19, d19
50: f2620b90 vpadd.i32 d16, d18, d0
54: f2600bb0 vpadd.i32 d16, d16, d16
58: ee100b90 vmov.32 r0, d16[0]
5c: e12fff1e bx lr