This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/51492] New: vectorizer generates unnecessary code
- From: "drepper.fsp at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 10 Dec 2011 00:59:22 +0000
- Subject: [Bug tree-optimization/51492] New: vectorizer generates unnecessary code
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
Bug #: 51492
Summary: vectorizer generates unnecessary code
Classification: Unclassified
Product: gcc
Version: 4.6.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: drepper.fsp@gmail.com
Build: x86_64-linux
Compile this code with 4.6.2 on an x86-64 machine with -O3:
#define SIZE 65536
#define WSIZE 64
unsigned short head[SIZE] __attribute__((aligned(64)));

void
f(void)
{
  for (unsigned n = 0; n < SIZE; ++n) {
    unsigned short m = head[n];
    head[n] = (unsigned short)(m >= WSIZE ? m-WSIZE : 0);
  }
}
The result I see is this:
0000000000000000 <f>:
   0:   66 0f ef d2             pxor   %xmm2,%xmm2
   4:   b8 00 00 00 00          mov    $0x0,%eax
                        5: R_X86_64_32  head
   9:   66 0f 6f 25 00 00 00    movdqa 0x0(%rip),%xmm4        # 11 <f+0x11>
  10:   00
                        d: R_X86_64_PC32        .LC0-0x4
  11:   66 0f 6f 1d 00 00 00    movdqa 0x0(%rip),%xmm3        # 19 <f+0x19>
  18:   00
                        15: R_X86_64_PC32       .LC1-0x4
  19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  20:   66 0f 6f 00             movdqa (%rax),%xmm0
  24:   66 0f 6f c8             movdqa %xmm0,%xmm1
  28:   66 0f d9 c4             psubusw %xmm4,%xmm0
  2c:   66 0f 75 c2             pcmpeqw %xmm2,%xmm0
  30:   66 0f fd cb             paddw  %xmm3,%xmm1
  34:   66 0f df c1             pandn  %xmm1,%xmm0
  38:   66 0f 7f 00             movdqa %xmm0,(%rax)
  3c:   48 83 c0 10             add    $0x10,%rax
  40:   48 3d 00 00 00 00       cmp    $0x0,%rax
                        42: R_X86_64_32S        head+0x20000
  46:   75 d8                   jne    20 <f+0x20>
  48:   f3 c3                   repz retq
Most of this code is unnecessary. The psubusw instruction alone is
sufficient: it implements unsigned saturating subtraction, which is what
the loop body computes. Why does gcc create all this extra code? The
loop body should just be
movdqa (%rax), %xmm0
psubusw %xmm1, %xmm0
movdqa %xmm0, (%rax)
where each 16-bit element of %xmm1 holds WSIZE.
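
The claimed equivalence can be checked from C. What follows is only a sketch and not part of the original report. It assumes the SSE2 intrinsics from <emmintrin.h>, where _mm_subs_epu16 is the intrinsic for psubusw, and falls back to a plain scalar loop when SSE2 is not available. The names f_scalar, f_sat, buf_a, buf_b and count_mismatches are made up for illustration. Because SIZE is 65536, filling the array with its own indices exercises every possible 16-bit input exactly once.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics (x86 / x86-64 only) */
#endif

#define SIZE 65536
#define WSIZE 64

/* Scalar reference: the loop from the report, on a caller-supplied array. */
static void f_scalar(uint16_t *p)
{
    for (unsigned n = 0; n < SIZE; ++n) {
        uint16_t m = p[n];
        p[n] = (uint16_t)(m >= WSIZE ? m - WSIZE : 0);
    }
}

/* Saturating-subtract variant. p must be 16-byte aligned. */
static void f_sat(uint16_t *p)
{
    assert(((uintptr_t)p & 15) == 0);
#if defined(__SSE2__)
    /* WSIZE broadcast into all eight 16-bit elements. */
    const __m128i w = _mm_set1_epi16(WSIZE);
    for (unsigned n = 0; n < SIZE; n += 8) {
        __m128i v = _mm_load_si128((const __m128i *)(p + n));
        v = _mm_subs_epu16(v, w);   /* psubusw */
        _mm_store_si128((__m128i *)(p + n), v);
    }
#else
    /* Assumed fallback without SSE2: per-element saturating subtraction. */
    for (unsigned n = 0; n < SIZE; ++n)
        p[n] = (uint16_t)(p[n] > WSIZE ? p[n] - WSIZE : 0);
#endif
}

static uint16_t buf_a[SIZE] __attribute__((aligned(64)));
static uint16_t buf_b[SIZE] __attribute__((aligned(64)));

/* Runs both variants over all 16-bit inputs and counts disagreements. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned n = 0; n < SIZE; ++n)
        buf_a[n] = buf_b[n] = (uint16_t)n;
    f_scalar(buf_a);
    f_sat(buf_b);
    for (unsigned n = 0; n < SIZE; ++n)
        bad += (buf_a[n] != buf_b[n]);
    return bad;
}
```

If count_mismatches() returns 0, the conditional in the original loop and a single saturating subtraction agree on every input, which is the basis for expecting the vectorizer to emit only the load, psubusw and store.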