This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/60575] New: inefficient vectorization of compare into bytes on amd64
- From: "jtaylor.debian at googlemail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 18 Mar 2014 21:40:54 +0000
- Subject: [Bug tree-optimization/60575] New: inefficient vectorization of compare into bytes on amd64
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60575
Bug ID: 60575
Summary: inefficient vectorization of compare into bytes on amd64
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jtaylor.debian at googlemail dot com
This code, which compares shorts and stores the results into chars:
void __attribute__((optimize("O3"))) f(char * a_, short * b_)
{
    char * restrict a = __builtin_assume_aligned(a_, 16);
    short * restrict b = __builtin_assume_aligned(b_, 16);
    for (int i = 0; i < 1024; i++) {
        a[i] = 6 < b[i];
    }
}
vectorizes with gcc 4.8.2 (gcc file.c -c -std=c99) to:
22: movdqa (%rsi,%rax,2),%xmm0
27: movdqa 0x10(%rsi,%rax,2),%xmm1
2d: pcmpgtw %xmm4,%xmm0
31: pcmpgtw %xmm4,%xmm1
35: pand %xmm3,%xmm0
39: pand %xmm3,%xmm1
3d: movdqa %xmm0,%xmm2
41: punpcklbw %xmm1,%xmm0
45: punpckhbw %xmm1,%xmm2
49: movdqa %xmm0,%xmm1
4d: punpcklbw %xmm2,%xmm0
51: punpckhbw %xmm2,%xmm1
55: movdqa %xmm0,%xmm2
59: punpcklbw %xmm1,%xmm0
5d: punpckhbw %xmm1,%xmm2
61: punpcklbw %xmm2,%xmm0
65: movdqa %xmm0,(%rdi,%rax,1)
6a: add $0x10,%rax
6e: cmp $0x400,%rax
74: jne 22 <f+0x22>
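This is relatively inefficient: the loop needs seven unpack instructions and
three register copies to narrow the compare results to bytes, where a single
pack instruction would suffice. With pack instructions the loop would look
roughly like this (unrolled twice):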
b3: movdqa (%rsi,%rax,2),%xmm1
b8: movdqa 0x10(%rsi,%rax,2),%xmm0
be: pcmpgtw %xmm2,%xmm1
c2: pcmpgtw %xmm2,%xmm0
c6: packsswb %xmm0,%xmm1
ca: pand %xmm3,%xmm1
ce: movdqa %xmm1,(%rdi,%rax,1)
d3: add $0x10,%rax
d7: cmp $0x400,%rax
dd: jne b3 <g+0x16>
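For reference, the pack-based loop can be sketched with SSE2 intrinsics. This is only an illustration of the desired code generation, not something taken from GCC itself; the function name f_pack is hypothetical, and both pointers are assumed to be 16-byte aligned as in f above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical hand-written equivalent of f(): a[i] = 6 < b[i].
 * Assumes a and b are 16-byte aligned and hold 1024 elements. */
void f_pack(char *a, const short *b)
{
    const __m128i six = _mm_set1_epi16(6);
    const __m128i one = _mm_set1_epi8(1);

    for (int i = 0; i < 1024; i += 16) {
        /* Two aligned loads of 8 shorts each (movdqa). */
        __m128i lo = _mm_load_si128((const __m128i *)(b + i));
        __m128i hi = _mm_load_si128((const __m128i *)(b + i + 8));

        /* pcmpgtw: each 16-bit lane becomes 0xFFFF where b > 6, else 0. */
        lo = _mm_cmpgt_epi16(lo, six);
        hi = _mm_cmpgt_epi16(hi, six);

        /* packsswb: signed saturation maps -1 to 0xFF and 0 to 0,
         * so the 16 masks narrow to 16 bytes in order. */
        __m128i bytes = _mm_packs_epi16(lo, hi);

        /* pand: reduce 0xFF to the C truth value 1. */
        bytes = _mm_and_si128(bytes, one);

        _mm_store_si128((__m128i *)(a + i), bytes);
    }
}
```

The pack is safe here because a compare mask is always 0 or -1, both of which survive signed saturation unchanged.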
This can also be applied to larger element sizes, including floating point, by
adding more pack instructions.
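As a sketch of that extension, a float compare produces 32-bit masks, so one extra level of packing (packssdw, then packsswb) narrows four vectors of masks into one vector of bytes. The function name f_pack_float and the float variant of the loop are assumptions made for illustration; they are not part of the report's test case. Both pointers are again assumed to be 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also provides the SSE float ops) */

/* Hypothetical float variant: a[i] = 6.0f < b[i].
 * Assumes a and b are 16-byte aligned and hold 1024 elements. */
void f_pack_float(char *a, const float *b)
{
    const __m128 six = _mm_set1_ps(6.0f);
    const __m128i one = _mm_set1_epi8(1);

    for (int i = 0; i < 1024; i += 16) {
        /* cmpltps-style compare: each 32-bit lane is all ones where b > 6. */
        __m128i m0 = _mm_castps_si128(_mm_cmpgt_ps(_mm_load_ps(b + i), six));
        __m128i m1 = _mm_castps_si128(_mm_cmpgt_ps(_mm_load_ps(b + i + 4), six));
        __m128i m2 = _mm_castps_si128(_mm_cmpgt_ps(_mm_load_ps(b + i + 8), six));
        __m128i m3 = _mm_castps_si128(_mm_cmpgt_ps(_mm_load_ps(b + i + 12), six));

        /* packssdw: 32-bit masks to 16-bit masks (-1 and 0 are preserved). */
        __m128i w0 = _mm_packs_epi32(m0, m1);
        __m128i w1 = _mm_packs_epi32(m2, m3);

        /* packsswb: 16-bit masks to 8-bit masks. */
        __m128i bytes = _mm_packs_epi16(w0, w1);

        /* Reduce 0xFF to the C truth value 1 and store 16 results. */
        bytes = _mm_and_si128(bytes, one);
        _mm_store_si128((__m128i *)(a + i), bytes);
    }
}
```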