This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug c/66369] New: gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb
- From: "marcus.kool at urlfilterdb dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 01 Jun 2015 21:53:44 +0000
- Subject: [Bug c/66369] New: gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66369
Bug ID: 66369
Summary: gcc 4.8.3/5.1.0 miss optimisation with vpmovmskb
Product: gcc
Version: 4.8.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: marcus.kool at urlfilterdb dot com
Target Milestone: ---
Created attachment 35672
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35672&action=edit
example C code to demonstrate the missed optimisation in gcc 4.8.3 and 5.1.0
When using _mm256_movemask_epi8() I cannot find a way to make gcc produce
vpmovmskb YMM,R64
instead of
vpmovmskb YMM,R32
When the result of the vpmovmskb is not stored in an R64, unnecessary
sign-extension instructions (cltq, movl or movslq) are generated later. With the
result in an R32 and an array of structs being indexed, gcc generates for
node = node->children[ __builtin_ctzl(result-of-vpmovmskb) ]
the following:
vpmovmskb YMM, R32
movslq R32, R64
tzcntq R64, R64
movq offset(%rdi,R64,8), %rdi
instead of the more efficient:
vpmovmskb YMM, R64
tzcntq R64, R64
movq offset(%rdi,R64,8), %rdi
Attached is avx2.c, whose C source code demonstrates the above.
avx2.c is compiled with gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9) and the flags
-std=c99 -march=core-avx2 -mtune=core-avx2 -O3
gcc 5.1.0 has the same behaviour.