This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/59511] [4.9 Regression] FAIL: gcc.target/i386/pr36222-1.c scan-assembler-not movdqa with -mtune=corei7
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 02 Jun 2016 17:37:39 +0000
- Subject: [Bug rtl-optimization/59511] [4.9 Regression] FAIL: gcc.target/i386/pr36222-1.c scan-assembler-not movdqa with -mtune=corei7
- Auto-submitted: auto-generated
- References: <bug-59511-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59511
--- Comment #7 from Peter Cordes <peter at cordes dot ca> ---
I'm seeing the same symptom in gcc 4.9 through 5.3. It is not present in 6.1.
I don't know whether the cause is the same.
(code from an improvement to the horizontal_add functions in Agner Fog's vector
class library)
#include <immintrin.h>
int hsum16_gccmovdqa (__m128i const a) {
    __m128i lo = _mm_cvtepi16_epi32(a);       // sign-extended a0, a1, a2, a3
    __m128i hi = _mm_unpackhi_epi64(a,a);     // gcc4.9 through 5.3 wastes a movdqa on this
    hi = _mm_cvtepi16_epi32(hi);
    __m128i sum1 = _mm_add_epi32(lo,hi);      // add sign-extended upper / lower halves
    //return horizontal_add(sum1);            // manually inlined.
    // Shortening the code below can avoid the movdqa
    __m128i shuf = _mm_shuffle_epi32(sum1, 0xEE);
    __m128i sum2 = _mm_add_epi32(shuf,sum1);  // 2 sums
    shuf = _mm_shufflelo_epi16(sum2, 0xEE);
    __m128i sum4 = _mm_add_epi32(shuf,sum2);
    return _mm_cvtsi128_si32(sum4);           // 32 bit sum
}
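
For anyone checking the result rather than the code generation, here is a rough scalar model of the data flow in the function above. It is only a sketch: the names shuffle_epi32_model and hsum16_scalar are hypothetical (not from the vector class library), and it assumes the input vector is viewed as eight int16_t elements with a0 in the lowest lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_shuffle_epi32:
   output lane i takes the input lane selected by bits [2i+1:2i] of imm. */
static void shuffle_epi32_model(const int32_t in[4], unsigned imm, int32_t out[4]) {
    assert(imm <= 0xFF);                      /* the immediate is 8 bits */
    for (int i = 0; i < 4; i++)
        out[i] = in[(imm >> (2 * i)) & 3];
}

/* Hypothetical scalar reference for the value hsum16_gccmovdqa should return:
   the 32-bit sum of eight sign-extended 16-bit elements. */
static int hsum16_scalar(const int16_t v[8]) {
    int32_t sum1[4], shuf[4], sum2[4];

    /* pmovsxwd on the low half, punpckhqdq + pmovsxwd on the high half, paddd */
    for (int i = 0; i < 4; i++)
        sum1[i] = (int32_t)v[i] + (int32_t)v[i + 4];

    /* _mm_shuffle_epi32(sum1, 0xEE) selects lanes 2,3,2,3 */
    shuffle_epi32_model(sum1, 0xEE, shuf);
    for (int i = 0; i < 4; i++)
        sum2[i] = shuf[i] + sum1[i];

    /* _mm_shufflelo_epi16(sum2, 0xEE) copies 16-bit words 2,3 of the low
       64 bits into words 0,1, i.e. 32-bit lane 1 into lane 0.
       Only lane 0 of the final add is returned. */
    return sum2[1] + sum2[0];
}
```

Comparing this against the intrinsics version on a few inputs (including negative elements, to exercise the sign extension) should give identical results on every gcc version; the bug here is only the extra movdqa, not wrong output.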
gcc4.9 through gcc5.3 output (-O3 -mtune=generic -msse4.1):
        movdqa  %xmm0, %xmm1
        pmovsxwd        %xmm0, %xmm2
        punpckhqdq      %xmm0, %xmm1
        pmovsxwd        %xmm1, %xmm0
        paddd   %xmm2, %xmm0
        ...
gcc6.1 output:
        pmovsxwd        %xmm0, %xmm1
        punpckhqdq      %xmm0, %xmm0
        pmovsxwd        %xmm0, %xmm0
        paddd   %xmm0, %xmm1
        ...
In a more complicated case, with this code either inlined or not, there's
actually a difference between gcc 4.9 and 5.x: gcc5 emits the extra movdqa in
more cases.
See my attachment, copied from https://godbolt.org/g/e8iQsj