This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/52568] New: suboptimal __builtin_shuffle on cycles with AVX
- From: "marc.glisse at normalesup dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 12 Mar 2012 19:00:22 +0000
- Subject: [Bug target/52568] New: suboptimal __builtin_shuffle on cycles with AVX
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52568
Bug #: 52568
Summary: suboptimal __builtin_shuffle on cycles with AVX
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: marc.glisse@normalesup.org
Hello,
I compiled the following with -O3 (or -Os) and -mavx
#include <x86intrin.h>
__m256d left(__m256d x){
    __m256i mask={1,2,3,0};
    return __builtin_shuffle(x,mask);
}
(by the way, for some reason, gcc insists that 'mask' is set but not used with
-Wall)
and got:
vunpckhpd %xmm0, %xmm0, %xmm3
vmovapd %xmm0, %xmm1
vextractf128 $0x1, %ymm0, %xmm0
vmovaps %xmm0, %xmm2
vunpckhpd %xmm0, %xmm0, %xmm0
vunpcklpd %xmm1, %xmm0, %xmm1
vunpcklpd %xmm2, %xmm3, %xmm0
vinsertf128 $0x1, %xmm1, %ymm0, %ymm0
ret
That doesn't really match the code I currently use to do this:
#ifdef __AVX2__
__m256d d=_mm256_permute4x64_pd(x,1+2*4+3*16+0*64);
#else
__m256d b=_mm256_shuffle_pd(x,x,5);
__m256d c=_mm256_permute2f128_pd(b,b,1);
__m256d d=_mm256_blend_pd(b,c,10);
#endif
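For reference, both paths above should compute the same permutation; a minimal scalar sketch of __builtin_shuffle's element-selection rule (result[i] = x[mask[i]]; the helper name is mine, not GCC's) is:

```c
/* Scalar model of __builtin_shuffle(x, mask) for one __m256d (4 doubles):
   result[i] = x[mask[i]].  With mask = {1,2,3,0} this is a left rotation
   by one element -- the permutation the intrinsic sequences above compute. */
static void rotate_left_4(const double x[4], double out[4]) {
    static const int mask[4] = {1, 2, 3, 0};
    for (int i = 0; i < 4; ++i)
        out[i] = x[mask[i]];
}
```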
Could something that recognizes this permutation pattern (and the matching right
cyclic shift) be added? I know there are too many shuffles to hand-code them all,
but cycles seem like they shouldn't be too uncommon.
With -mavx2, I get a single vpermq, which is close enough to the expected
vpermpd.
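(As a sanity check on the immediate used in the AVX2 path above: each
destination lane i of a 4x64 permute takes source element (imm >> (2*i)) & 3,
so 1 + 2*4 + 3*16 + 0*64 = 0x39 selects lanes 1,2,3,0. The helper below is
just an illustration of that encoding, not anything GCC emits.)

```c
/* Immediate byte used with _mm256_permute4x64_pd in the workaround. */
enum { ROT_LEFT_IMM = 1 + 2*4 + 3*16 + 0*64 };  /* == 0x39 */

/* Source element selected for destination lane i: two bits per lane. */
static int imm_lane(int imm, int i) {
    return (imm >> (2 * i)) & 3;
}
```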