For gcc.target/i386/vectorize4-avx.c, vect256 branch generates:

.L2:
	vmovaps	-120(%rsp,%rax), %ymm0
	vcvtps2pd	%xmm0, %ymm1
	vextractf128	$0x1, %ymm0, %xmm0
	vsqrtpd	%ymm1, %ymm1
	vcvttpd2dqy	%ymm1, %xmm1
	vmovdqu	%xmm1, (%rdi,%rax)
	vcvtps2pd	%xmm0, %ymm0
	vsqrtpd	%ymm0, %ymm0
	vcvttpd2dqy	%ymm0, %xmm0
	vmovdqu	%xmm0, 16(%rdi,%rax)
	addq	$32, %rax
	cmpq	$1024, %rax
	jne	.L2

Trunk at revision 165455 generates:

.L2:
	vmovaps	-120(%rsp,%rax), %xmm1
	vmovhlps	%xmm1, %xmm0, %xmm0
	vcvtps2pd	%xmm1, %xmm2
	vsqrtpd	%xmm2, %xmm2
	vcvttpd2dqx	%xmm2, %xmm2
	vcvtps2pd	%xmm0, %xmm1
	vsqrtpd	%xmm1, %xmm1
	vcvttpd2dqx	%xmm1, %xmm1
	vpunpcklqdq	%xmm1, %xmm2, %xmm1
	vmovdqu	%xmm1, (%rdi,%rax)
	addq	$16, %rax
	cmpq	$1024, %rax
	jne	.L2
Author: hjl
Date: Thu Oct 14 08:33:09 2010
New Revision: 165457

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=165457
Log:
Scan 256bit AVX register and xfail vectorize4-avx.c.

2010-10-14  H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/46011
	* gcc.target/i386/vectorize4-avx.c: Scan 256bit AVX register
	and xfail.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/vectorize4-avx.c
Yep, that's a known limitation of the new scheme, which allows only one vector size per loop. It needs special support in the vectorizable_conversion routine.
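For illustration, a loop of the following shape mixes element sizes the way the listings above suggest: 4-byte float loads, an 8-byte double intermediate, and 4-byte int stores. This is only a sketch inferred from the assembly, not the testcase source. The function name and signature are hypothetical, and the double-precision square root seen in the listings is replaced here by a multiplication so that the sketch needs no math library.

```c
#define N 256

/* Hypothetical stand-in for the loop in the testcase: each element is
   loaded as a float, widened to double for the arithmetic, and then
   truncated to int for the store.  */
void
halve_to_int (int *dest, const float *src)
{
  int i;

  for (i = 0; i < N; i++)
    {
      double wide = (double) src[i];   /* float -> double (cf. vcvtps2pd) */
      wide = wide * 0.5;               /* double-precision operation */
      dest[i] = (int) wide;            /* double -> int (cf. vcvttpd2dq) */
    }
}
```

A 256-bit register holds eight floats but only four doubles, so with one vector size per loop the widening conversion has to be done in two halves. That matches the vect256 listing, which converts the low half directly and extracts the high half with vextractf128 before converting it.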
Fixed in GCC 10 and later.