Bug 46011 - 256bit vectorizer failed on double->int
Summary: 256bit vectorizer failed on double->int
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: vectorizer
Reported: 2010-10-14 08:31 UTC by H.J. Lu
Modified: 2020-09-14 12:26 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-10-14 08:36:25


Description H.J. Lu 2010-10-14 08:31:26 UTC
For gcc.target/i386/vectorize4-avx.c, the vect256 branch generates:

.L2:
	vmovaps	-120(%rsp,%rax), %ymm0
	vcvtps2pd	%xmm0, %ymm1
	vextractf128	$0x1, %ymm0, %xmm0
	vsqrtpd	%ymm1, %ymm1
	vcvttpd2dqy	%ymm1, %xmm1
	vmovdqu	%xmm1, (%rdi,%rax)
	vcvtps2pd	%xmm0, %ymm0
	vsqrtpd	%ymm0, %ymm0
	vcvttpd2dqy	%ymm0, %xmm0
	vmovdqu	%xmm0, 16(%rdi,%rax)
	addq	$32, %rax
	cmpq	$1024, %rax
	jne	.L2

Trunk at revision 165455 generates:

.L2:
	vmovaps	-120(%rsp,%rax), %xmm1
	vmovhlps	%xmm1, %xmm0, %xmm0
	vcvtps2pd	%xmm1, %xmm2
	vsqrtpd	%xmm2, %xmm2
	vcvttpd2dqx	%xmm2, %xmm2
	vcvtps2pd	%xmm0, %xmm1
	vsqrtpd	%xmm1, %xmm1
	vcvttpd2dqx	%xmm1, %xmm1
	vpunpcklqdq	%xmm1, %xmm2, %xmm1
	vmovdqu	%xmm1, (%rdi,%rax)
	addq	$16, %rax
	cmpq	$1024, %rax
	jne	.L2
Comment 1 hjl@gcc.gnu.org 2010-10-14 08:33:13 UTC
Author: hjl
Date: Thu Oct 14 08:33:09 2010
New Revision: 165457

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=165457
Log:
Scan 256bit AVX register and xfail vectorize4-avx.c.

2010-10-14  H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/46011
	* gcc.target/i386/vectorize4-avx.c: Scan 256bit AVX register
	and xfail.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/vectorize4-avx.c
Comment 2 Richard Biener 2010-10-14 08:36:25 UTC
Yep, that's a known limitation with the new scheme which just allows one
vector size per loop.  It needs special support in the vectorize_conversion
routine.
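
The size mismatch behind this limitation can be worked out with simple lane arithmetic. The sketch below is illustrative only and is not GCC code: the helper names are hypothetical, and the element sizes passed in (4-byte float and int, 8-byte double) are assumptions that hold on x86-64.

```c
/* Number of elements of elem_bytes bytes that fit in a vector
   of vector_bits bits. */
static unsigned lanes(unsigned vector_bits, unsigned elem_bytes)
{
    return vector_bits / (8u * elem_bytes);
}

/* Bits occupied by the result when every element of one full
   source vector is converted to a type of dst_elem_bytes bytes. */
static unsigned converted_bits(unsigned src_vector_bits,
                               unsigned src_elem_bytes,
                               unsigned dst_elem_bytes)
{
    return lanes(src_vector_bits, src_elem_bytes) * dst_elem_bytes * 8u;
}
```

Here converted_bits(256, 8, 4) is 128: a 256-bit vector of 4 doubles converts to 4 ints, which fill only a 128-bit register, as in `vcvttpd2dqy %ymm1, %xmm1` above. Likewise converted_bits(128, 8, 4) is 64, which is why the trunk listing merges two results with vpunpcklqdq. A scheme with a single vector size per loop has to handle these narrower results specially.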
Comment 3 Richard Biener 2020-09-14 12:26:47 UTC
Fixed in GCC 10+.