46011 – 256bit vectorizer failed on double->int

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.6.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:	vectorizer

Reported:	2010-10-14 08:31 UTC by H.J. Lu
Modified:	2020-09-14 12:26 UTC
CC List:	2 users

See Also:	46012
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2010-10-14 08:36:25

Description H.J. Lu 2010-10-14 08:31:26 UTC

For gcc.target/i386/vectorize4-avx.c, the vect256 branch generates
256-bit (ymm) code:

.L2:
	vmovaps	-120(%rsp,%rax), %ymm0
	vcvtps2pd	%xmm0, %ymm1
	vextractf128	$0x1, %ymm0, %xmm0
	vsqrtpd	%ymm1, %ymm1
	vcvttpd2dqy	%ymm1, %xmm1
	vmovdqu	%xmm1, (%rdi,%rax)
	vcvtps2pd	%xmm0, %ymm0
	vsqrtpd	%ymm0, %ymm0
	vcvttpd2dqy	%ymm0, %xmm0
	vmovdqu	%xmm0, 16(%rdi,%rax)
	addq	$32, %rax
	cmpq	$1024, %rax
	jne	.L2

Trunk at revision 165455 generates only 128-bit (xmm) code:

.L2:
	vmovaps	-120(%rsp,%rax), %xmm1
	vmovhlps	%xmm1, %xmm0, %xmm0
	vcvtps2pd	%xmm1, %xmm2
	vsqrtpd	%xmm2, %xmm2
	vcvttpd2dqx	%xmm2, %xmm2
	vcvtps2pd	%xmm0, %xmm1
	vsqrtpd	%xmm1, %xmm1
	vcvttpd2dqx	%xmm1, %xmm1
	vpunpcklqdq	%xmm1, %xmm2, %xmm1
	vmovdqu	%xmm1, (%rdi,%rax)
	addq	$16, %rax
	cmpq	$1024, %rax
	jne	.L2
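
For readers without the testsuite at hand, here is a minimal sketch of the
kind of loop both listings correspond to: load float, widen to double, do
some double arithmetic, truncate to int. This is NOT the actual
vectorize4-avx.c source; the function name and signature are hypothetical,
and the sqrt step (vsqrtpd above) is replaced by a multiply so the sketch
does not depend on libm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop shape, not the real testcase. */
void widen_then_narrow(const float *src, int *dest, size_t n)
{
    assert(src != NULL && dest != NULL);
    for (size_t i = 0; i < n; i++) {
        double d = (double) src[i]; /* float -> double (vcvtps2pd in the listings) */
        d = d * d;                  /* assumption: stands in for sqrt (vsqrtpd) */
        dest[i] = (int) d;          /* double -> int, truncating (vcvttpd2dq) */
    }
}
```

The point of interest is the last statement: the input and output elements
are 4 bytes wide, while the intermediate is 8 bytes wide.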

Comment 1 hjl@gcc.gnu.org 2010-10-14 08:33:13 UTC

Author: hjl
Date: Thu Oct 14 08:33:09 2010
New Revision: 165457

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=165457
Log:
Scan 256bit AVX register and xfail vectorize4-avx.c.

2010-10-14  H.J. Lu  <hongjiu.lu@intel.com>

	PR middle-end/46011
	* gcc.target/i386/vectorize4-avx.c: Scan 256bit AVX register
	and xfail.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/vectorize4-avx.c

Comment 2 Richard Biener 2010-10-14 08:36:25 UTC

Yep, that's a known limitation with the new scheme which just allows one
vector size per loop.  It needs special support in the vectorize_conversion
routine.
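
One way to see why the conversion is awkward under a single vector size (my
reading of the listings above, not something stated in this report) is to
count elements per register. The helper below is a hypothetical
illustration, not part of GCC.

```c
#include <stddef.h>

/* Number of elements of elem_bytes bytes that fit in a vector of
   vector_bits bits.  Hypothetical helper for illustration only. */
size_t lanes(size_t vector_bits, size_t elem_bytes)
{
    return vector_bits / (8 * elem_bytes);
}
```

With 256-bit vectors this gives 8 floats, 4 doubles and 8 ints per register,
so one float load feeds two double vectors, and each double->int conversion
produces only 4 ints, i.e. half a register. That matches the vect256 listing,
where each vcvttpd2dqy writes an xmm register.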

Comment 3 Richard Biener 2020-09-14 12:26:47 UTC

Fixed in GCC 10 and later.