Bug 53759 - [4.7/4.8 Regression] gcc -mavx emits vshufps for __builtin_ia32_loadlps
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.7.0
Importance: P3 normal
Target Milestone: 4.7.2
Assignee: Jakub Jelinek
Reported: 2012-06-24 12:36 UTC by Dag Lem
Modified: 2012-06-25 15:05 UTC
CC List: 4 users

Last reconfirmed: 2012-06-24 00:00:00


Attachments
gcc48-pr53759.patch (656 bytes, patch)
2012-06-25 08:48 UTC, Jakub Jelinek

Description Dag Lem 2012-06-24 12:36:17 UTC

    
Comment 1 Dag Lem 2012-06-24 12:45:55 UTC
Test code as follows:
------------------------
typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v2sf __attribute__ ((vector_size (4*2)));

v2sf mem[1];

int main()
{
  v4sf reg = (v4sf){0,0,0,0};
  reg = __builtin_ia32_loadlps(reg, mem);
  return reg[0];
}
------------------------

With -msse, gcc emits the following code:

	xorps	%xmm0, %xmm0
	movlps	mem, %xmm0

However with -mavx, gcc emits:

	vxorps	%xmm0, %xmm0, %xmm0
	vmovlps	mem, %xmm1, %xmm1
	vshufps	$0xe4, %xmm0, %xmm1, %xmm0

Shouldn't this rather have been something like the following?

	vxorps	%xmm0, %xmm0, %xmm0
	vmovlps	mem, %xmm0, %xmm0
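
To see why the third instruction is redundant, here is a scalar model of the two sequences. It is only a sketch: vec4, movlps_model and shufps_model are hypothetical names introduced for illustration, and the model assumes the usual shufps lane selection (low two result lanes taken from the first source, high two from the second, each chosen by a 2-bit field of the immediate). With immediate 0xe4 the shuffle just keeps lanes 0-1 of the loaded register and lanes 2-3 of the zeroed one, which is what a vmovlps into the zeroed register would give directly.

```c
/* Hypothetical stand-in for one XMM register holding four floats. */
typedef struct { float lane[4]; } vec4;

/* Model of (v)movlps with a memory source: replace the low two lanes
   with the two floats from memory, keep the high two lanes of reg. */
static vec4 movlps_model (vec4 reg, const float mem[2])
{
  reg.lane[0] = mem[0];
  reg.lane[1] = mem[1];
  return reg;
}

/* Model of (v)shufps: result lanes 0-1 are selected from a, lanes 2-3
   from b, each by a 2-bit field of imm.  In the AT&T listing above,
   a is %xmm1 and b is the first %xmm0 operand. */
static vec4 shufps_model (vec4 a, vec4 b, unsigned imm)
{
  vec4 r;
  r.lane[0] = a.lane[imm & 3];
  r.lane[1] = a.lane[(imm >> 2) & 3];
  r.lane[2] = b.lane[(imm >> 4) & 3];
  r.lane[3] = b.lane[(imm >> 6) & 3];
  return r;
}
```

Calling shufps_model (movlps_model (junk, mem), zero, 0xe4) yields the same lanes as movlps_model (zero, mem) for any contents of junk, so the two-instruction form loses nothing.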
Comment 2 H.J. Lu 2012-06-24 14:54:44 UTC
GCC 4.6 doesn't have this problem:

[hjl@gnu-6 pr53759]$ cat x.i
typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v2sf __attribute__ ((vector_size (4*2)));

v2sf mem[1];

int main()
{
  v4sf reg = (v4sf){0,0,0,0};
  reg = __builtin_ia32_loadlps(reg, mem);
  return reg[0];
}
[hjl@gnu-6 pr53759]$ gcc -S -mavx -O x.i
[hjl@gnu-6 pr53759]$ cat x.s
	.file	"x.i"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	vxorps	%xmm0, %xmm0, %xmm0
	vmovlps	mem(%rip), %xmm0, %xmm0
	vcvttss2si	%xmm0, %eax
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.comm	mem,8,8
	.ident	"GCC: (GNU) 4.6.3 20120306 (Red Hat 4.6.3-2)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-6 pr53759]$
Comment 3 H.J. Lu 2012-06-24 15:46:33 UTC
It is caused by revision 172123:

http://gcc.gnu.org/ml/gcc-cvs/2011-04/msg00316.html
Comment 4 Jakub Jelinek 2012-06-25 08:48:50 UTC
Created attachment 27699
gcc48-pr53759.patch

This looks like an obvious typo in that change: the x, x, x alternative already appears earlier and shouldn't use the vmovlps insn, so this one obviously should have been x, m, x.
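
For readers less familiar with the constraint letters: x stands for an SSE register and m for a memory operand, the same letters GCC accepts in inline asm on x86. The sketch below is not taken from the patch or from sse.md. It assumes an x86 target with SSE, and load_low_pair is a hypothetical helper that only illustrates what an x, m, x style operand set means for a movlps load (destination in a register, tied to the input register, source in memory).

```c
typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v2sf __attribute__ ((vector_size (4*2)));

/* Hypothetical helper: load two floats from memory into the low half of
   reg, leaving the high half untouched.  "+x" asks for an SSE register
   that is both read and written; "m" asks for a memory operand. */
static v4sf load_low_pair (v4sf reg, const v2sf *p)
{
  __asm__ ("movlps %1, %0" : "+x" (reg) : "m" (*p));
  return reg;
}
```

With a register constraint in place of m for the source operand, the compiler would be forced to get the two floats into a register first, which matches the extra instruction seen in the -mavx output.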
Comment 5 Jakub Jelinek 2012-06-25 14:53:04 UTC
Author: jakub
Date: Mon Jun 25 14:52:59 2012
New Revision: 188937

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188937
Log:
	PR target/53759
	* config/i386/sse.md (sse_loadlps): Use x m x constraints instead
	of x x x in the vmovlps load alternative.

	* gcc.target/i386/pr53759.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr53759.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 6 Jakub Jelinek 2012-06-25 14:56:22 UTC
Author: jakub
Date: Mon Jun 25 14:56:17 2012
New Revision: 188938

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188938
Log:
	PR target/53759
	* config/i386/sse.md (sse_loadlps): Use x m x constraints instead
	of x x x in the vmovlps load alternative.

	* gcc.target/i386/pr53759.c: New test.

Added:
    branches/gcc-4_7-branch/gcc/testsuite/gcc.target/i386/pr53759.c
Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/config/i386/sse.md
    branches/gcc-4_7-branch/gcc/testsuite/ChangeLog
Comment 7 Jakub Jelinek 2012-06-25 15:05:09 UTC
Should be fixed now, thanks.