
Bug 24073 - (vector float){a, b, 0, 0} code gen is not good
Summary: (vector float){a, b, 0, 0} code gen is not good
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization, ssemmx
Depends on: 22076
Reported: 2005-09-27 04:06 UTC by Andrew Pinski
Modified: 2016-08-29 07:19 UTC
CC: 1 user

See Also:
Target: i786-*-*
Known to work:
Known to fail:
Last reconfirmed: 2007-07-01 00:41:10


Description Andrew Pinski 2005-09-27 04:06:07 UTC
Take the following example:
#define vector __attribute__((vector_size(16)))

float a; float b;
vector float f(void) { return (vector float){ a, b, 0.0, 0.0}; }
Currently we get:
        subl    $12, %esp
        movss   _b, %xmm0
        movss   _a, %xmm1
        unpcklps        %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        xorl    %eax, %eax
        xorl    %edx, %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        xorps   %xmm1, %xmm1
        movlhps %xmm1, %xmm0
        addl    $12, %esp

We should be able to produce:
        movss   _b, %xmm0
        movss   _a, %xmm1
        shufps  $60, %xmm1, %xmm0       /* [0, 3, 3, 0] */  // _a, 0, 0, _b
        shufps  $201, %xmm0, %xmm0      /* [3, 0, 2, 1] */  // _a, _b, 0, 0

This is from Nathan Begeman.
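As a sanity check on those immediates (a sketch, not GCC code): per the Intel manual, SHUFPS takes its two low result lanes from the destination and its two high lanes from the source, each lane selected by a two-bit field of the immediate, and movss from memory loads the scalar into lane 0 and zeroes lanes 1-3. A minimal C model of that, with lane 0 as the low element:

```c
#include <assert.h>

typedef struct { float v[4]; } v4sf;

/* Model of SHUFPS dst, src, imm:
   lane 0 = dst[imm & 3],        lane 1 = dst[(imm >> 2) & 3],
   lane 2 = src[(imm >> 4) & 3], lane 3 = src[(imm >> 6) & 3]. */
static v4sf shufps(v4sf dst, v4sf src, unsigned imm)
{
    v4sf r;
    r.v[0] = dst.v[ imm       & 3];
    r.v[1] = dst.v[(imm >> 2) & 3];
    r.v[2] = src.v[(imm >> 4) & 3];
    r.v[3] = src.v[(imm >> 6) & 3];
    return r;
}

/* Model of movss from memory: scalar in lane 0, upper lanes zeroed. */
static v4sf movss(float x)
{
    v4sf r = { { x, 0.0f, 0.0f, 0.0f } };
    return r;
}
```

With a = 1 and b = 2, the model gives lanes {b, 0, 0, a} after the first shuffle and {0, 0, b, a} after the second, which matches the annotations above when they are read high lane first.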
Comment 1 Andrew Pinski 2005-09-27 05:07:57 UTC
The issue is in ix86_expand_vector_init.
Comment 2 Serge Belyshev 2005-09-27 05:51:19 UTC
Comment 3 Uroš Bizjak 2005-09-27 11:19:17 UTC
With a couple-of-months-old mainline gcc (20050716), the following asm is
produced (-O2 -msse2 -fomit-frame-pointer):

	subl	$12, %esp
	movss	b, %xmm0
	movss	a, %xmm1
	unpcklps	%xmm0, %xmm1
	movaps	%xmm1, %xmm0
	xorl	%eax, %eax
	xorl	%edx, %edx
	movl	%eax, (%esp)
	movl	%edx, 4(%esp)
>>>	movlps	(%esp), %xmm1
	addl	$12, %esp
	movlhps	%xmm1, %xmm0

This explains where all those xors and moves come from.  It looks like newer
compilers somehow fix up the damage by using xorps, a bit late in the game, IMO.

This part of the bug depends on PR target/22076.

Other than that, the problem is that the V4SF vector initialization is
decomposed into two V2SF initializations (these are MMX insns, which further
confuses the x87/MMX switching patch) that are later concatenated into a V4SF.
Comment 4 Uroš Bizjak 2005-09-27 11:41:31 UTC
I think the following example wins the contest:

vector float f(void) { return (vector float){ a, a, b, b}; }

gcc -O2 -msse -fomit-frame-pointer

	subl	$28, %esp
	movss	a, %xmm0
	movss	%xmm0, 4(%esp)
	movss	b, %xmm0
	movd	4(%esp), %mm0
	punpckldq	%mm0, %mm0
	movss	%xmm0, 4(%esp)
	movq	%mm0, 16(%esp)
	movd	4(%esp), %mm0
	punpckldq	%mm0, %mm0
	movq	%mm0, 8(%esp)
	movlps	16(%esp), %xmm1
	movhps	8(%esp), %xmm1
	addl	$28, %esp
	movaps	%xmm1, %xmm0

Note the usage of MMX registers.
Comment 5 Andrew Pinski 2005-09-27 14:33:27 UTC
(In reply to comment #4)
> I think the following example wins the contest:
> vector float f(void) { return (vector float){ a, a, b, b}; }

For this, it is a different bug.  The issue with the above is that the mmx_ok
check in ix86_expand_vector_init_duplicate is wrong.
Currently we have:
      if (!mmx_ok && !TARGET_SSE)
but if I change it to:
      if (!mmx_ok)
we get:
        movss   _a, %xmm0
        movss   _b, %xmm1
        unpcklps        %xmm0, %xmm0
        unpcklps        %xmm1, %xmm1
        movlhps %xmm1, %xmm0
That looks OK to me.  This testcase should be split off into another bug, as the current code generation is obviously wrong.
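For what it's worth, that five-instruction sequence can be sanity-checked with a small C model of the two instructions involved (a sketch; lane 0 is the low element, and the helper names are merely descriptive): UNPCKLPS interleaves the low two lanes of destination and source, and MOVLHPS copies the source's low half into the destination's high half.

```c
#include <assert.h>

typedef struct { float v[4]; } v4sf;

/* movss from memory: scalar in lane 0, upper lanes zeroed */
static v4sf movss(float x) { v4sf r = { { x, 0, 0, 0 } }; return r; }

/* unpcklps: interleave the low two lanes of dst and src */
static v4sf unpcklps(v4sf dst, v4sf src)
{
    v4sf r = { { dst.v[0], src.v[0], dst.v[1], src.v[1] } };
    return r;
}

/* movlhps: low half of src becomes the high half of dst */
static v4sf movlhps(v4sf dst, v4sf src)
{
    v4sf r = { { dst.v[0], dst.v[1], src.v[0], src.v[1] } };
    return r;
}

/* the SSE-only sequence for (vector float){a, a, b, b} */
static v4sf build_aabb(float a, float b)
{
    v4sf xmm0 = movss(a);
    v4sf xmm1 = movss(b);
    xmm0 = unpcklps(xmm0, xmm0);   /* {a, a, 0, 0} */
    xmm1 = unpcklps(xmm1, xmm1);   /* {b, b, 0, 0} */
    return movlhps(xmm0, xmm1);    /* {a, a, b, b} */
}
```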
Comment 6 Stuart Hastings 2006-08-23 21:24:12 UTC
Cloned 28825 from this bug to track the MMX instruction issue.
Comment 7 Stuart Hastings 2006-08-23 21:54:50 UTC
Time has passed, and GCC has improved on this testcase.  Here is what we generate today (trunk, 23aug2006) for the original testcase:

        movss   b(%rip), %xmm0
        movss   a(%rip), %xmm1
        unpcklps        %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        xorps   %xmm1, %xmm1
        movlhps %xmm1, %xmm0

This isn't perfect, but it's much better than before.
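Tracing that sequence through a small C model of the lane semantics (a sketch; lane 0 is the low element) suggests why it still isn't perfect: unpcklps already leaves zeros in the high half, since movss zeroed lane 1 of both inputs, so the trailing xorps/movlhps pair only rewrites zeros with zeros.

```c
#include <assert.h>

typedef struct { float v[4]; } v4sf;

/* movss from memory: scalar in lane 0, upper lanes zeroed */
static v4sf movss(float x) { v4sf r = { { x, 0, 0, 0 } }; return r; }

/* unpcklps: interleave the low two lanes of dst and src */
static v4sf unpcklps(v4sf dst, v4sf src)
{
    v4sf r = { { dst.v[0], src.v[0], dst.v[1], src.v[1] } };
    return r;
}

/* movlhps: low half of src becomes the high half of dst */
static v4sf movlhps(v4sf dst, v4sf src)
{
    v4sf r = { { dst.v[0], dst.v[1], src.v[0], src.v[1] } };
    return r;
}

/* the trunk-2006 sequence above */
static v4sf build_ab00(float a, float b)
{
    v4sf xmm0 = movss(b);
    v4sf xmm1 = movss(a);
    xmm1 = unpcklps(xmm1, xmm0);      /* already {a, b, 0, 0} */
    xmm0 = xmm1;                      /* movaps %xmm1, %xmm0 */
    v4sf zero = { { 0, 0, 0, 0 } };   /* xorps  %xmm1, %xmm1 */
    return movlhps(xmm0, zero);       /* re-zeroes lanes 2-3 */
}
```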