[Bug target/18766] New: Inefficient code with -mfpmath=387,sse

Wed Dec 1 20:54:00 GMT 2004

This is spinoff #1 of PR 17619: 

Take this simple piece of code: 
--------------------- 
float a[2],b[2];  

float foobar () {  
  return a[0] * b[0] 
    + a[1] * b[1];  
}  
--------------------- 

Compiled with  
  -O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387 
we get this code: 
--------------------- 
	pushl	%ebp 
	movl	%esp, %ebp 
	flds	b 
	fmuls	a 
	flds	b+4 
	fmuls	a+4 
	faddp	%st, %st(1) 
	popl	%ebp 
	ret 
----------------------------- 
That's certainly optimal. 

On the other hand, if we let the compiler use SSE registers as well (we
do not force it to; we simply want the most efficient code), the code
we get with flags 
  -O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387,sse 
looks like this: 
----------------------------- 
	pushl	%ebp 
	movl	%esp, %ebp 
	subl	$4, %esp 
	flds	b 
	fmuls	a 
	movss	b+4, %xmm0 
	mulss	a+4, %xmm0 
	movss	%xmm0, -4(%ebp) 
	flds	-4(%ebp) 
	faddp	%st, %st(1) 
	leave 
	ret 
--------------------------- 
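The two sequences are almost equivalent, except that the second one has
to store the SSE result to a stack slot and reload it onto the x87
stack (which also costs the extra subl/leave for the stack frame),
because the system ABI requires floating-point return values to be
passed in st(0). 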

In essence, the compiler should just generate the first code sequence, 
even if given the flag -mfpmath=387,sse. 
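
When comparing the two flag sets, it may help to confirm that both
variants still compute the same value. The following is only a sketch of
such a check: the helpers set_sample_inputs() and check_foobar() and the
sample values are assumptions made for illustration and are not part of
the original test case. The values are chosen so that every product and
the final sum are exactly representable, so x87 and SSE arithmetic
should agree exactly.

```c
#include <assert.h>

/* Same globals and function as in the test case above. */
float a[2], b[2];

float foobar(void) {
  return a[0] * b[0]
    + a[1] * b[1];
}

/* Hypothetical helper: fill the arrays with sample values whose
   products and sum are exactly representable in single precision. */
void set_sample_inputs(void) {
  a[0] = 1.5f;  b[0] = 4.0f;   /* 1.5 * 4.0  = 6.0 */
  a[1] = 2.0f;  b[1] = 0.25f;  /* 2.0 * 0.25 = 0.5 */
}

/* Hypothetical helper: abort if foobar() does not return 6.0 + 0.5. */
void check_foobar(void) {
  set_sample_inputs();
  assert(foobar() == 6.5f);
}
```

Calling check_foobar() from a small main() and building it once with
each of the two flag sets shown above would exercise both code
sequences on the same input.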

W.

-- 
           Summary: Inefficient code with -mfpmath=387,sse
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bangerth at dealii dot org
                CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766