[Bug target/18766] New: Inefficient code with -mfpmath=387,sse
bangerth at dealii dot org
gcc-bugzilla@gcc.gnu.org
Wed Dec 1 20:54:00 GMT 2004
This is spinoff #1 of PR 17619:
Take this simple piece of code:
---------------------
float a[2],b[2];
float foobar () {
return a[0] * b[0]
+ a[1] * b[1];
}
---------------------
Compiled with
-O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387
we get this code:
---------------------
pushl %ebp
movl %esp, %ebp
flds b
fmuls a
flds b+4
fmuls a+4
faddp %st, %st(1)
popl %ebp
ret
-----------------------------
That's certainly optimal.
On the other hand, if we allow the compiler to use SSE registers as well
(we do not force it to, we simply want the most efficient code), the code
we get with the flags
-O3 -funroll-loops -msse3 -mtune=pentium4 -march=pentium4 -mfpmath=387,sse
looks like this:
-----------------------------
pushl %ebp
movl %esp, %ebp
subl $4, %esp
flds b
fmuls a
movss b+4, %xmm0
mulss a+4, %xmm0
movss %xmm0, -4(%ebp)
flds -4(%ebp)
faddp %st, %st(1)
leave
ret
---------------------------
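The code is almost equivalent, except that the SSE result takes an extra
round trip through memory: it is stored to a stack slot and reloaded onto
the x87 stack, because the final addition is done there and the system ABI
returns floating-point values in st(0).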
In essence, the compiler should just generate the first code sequence,
even if given the flag -mfpmath=387,sse.
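To compare the two sequences side by side, the test case could be put in a
file of its own and compiled to assembly once per flag set. What follows is
only a sketch: the helper set_inputs is not part of the original test case
and is added here as an assumption, so that the function can also be
exercised at run time.

```c
/* Test case from this report. The globals keep the compiler from
   constant-folding the multiplications away. */
float a[2], b[2];

float foobar(void)
{
    return a[0] * b[0]
         + a[1] * b[1];
}

/* Hypothetical helper, not in the original report: fills the globals
   so foobar() can be checked from a small driver. */
void set_inputs(float a0, float a1, float b0, float b1)
{
    a[0] = a0; a[1] = a1;
    b[0] = b0; b[1] = b1;
}
```

Compiling this with -S, once with the -mfpmath=387 flags and once with the
-mfpmath=387,sse flags quoted above, should show whether the extra store and
reload are still there. On an x86-64 host, -m32 would presumably be needed
as well to get the 32-bit calling convention discussed here.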
W.
--
Summary: Inefficient code with -mfpmath=387,sse
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: bangerth at dealii dot org
CC: gcc-bugs at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18766