This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/17619] Non-optimal code for -mfpmath=387,sse
- From: "uros at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 1 Dec 2004 16:02:40 -0000
- Subject: [Bug target/17619] Non-optimal code for -mfpmath=387,sse
- References: <20040922191917.17619.bangerth@dealii.org>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Additional Comments From uros at gcc dot gnu dot org 2004-12-01 16:02 -------
Here is the testcase with the loop split manually and a, b and c moved inside
the foobar() function [otherwise the vectorizer complains about an unaligned load]:
--cut here--
struct X
{
  float array[4];
};
float foobar()
{
  X a, b, c;
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];
  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];
  return s;
}
--cut here--
Compiling the foobar() example with the right set of options (-O2
-march=pentium4 -ftree-vectorize -mfpmath=sse,387 -funroll-loops
-fomit-frame-pointer -ffast-math) produces this wonderful piece of code:
_Z6foobarv:
.LFB2:
        subl    $60, %esp
.LCFI0:
        movaps  32(%esp), %xmm0
        mulps   16(%esp), %xmm0
        movaps  %xmm0, (%esp)
        flds    4(%esp)
        fadds   (%esp)
        fadds   8(%esp)
        fadds   12(%esp)
        addl    $60, %esp
        ret
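In the listing, a single mulps computes the four products (the first loop),
and the flds/fadds sequence sums them on the x87 stack (the second loop).
The sketch below contrasts that split form with a fused loop. It rests on
assumptions: the fused shape is my guess at the general idea and is not
copied from the bug report, the names dot_fused, dot_split and
check_split_matches_fused are hypothetical, and the inputs are made-up
values chosen to be exactly representable. The operands are passed by
reference so they can be initialized, which differs from the testcase
above, where a, b and c are locals to avoid the unaligned-load complaint.
It only illustrates that the two forms compute the same value; it says
nothing about what the vectorizer does with either.

```cpp
#include <cassert>

struct X
{
  float array[4];
};

// Fused form: multiply and accumulate in one loop.
// Assumed shape for illustration, not taken from the bug report.
float dot_fused(const X &a, const X &b)
{
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += a.array[d] * b.array[d];
  return s;
}

// Split form, following the testcase in this message:
// element-wise products first, then a separate summation loop.
float dot_split(const X &a, const X &b)
{
  X c;
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];
  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];
  return s;
}

// Self-check with hypothetical inputs: 0.5 + 0.5 + 6 + 4 = 11,
// exact in float regardless of summation order.
void check_split_matches_fused()
{
  const X a = {{1.0f, 2.0f, 3.0f, 4.0f}};
  const X b = {{0.5f, 0.25f, 2.0f, 1.0f}};
  assert(dot_fused(a, b) == 11.0f);
  assert(dot_split(a, b) == 11.0f);
}
```

Calling check_split_matches_fused() from main() should pass silently. With
inputs that are not exactly representable, -ffast-math may reorder the
additions, so the two forms need not agree bit for bit.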
I don't know why the vectorizer doesn't like the original testcase.
Uros.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17619