This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/17619] Non-optimal code for -mfpmath=387,sse


------- Additional Comments From uros at gcc dot gnu dot org  2004-12-01 16:02 -------
If the loop is split manually and a, b and c are moved inside the foobar()
function [otherwise the vectorizer complains about an unaligned load]:

--cut here--
struct X
{
  float array[4];
};

float foobar()
{
  X a, b, c;

  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];

  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];

  return s;
}
--cut here--

Compiling this example with the right set of options (-O2 -march=pentium4
-ftree-vectorize -mfpmath=sse,387 -funroll-loops -fomit-frame-pointer
-ffast-math) produces this wonderful piece of code:

_Z6foobarv:
.LFB2:
        subl    $60, %esp
.LCFI0:
        movaps  32(%esp), %xmm0
        mulps   16(%esp), %xmm0
        movaps  %xmm0, (%esp)
        flds    4(%esp)
        fadds   (%esp)
        fadds   8(%esp)
        fadds   12(%esp)
        addl    $60, %esp
        ret
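For readers less at home with the assembly, here is a rough C++ sketch of what the generated code does, written with SSE intrinsics from <xmmintrin.h>. It performs one packed multiply (movaps/mulps), stores the product vector to a stack temporary (movaps %xmm0, (%esp)), and then sums the four lanes one at a time in the same order as the flds/fadds chain. The name foobar_sse, its parameters, and the alignas(16) on X are illustrative assumptions, not part of the testcase. The alignment is there because _mm_load_ps requires 16-byte-aligned addresses, which is probably related to the unaligned-load complaint mentioned above. The sketch also ignores the extended intermediate precision of the x87 stack.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_mul_ps, _mm_store_ps
#include <cassert>      // used only by the usage check below

// Assumption: X is 16-byte aligned so the aligned SSE loads are legal.
struct alignas(16) X
{
  float array[4];
};

// Hypothetical helper mirroring the emitted code for foobar().
float foobar_sse(const X& a, const X& b)
{
  __m128 va = _mm_load_ps(a.array);  // movaps 32(%esp), %xmm0
  __m128 vb = _mm_load_ps(b.array);  // memory operand of mulps
  alignas(16) float c[4];
  _mm_store_ps(c, _mm_mul_ps(va, vb));  // mulps, then movaps %xmm0, (%esp)

  // Scalar reduction in the x87 order: flds 4, fadds 0, fadds 8, fadds 12.
  return ((c[1] + c[0]) + c[2]) + c[3];
}

// Usage check (kept out of main so the sketch can be dropped into any file):
// X a = {{1, 2, 3, 4}}, b = {{5, 6, 7, 8}};
// assert(foobar_sse(a, b) == 70.0f);
```

The reduction stays scalar, just as in the assembly above. Only the element-wise multiply is vectorized, which matches the split into two loops.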

I don't know why the vectorizer doesn't like the original testcase.

Uros.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17619

