This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug target/17619] Non-optimal code for -mfpmath=387,sse

From: "uros at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 1 Dec 2004 16:02:40 -0000
Subject: [Bug target/17619] Non-optimal code for -mfpmath=387,sse
References: <20040922191917.17619.bangerth@dealii.org>
Reply-to: gcc-bugzilla at gcc dot gnu dot org

------- Additional Comments From uros at gcc dot gnu dot org  2004-12-01 16:02 -------
If the loop is split manually and a, b and c are placed inside the foobar()
function [otherwise the vectorizer complains about an unaligned load]:

--cut here--
struct X
{
  float array[4];
};

float foobar()
{
  X a, b, c;

  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];

  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];

  return s;
}
--cut here--
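
To check what the two split loops compute, here is a minimal sketch of the same function with initialized inputs. The name foobar_init and the input values are hypothetical and not part of the original testcase. Initializing the arrays may also change what the vectorizer does, so this is only meant to illustrate the arithmetic, not to reproduce the assembly.

```cpp
struct X
{
  float array[4];
};

// Hypothetical variant of foobar(): the same two split loops, but with
// initialized inputs so that the result is well defined.
float foobar_init()
{
  X a = {{1.0f, 2.0f, 3.0f, 4.0f}};  // assumed sample values
  X b = {{0.5f, 0.5f, 0.5f, 0.5f}};  // assumed sample values
  X c;

  // First loop: element-wise product of the two arrays.
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];

  // Second loop: sum of the four products.
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];

  return s;
}
```

With these inputs the products are 0.5, 1, 1.5 and 2, so foobar_init() returns 5.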

Compiling this example with the right set of options (-O2 -march=pentium4
-ftree-vectorize -mfpmath=sse,387 -funroll-loops -fomit-frame-pointer
-ffast-math) produces this wonderful piece of code:

_Z6foobarv:
.LFB2:
        subl    $60, %esp
.LCFI0:
        movaps  32(%esp), %xmm0
        mulps   16(%esp), %xmm0
        movaps  %xmm0, (%esp)
        flds    4(%esp)
        fadds   (%esp)
        fadds   8(%esp)
        fadds   12(%esp)
        addl    $60, %esp
        ret

I don't know why the vectorizer doesn't like the original testcase.

Uros.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17619
