This is the mail archive of the mailing list for the GCC project.

Re: SSE (Pentium 3) - Is this correct?

"mal content" <> writes:

> Apologies if this is the wrong list.

It's the wrong list.  This should go to  Please
send any followups there.  Thanks.

> float *vector_add4f(float va[4], const float vb[4])
> {
>   va[0] += vb[0];
>   va[1] += vb[1];
>   va[2] += vb[2];
>   va[3] += vb[3];
>   return va;
> }

> Using -march=pentium3 -mtune=pentium3m -mfpmath=sse, the following
> is generated:

It looks like you didn't use -O.  But even if you do, you won't get a
vector add instruction.

It is possible that this function will be called with overlapping
pointers.  In particular, it is possible that the assignment to va[0]
changes vb[1].  Therefore, this code cannot be vectorized as written.
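
To make the overlap problem concrete, here is a minimal sketch.  The
first function is the one from the original message.  The second,
vector_add4f_simd_like, is a hypothetical stand-in (not anything gcc
emits) that mimics a single vector add by reading all four elements of
both operands before storing anything.  The helper
overlap_changes_result is also made up for illustration; it calls both
with va == vb + 1, so that va[0] and vb[1] are the same float.

```c
#include <assert.h>
#include <string.h>

/* The function from the original message, unchanged. */
float *vector_add4f(float va[4], const float vb[4])
{
  va[0] += vb[0];
  va[1] += vb[1];
  va[2] += vb[2];
  va[3] += vb[3];
  return va;
}

/* Hypothetical stand-in for a single vector add: load all four
   elements of both operands first, add, then store all four. */
float *vector_add4f_simd_like(float va[4], const float vb[4])
{
  float a[4], b[4];
  int i;

  memcpy(a, va, sizeof a);
  memcpy(b, vb, sizeof b);
  for (i = 0; i < 4; ++i)
    a[i] += b[i];
  memcpy(va, a, sizeof a);
  return va;
}

/* Returns 1 if the two versions disagree when va == vb + 1. */
int overlap_changes_result(void)
{
  float x[5] = { 1, 1, 1, 1, 1 };
  float y[5] = { 1, 1, 1, 1, 1 };

  vector_add4f(x + 1, x);            /* x becomes 1 2 3 4 5 */
  vector_add4f_simd_like(y + 1, y);  /* y becomes 1 2 2 2 2 */

  /* Element 0 is only read, never written, in both versions. */
  assert(x[0] == 1.0f && y[0] == 1.0f);
  return memcmp(x, y, sizeof x) != 0;
}
```

In the element-by-element version each store feeds the next load, so
the sum accumulates; the load-everything-first version sees only the
original values.  Since the compiler must preserve the first behaviour,
it cannot simply substitute the second.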

Even if you fix that, gcc will only vectorize if you pass the
-ftree-vectorize option.  And it will only vectorize code in loops.
And it unfortunately doesn't do a good job of using movups, so it will
mess around with checking the alignment.  And there isn't a good way
to specify alignment.
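
For readers wondering what "checking the alignment" amounts to, the
following is a rough sketch of the idea, not the code gcc actually
generates.  The aligned SSE move (movaps) requires a 16-byte aligned
address, while movups does not.  Both function names here are
hypothetical, and peel_count assumes the pointer is at least 4-byte
aligned, as a float pointer normally is.

```c
#include <stdint.h>

/* Nonzero if p sits on a 16-byte boundary, which is what the
   aligned SSE load/store (movaps) requires. */
int is_aligned16(const void *p)
{
  return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: how many leading floats would have to be
   handled one at a time before p reaches a 16-byte boundary.
   Assumes p is at least 4-byte aligned. */
unsigned peel_count(const float *p)
{
  unsigned misalign = (unsigned)((uintptr_t)p & 15u);

  return misalign ? (unsigned)((16u - misalign) / sizeof(float)) : 0u;
}
```

For a four-element array, a check like this can easily cost more than
the one vector add it is guarding.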

I do see use of the vector instructions for this example:

float *vector_add4f(float * __restrict va, float * __restrict vb)
{
  int i;

  for (i = 0; i < 4; ++i)
    va[i] += vb[i];
  return va;
}

if I compile with -O2 -ftree-vectorize.  Frankly the generated code is
really awful, and I wouldn't be surprised if it runs more slowly than
the non-vectorized code.  This is evidently an area where the compiler
could use more work.

