This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: SSE (Pentium 3) - Is this correct?
"mal content" <artifact.one@googlemail.com> writes:
> Apologies if this is the wrong list.
It's the wrong list. This should go to gcc-help@gcc.gnu.org. Please
send any followups there. Thanks.
> float *vector_add4f(float va[4], const float vb[4])
> {
>   va[0] += vb[0];
>   va[1] += vb[1];
>   va[2] += vb[2];
>   va[3] += vb[3];
>   return va;
> }
> Using -march=pentium3 -mtune=pentium3m -mfpmath=sse, the following
> is generated:
It looks like you didn't use -O. But even if you do, you won't get a
vector add instruction.
It is possible that this function will be called with overlapping
pointers. In particular, it is possible that the assignment to va[0]
changes vb[1], so the compiler has to do the four additions in order.
Therefore, this code cannot be vectorized.
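
To make the overlap concrete, here is a minimal sketch that is not part
of the original message. It calls the function with va == vb + 1 and
compares the result against a version that loads all of vb before
storing anything, which is what a 4-wide vector add would do. The names
vector_add4f_loads_first and overlap_changes_result are hypothetical.

```c
#include <string.h>

/* The function from the question: each store to va[i] is visible to
   the following load of vb[i + 1]. */
float *vector_add4f(float va[4], const float vb[4])
{
  va[0] += vb[0];
  va[1] += vb[1];
  va[2] += vb[2];
  va[3] += vb[3];
  return va;
}

/* Hypothetical helper: models a vector add by reading all four
   elements of vb before writing any element of va. */
float *vector_add4f_loads_first(float va[4], const float vb[4])
{
  float b[4];
  int i;
  memcpy(b, vb, sizeof b);
  for (i = 0; i < 4; ++i)
    va[i] += b[i];
  return va;
}

/* Returns 1 if the two orderings disagree when va overlaps vb. */
int overlap_changes_result(void)
{
  float x[5] = {1, 1, 1, 1, 1};
  float y[5] = {1, 1, 1, 1, 1};
  vector_add4f(x + 1, x);             /* here va[0] is vb[1] */
  vector_add4f_loads_first(y + 1, y);
  return memcmp(x, y, sizeof x) != 0;
}
```

With overlapping arguments the in-order version carries each sum into
the next addition, while the loads-first version does not, so the two
arrays end up different. That difference is why the compiler may not
assume the vector form is safe here.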
Even if you fix that, gcc will only vectorize if you pass the
-ftree-vectorize option. And it will only vectorize code in loops.
And it unfortunately doesn't do a good job of using movups, so it will
mess around with checking the alignment. And there isn't a good way
to specify alignment.
I do see use of the vector instructions for this example:
float *vector_add4f(float * __restrict va, float * __restrict vb)
{
  int i;
  for (i = 0; i < 4; ++i)
    va[i] += vb[i];
  return va;
}
if I compile with -O2 -ftree-vectorize. Frankly, the generated code is
really awful, and I wouldn't be surprised if it ran more slowly than
the non-vectorized code. This is evidently an area where the compiler
could use more work.
Ian