Floating point performance issue

Jonathan Wakely jwakely.gcc@gmail.com
Tue Dec 20 10:34:00 GMT 2011

On 20 December 2011 10:20, Ico wrote:
> Still, I'm not sure if sse is part of the problem and/or solution.

It's the solution.

> I have been reducing the program to see what the smallest code is that still
> shows this behaviour. Latest version is below.
> $ gcc -msse -mfpmath=sse -O3 -march=native test.c

What is "native" for your system, i686? Also, what does gcc -dumpmachine show?
i686 doesn't support SSE; you need at least pentium3.

Remove the -msse and see if you get a warning telling you that SSE
instructions are disabled (something like "SSE instruction set disabled,
using 387 arithmetics").

Try -march=pentium3 -mfpmath=sse instead (without -msse).

If you don't have at least a pentium3, you're stuck with the 387 FP
registers, and have to use horrible code.

> $ time ./a.out 0.9
> real    0m2.653s
> user    0m2.648s
> sys     0m0.002s

That looks as though you're still not using SSE registers.

> $ time ./a.out 0.001
> real    0m0.144s
> user    0m0.140s
> sys     0m0.002s

