Tue Dec 20 10:43:00 GMT 2011

* On Tue Dec 20 11:34:35 +0100 2011, Jonathan Wakely wrote:
> > I have been reducing the program to see what the smallest code is that still
> > shows this behaviour. Latest version is below.
> >
> > $ gcc -msse -mfpmath=sse -O3 -march=native test.c
> What is "native" for your system, i686? (also, what does gcc -dumpmachine show?)


> i686 doesn't support SSE, you need at least pentium3.
> Remove the -msse and see if you get a warning telling you SSE
> instructions are disabled.


> Try -march=pentium3 -mfpmath=sse instead (without -msse)
> If you don't have at least a pentium3, you're stuck with the 387 FP
> registers, and have to use horrible code.

> That looks as though you're still not using SSE registers.

The inner loop boils down to this (-msse -mfpmath=sse -O3 -march=native):

 8048370:       66 0f 28 c1             movapd %xmm1,%xmm0
 8048374:       83 e8 01                sub    $0x1,%eax
 8048377:       f2 0f 59 c2             mulsd  %xmm2,%xmm0
 804837b:       66 0f 28 c8             movapd %xmm0,%xmm1
 804837f:       f2 0f 59 ca             mulsd  %xmm2,%xmm1
 8048383:       75 eb                   jne    8048370 <main+0x40>

or to this (-march=pentium3 -mfpmath=sse -O3):

 8048360:       dd d9                   fstp   %st(1)
 8048362:       83 e8 01                sub    $0x1,%eax
 8048365:       d8 c9                   fmul   %st(1),%st
 8048367:       d9 c0                   fld    %st(0)
 8048369:       d8 ca                   fmul   %st(2),%st
 804836b:       75 f3                   jne    8048360 <main+0x30>

The first runs about twice as fast as the second, but I still see a huge difference
in run time depending on the 'f' in the original code.
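
For anyone who wants to reproduce the shape of that loop, here is a minimal C sketch. It is not the original test.c: the names repeated_scale and time_scale are made up for illustration, and treating 'f' as the constant multiplier is an assumption based only on the listings above, which show two dependent multiplications by the same value per iteration with a counter running down to zero.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the inner loop: two dependent multiplications
   by f per iteration, with a counter that runs down to zero. */
double repeated_scale(double x, double f, long iterations)
{
    long i;

    assert(iterations >= 0);
    for (i = iterations; i != 0; --i) {
        x = x * f;   /* corresponds to the first mulsd / fmul */
        x = x * f;   /* corresponds to the second mulsd / fmul */
    }
    return x;
}

/* CPU seconds spent running the loop for a given f (assumed helper). */
double time_scale(double f, long iterations)
{
    volatile double sink;   /* keeps the compiler from discarding the result */
    clock_t start, stop;

    start = clock();
    sink = repeated_scale(1.0, f, iterations);
    stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_scale with the same iteration count and two different values of f, once per set of compiler flags, should be enough to compare run times. Whether the compiler emits exactly the instructions listed above for this sketch is not guaranteed.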


