Floating point performance issue
Ico
gcc@zevv.nl
Tue Dec 20 10:43:00 GMT 2011
* On Tue Dec 20 11:34:35 +0100 2011, Jonathan Wakely wrote:
> > I have been reducing the program to see what the smallest code is that still
> > shows this behaviour. Latest version is below.
> >
> > $ gcc -msse -mfpmath=sse -O3 -march=native test.c
>
> What is "native" for your system, i686? (also, what does gcc -dumpmachine show?)
i486-linux-gnu
> i686 doesn't support SSE, you need at least pentium3.
>
> Remove the -msse and see if you get a warning telling you SSE
> instructions are disabled.
True
> Try -march=pentium3 -mfpmath=sse instead (without -msse)
>
> If you don't have at least a pentium3, you're stuck with the 387 FP
> registers, and have to use horrible code.
> That looks as though you're still not using SSE registers.
The inner loop boils down to this (-msse -mfpmath=sse -O3 -march=native):
 8048370:  66 0f 28 c1   movapd %xmm1,%xmm0
 8048374:  83 e8 01      sub    $0x1,%eax
 8048377:  f2 0f 59 c2   mulsd  %xmm2,%xmm0
 804837b:  66 0f 28 c8   movapd %xmm0,%xmm1
 804837f:  f2 0f 59 ca   mulsd  %xmm2,%xmm1
 8048383:  75 eb         jne    8048370 <main+0x40>
or this (-march=pentium3 -mfpmath=sse -O3):
 8048360:  dd d9         fstp   %st(1)
 8048362:  83 e8 01      sub    $0x1,%eax
 8048365:  d8 c9         fmul   %st(1),%st
 8048367:  d9 c0         fld    %st(0)
 8048369:  d8 ca         fmul   %st(2),%st
 804836b:  75 f3         jne    8048360 <main+0x30>
The first runs about twice as fast as the second, but I still see a huge difference
in run time depending on the 'f' in the original code.
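For anyone who wants to reproduce the timing, here is a minimal sketch of a loop with the same shape as the disassembly above: two dependent multiplies by the same factor per iteration. This is not the original test.c. The names mul_chain and time_chain, the starting value 1.0 and the use of clock() are all assumptions made for illustration, and 'f' is assumed to be the multiplication factor.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical reconstruction, not the original test.c:
 * two dependent multiplies by f per iteration, matching the
 * shape of the disassembled inner loop. */
double mul_chain(double f, long n)
{
    double x = 1.0;   /* assumed starting value */
    long i;

    for (i = 0; i < n; i++) {
        x *= f;
        x *= f;
    }
    return x;
}

/* Time one run of mul_chain and return the CPU seconds used.
 * The result is printed so the compiler cannot drop the loop. */
double time_chain(double f, long n)
{
    clock_t t0 = clock();
    double x = mul_chain(f, n);
    clock_t t1 = clock();
    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;

    printf("f=%g n=%ld result=%g time=%.3fs\n", f, n, x, secs);
    return secs;
}
```

Calling time_chain() from main() with the same n and a few different factors (for example 1.0, 0.5 and 2.0) and building with each of the two sets of flags above should show whether the run time depends on the factor on a given machine. A factor below 1 drives x toward zero and a factor above 1 drives it toward infinity, so the operand values inside the loop differ a lot between runs.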
--
:wq
^X^Cy^K^X^C^C^C^C