This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Calculating cosinus/sinus
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: David Brown <david dot brown at hesbynett dot no>
- Cc: jacob at jacob dot remcomp dot fr, Robert Dewar <dewar at adacore dot com>, gcc at gcc dot gnu dot org, Marc Glisse <marc dot glisse at inria dot fr>
- Date: Sun, 12 May 2013 15:53:42 +0200
- Subject: Re: Calculating cosinus/sinus
- References: <518E0C29 dot 6050204 at jacob dot remcomp dot fr> <alpine dot DEB dot 2 dot 02 dot 1305111128410 dot 3954 at laptop-mg dot saclay dot inria dot fr> <518E1299 dot 80609 at jacob dot remcomp dot fr> <518E48ED dot 70603 at adacore dot com> <20130511140159 dot GA7481 at domone dot kolej dot mff dot cuni dot cz> <518E61B4 dot 2020401 at jacob dot remcomp dot fr> <518F87A7 dot 7030905 at hesbynett dot no>
On Sun, May 12, 2013 at 02:14:31PM +0200, David Brown wrote:
> On 11/05/13 17:20, jacob navia wrote:
> >On 11/05/13 16:01, Ondřej Bílka wrote:
> >>As for 1), the only way is to measure it. Compile the following and we
> >>will see who is right.
> >>
> >>cat > sin.c <<'EOF'
> >>#include <math.h>
> >>#include <stdio.h>
> >>
> >>int main(void){ int i;
> >> double x=0;
> >>
> >> double ret=0;
> >> for(i=0;i<10000000;i++){
> >> ret+=sin(x);
> >> x+=0.3;
> >> }
> >> printf("%f\n", ret); /* use the sum so the loop is not optimized away */
> >> return 0;
> >>}
> >>EOF
> >OK I did a similar thing. I just compiled sin(argc) in main.
> >The results prove that you were right. The single fsin instruction
> >takes longer than several HUNDRED instructions (calls, jumps,
> >table lookups, what have you)
> >
> >Gone are the times when an fsin would take 30 cycles or so.
> >Intel has destroyed the FPU.
> >
>
> What makes you so sure that it takes more than 30 cycles to execute
> hundreds of instructions in the library? Modern cpus often do
> several instructions per cycle (I am not considering multiple cores
> here). They can issue several instructions per cycle, and predicted
> jumps can often be eliminated entirely in the decode stages.
>
To clarify the numbers here: a 30-cycle library call is unrealistic. Just
the latency of the call plus the overhead of saving and restoring xmm
registers is often more than 30 cycles.
A library sin takes around 150 cycles for normal inputs.
fsin is slower for several reasons. One is that its performance depends on
the input. According to http://www.agner.org/optimize/instruction_tables.pdf
fsin takes about 20-100 cycles.
The second problem is that the xmm->memory->fpu->memory->xmm round trip is
expensive. There is a performance penalty when switching between fpu and xmm
instructions.
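
To make the round trip concrete, here is a minimal sketch of calling fsin
from C with GCC-style inline assembly. The name x87_fsin is hypothetical,
and the non-x86 branch is only an assumption so that the file still
compiles elsewhere. It is an illustration of where the xmm/fpu transfers
come from, not a recommended way to compute a sine.

```c
#include <assert.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Hypothetical wrapper: run the x87 fsin instruction on a double.
   The "=t" and "0" constraints pin the operand and the result to st(0).
   On x86-64 the argument arrives in an xmm register, so the compiler has
   to store it to memory, load it onto the x87 stack, and bring the result
   back the same way: the xmm->memory->fpu->memory->xmm round trip. */
double x87_fsin(double x)
{
    double r;

    /* fsin only handles operands with magnitude below 2^63. */
    assert(x > -0x1p63 && x < 0x1p63);
    __asm__ ("fsin" : "=t" (r) : "0" (x));
    return r;
}
#else
#include <math.h>
/* Assumption: no x87 unit on this target, so fall back to the library. */
double x87_fsin(double x)
{
    return sin(x);
}
#endif
```

Compiling this with -O2 -S on x86-64 and reading the generated assembly
should show the store and load on each side of the fsin.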
> The moral here is that /you/ need to benchmark /your/ code on /your/
> processor - don't jump to conclusions, or accept other benchmarks as
> giving the complete picture.
>
Agreed.
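
A minimal sketch of such a benchmark, assuming a POSIX system with
clock_gettime. The names ns_per_call and call_overhead_only are
hypothetical. The baseline function does no math, so timing it gives a
rough figure for the loop and call overhead alone.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <time.h>

/* Hypothetical baseline: returns its argument unchanged, so timing it
   measures only the loop and the indirect call. */
double call_overhead_only(double x)
{
    return x;
}

/* Hypothetical helper: average wall-clock nanoseconds per call of f,
   stepping x by 0.3 as in the loop quoted above. The sum is handed back
   through sum_out so the calls cannot be discarded as dead code. */
double ns_per_call(double (*f)(double), long iters, double *sum_out)
{
    struct timespec t0, t1;
    double x = 0.0, sum = 0.0;

    assert(f != NULL && iters > 0 && sum_out != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        sum += f(x);
        x += 0.3;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sum_out = sum;
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9
            + (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Calling ns_per_call(sin, 10000000, &sum) (with <math.h> and -lm) and
subtracting the figure for call_overhead_only gives an approximate cost per
sin call in nanoseconds. Multiplying by the clock frequency in GHz turns
that into cycles. The result only holds for the machine, compiler flags,
and input pattern it was measured with.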