On Sun, May 12, 2013 at 02:14:31PM +0200, David Brown wrote:
On 11/05/13 17:20, jacob navia wrote:
On 11/05/13 16:01, Ondřej Bílka wrote:
As for 1), the only way is to measure it. Compile the following and we
will see who is right.
cat "
#include <math.h>
int main(){ int i;
double x=0;
double ret=0;
double f;
for(i=0;i<10000000;i++){
ret+=sin(x);
x+=0.3;
}
return ret;
}
" > sin.c
OK, I did a similar thing: I just compiled sin(argc) in main.
The results prove that you were right. The single fsin instruction
takes longer than several HUNDRED instructions (calls, jumps,
table lookups, what have you).
Gone are the times when an fsin would take 30 cycles or so.
Intel has destroyed the FPU.
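For reference, a minimal version of that test (a sketch; the exact
invocation is an assumption, but with gcc, -ffast-math together with
-mfpmath=387 compiles sin() down to a bare fsin, while plain -O2 calls
the library routine, so the two builds can be timed against each other):

#include <math.h>
int main(int argc, char **argv) {
    /* argc keeps the argument unknown at compile time */
    return (int)sin(argc);
}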
What makes you so sure that it takes more than 30 cycles to execute
hundreds of instructions in the library? Modern CPUs can often issue
several instructions per cycle (I am not considering multiple cores
here), and correctly predicted jumps can often be eliminated entirely
in the decode stages. At three or four instructions per cycle, a couple
of hundred instructions need only a few dozen cycles.
To clarify the numbers here: a 30-cycle library call is unrealistic;
the latency of the call itself plus the overhead of saving/restoring
xmm registers is often more than 30 cycles on its own.
A sin call takes around 150 cycles for normal inputs.
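A rough way to check numbers like that (a sketch, assuming x86 and gcc;
__rdtsc comes from x86intrin.h, and feeding each result into the next
call measures latency rather than throughput):

#include <math.h>
#include <stdio.h>
#include <x86intrin.h>

int main(void) {
    int i;
    double x = 0.5;
    /* note: the TSC counts reference cycles, which may differ
       slightly from core clock cycles */
    unsigned long long t0 = __rdtsc();
    for (i = 0; i < 1000000; i++)
        x = sin(x);  /* dependent chain: next input is previous output */
    unsigned long long t1 = __rdtsc();
    printf("%.1f cycles/call (x=%f)\n", (double)(t1 - t0) / 1000000, x);
    return 0;
}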
An fsin is slower for several reasons. One is that its performance
depends on the input. From http://www.agner.org/optimize/instruction_tables.pdf: