This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Calculating cosinus/sinus


On 5/12/2013 9:53 AM, Ondřej Bílka wrote:
On Sun, May 12, 2013 at 02:14:31PM +0200, David Brown wrote:
On 11/05/13 17:20, jacob navia wrote:
On 11/05/13 16:01, Ondřej Bílka wrote:
As for 1), the only way is to measure it. Compile the following and we will
see who is right.

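cat > sin.c <<'EOF'
#include <math.h>

int main(void)
{
   int i;
   double x = 0;
   double ret = 0;

   /* Sum sin(x) over 10 million inputs spaced 0.3 apart. */
   for (i = 0; i < 10000000; i++) {
      ret += sin(x);
      x += 0.3;
   }
   return (int)ret;  /* use the result so the loop is not optimized away */
}
EOF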
OK, I did a similar thing. I just compiled sin(argc) in main.
The results prove that you were right. The single fsin instruction
takes longer than several HUNDRED instructions (calls, jumps,
table lookups, what have you).

Gone are the times when an fsin would take 30 cycles or so.
Intel has destroyed the FPU.

What makes you so sure that it takes more than 30 cycles to execute
hundreds of instructions in the library?  Modern cpus often do
several instructions per cycle (I am not considering multiple cores
here).  They can issue several instructions per cycle, and predicted
jumps can often be eliminated entirely in the decode stages.

To clarify the numbers here: a 30-cycle library call is unrealistic. The
latency of the call itself plus the overhead of saving and restoring xmm
registers is often more than 30 cycles.
A sin call takes around 150 cycles for normal inputs.

An fsin is slower for several reasons. One is that its performance depends
on the input. From http://www.agner.org/optimize/instruction_tables.pdf
interesting historical reference
fsin takes about 20-100 cycles.
Those tables show up to 210 cycles for some highly reputed CPU models of various brands. This doesn't count the next issue:

The second problem is that the xmm->memory->fpu->memory->xmm round trip is
expensive. There is a performance penalty when switching between FPU and xmm
instructions.
That would be a reason for fsin appearing in mathinline.h for the i386 implementation of glibc but not for x86_64. Yes, it's popular to malign gcc developers or Intel even where it is out of their hands.
The moral here is that /you/ need to benchmark /your/ code on /your/
processor - don't jump to conclusions, or accept other benchmarks as
giving the complete picture.

Agreed.


--
Tim Prince

