This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
GCC viciously beaten by ICC in trig test!
- From: Scott Robert Ladd <coyote at coyotegulch dot com>
- To: gcc at gcc dot gnu dot org
- Date: Sun, 14 Mar 2004 18:39:05 -0500
- Subject: GCC viciously beaten by ICC in trig test!
Hello,
Consider the following program, compiled and run on a Pentium 4
(Northwood) system:
#include <math.h>
#include <stdio.h>
double doit(double a)
{
    double s = sin(a);
    double c = cos(a);
    // should always be 1
    return s * s + c * c;
}
int main(void)
{
    double a = 1.0, r = 0.0;
    for (int i = 0; i < 100000000; ++i)
        r += doit(a);
    printf("r = %f\n", r);
    return 0;
}
Using both GCC 3.3.3 and Intel C++ 8.0, I compiled the above with these
command lines:
gcc -o sincosg -lm -std=gnu99 -O3 -march=pentium4 \
    -mfpmath=387 -ffast-math -fomit-frame-pointer sincos.c
icc -o sincosi -lm -O3 -xN -tpp7 -ipo sincos.c
Both programs produce the correct result. The run times, measured with
the time command, were:
0.2s ICC
12.8s GCC (!!!!!)
Very ugly, from the perspective of GCC. For good measure, I also
compiled with recent CVS builds of 3.4.0 and tree-ssa; neither showed
any improvement in performance.
This is a killer issue on certain applications that I run, in which
trigonometric functions play a crucial role.
Examining the generated assembler source shows that Intel eliminates the
function call to "doit()" entirely, replacing it with inline code that
calls internal functions such as vmldSin2 and vmldCos2. Its actual
compilation of doit() uses the SSE2 sincos instruction, whereas GCC
emits the 387 instructions fsin and fcos.
I have found that ICC wins any contest in which trigonometric functions
play a significant role; even in the most complex code (which cannot be
inlined), ICC wins by a factor of 3 to 1 in computational speed.
Can GCC generate faster code for trigonometric functions? Options that I
tried (with no useful effect) include:
-mfpmath=sse
-mfpmath=387
-mfpmath=387,sse
-msse (implied by -march=pentium4, but tested anyway)
-msse2 (implied by -march=pentium4, but tested anyway)
I look forward to illumination (or at least some bright ideas!)
..Scott
--
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing