This is the mail archive of the mailing list for the GCC project.

GCC viciously beaten by ICC in trig test!


Consider the following program, compiled and run on a Pentium 4 (Northwood) system:

    #include <math.h>
    #include <stdio.h>

    double doit(double a)
    {
        double s = sin(a);
        double c = cos(a);

        // should always be 1
        return s * s + c * c;
    }

    int main(void)
    {
        double a = 1.0, r = 0.0;

        for (int i = 0; i < 100000000; ++i)
            r += doit(a);

        printf("r = %f\n", r);
        return 0;
    }

Using both GCC 3.3.3 and Intel C++ 8.0, I compiled the above with these command lines:

    gcc -o sincosg -lm -std=gnu99 -O3 -march=pentium4 \
           -mfpmath=387 -ffast-math -fomit-frame-pointer sincos.c

    icc -o sincosi -lm -O3 -xN -tpp7 -ipo sincos.c

Both programs produce the correct result. The run times, measured with the time command, were:

    0.2s  ICC
   12.8s  GCC (!!!!!)

Very ugly, from GCC's perspective. For good measure, I also tried recent CVS builds of GCC 3.4.0 and the tree-ssa branch; neither improved performance.

This is a killer issue for certain applications I run, in which trigonometric functions play a crucial role.

Examining the generated assembly shows that ICC eliminates the call to doit() entirely, replacing it with inline code that calls internal routines such as vmldSin2 and vmldCos2, while its out-of-line compilation of doit() uses an SSE2-based sincos routine (SSE2 itself has no sine or cosine instruction). GCC, by contrast, emits the x87 fsin and fcos instructions.

I have found that ICC wins any contest in which trigonometric functions play a significant role; even in the most complex code (which cannot be inlined), ICC-compiled code runs about three times faster.

Can GCC generate faster code for trigonometric functions? Options that I tried (with no useful effect) include:

    -msse	(implied by -march=pentium4, but tested anyway)
    -msse2	(implied by -march=pentium4, but tested anyway)

I look forward to illumination (or at least some bright ideas!)


Scott Robert Ladd
Coyote Gulch Productions (
Software Invention for High-Performance Computing
