This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: GCC Benchmarks (coybench), AMD64 and i686, 14 August 2004
Scott Robert Ladd wrote:
> This problem is unique to either 64-bit or Opteron code generation,
> given that my Pentium 4 does not show the same regression. On the
> other hand, Intel produces code that is 6X faster on Pentium 4,
> suggesting that GCC's code generation may be weak there as well.
I don't know where the difference on Opteron comes from, but for the
difference between GCC 3.4.x and ICC on Pentium 4, it is quite clear
which optimisations ICC performs and GCC misses. It boils down to the
following single function:
inline static double dv(double x)
{
    return 2.0 * sin(((x < PI2) ? x : PI2)) * cos(((x < PI2) ? x : PI2));
}
1) This function is poorly written. A much more efficient way to write
what is mathematically the same function would be:
inline static double dv(double x)
{
    if (x >= PI2) return 0;
    else return sin(2.0 * x);
}
This moves the test out of the function call arguments and uses the
mathematical identity sin(2*x) = 2*sin(x)*cos(x).
On my computer, this transformation alone reduces the run time from 32s
to 18s. My guess is that ICC is able to make this transformation itself,
thus avoiding having to calculate both sin and cos and then do the
multiplication. It is also possible that ICC does not do this
transformation, but uses a function that calculates sin and cos at the
same time. In both cases, the result is a single function call instead
of two.
2) The function dv() is called from within a loop with an argument that
does not change. I guess that ICC detects that dv() is a pure function
and moves the call out of the loop. GCC, however, calculates the sin and
cos at each iteration. By moving the call outside the loop and storing
the result in a variable that is used in the loop, the run time goes
down to 7s, which is very close to ICC.
So, all in all, GCC just misses 2 optimisations in this case:
- high-level transforms of trigonometric functions
- recognising that trigonometric functions are called with the same
arguments in a loop, and moving those calls out of the loop
While both these types of optimizations would be nice to have, I still
consider them to be optimizations that someone who writes a program
that does mathematical calculations should have done himself when
writing the program.
--
Marcel Cox (using XanaNews 1.16.3.1)