This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: GCC Benchmarks (coybench), AMD64 and i686, 14 August 2004
BTW, here are the timings for the respective versions:
The first uses SSE math + an SSE sincos:
[dberlin@dberlin dberlin]$ icc mole.c -O3 -xN -ipo
IPO: using IR for /tmp/iccQqtM99.o
IPO: performing single-file optimizations
mole.c(132) : (col. 5) remark: LOOP WAS VECTORIZED.
[dberlin@dberlin dberlin]$ time ./a.out
real 0m6.010s
user 0m5.950s
sys 0m0.020s
Making it stop using the SSE sincos and instead call out to libm sin
+ cos gives us:
[dberlin@dberlin dberlin]$ icc mole.c -O3 -xN -ipo -nolib_inline
IPO: using IR for /tmp/iccb0fk4w.o
IPO: performing single-file optimizations
mole.c(132) : (col. 5) remark: LOOP WAS VECTORIZED.
[dberlin@dberlin dberlin]$ time ./a.out
real 0m20.016s
user 0m19.930s
sys 0m0.020s
Telling it not to use the vectorized math library, but still optimizing
for P4, gives us (this is the version that uses the x87 FPU only):
[dberlin@dberlin dberlin]$ icc mole.c -O3 -ipo
IPO: using IR for /tmp/icckW7Va1.o
IPO: performing single-file optimizations
[dberlin@dberlin dberlin]$ time ./a.out
real 0m11.084s
user 0m10.210s
sys 0m0.040s
[dberlin@dberlin dberlin]$
So the vectorized intrinsic is worth half its performance, at least:
6.0s with it, versus 11.1s for the x87-only build and 20.0s when
calling libm sin + cos.
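
For anyone who wants the ratios spelled out, here is a minimal sketch of the arithmetic, using the "real" times from the three runs above. The macro and function names are made up for illustration; nothing here comes from mole.c or from icc itself.

```c
#include <assert.h>

/* Wall-clock ("real") times in seconds, copied from the runs above. */
#define T_SSE_SINCOS 6.010   /* icc -O3 -xN -ipo               */
#define T_LIBM_CALLS 20.016  /* icc -O3 -xN -ipo -nolib_inline */
#define T_X87_ONLY   11.084  /* icc -O3 -ipo                   */

/* How many times faster the candidate run is than the baseline run. */
double speedup(double baseline_s, double candidate_s)
{
    assert(baseline_s > 0.0 && candidate_s > 0.0);
    return baseline_s / candidate_s;
}

/* Fraction of the baseline run time that the candidate run saves. */
double time_saved(double baseline_s, double candidate_s)
{
    assert(baseline_s > 0.0 && candidate_s > 0.0);
    return 1.0 - candidate_s / baseline_s;
}
```

With those numbers, speedup(T_X87_ONLY, T_SSE_SINCOS) comes to about 1.84 and speedup(T_LIBM_CALLS, T_SSE_SINCOS) to about 3.33; time_saved() for the same pairs gives roughly 0.46 and 0.70.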