This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC Benchmarks (coybench), AMD64 and i686, 14 August 2004


BTW, here are the timings for the respective versions.
The first uses SSE math plus an SSE sincos:
[dberlin@dberlin dberlin]$ icc mole.c -O3 -xN  -ipo
IPO: using IR for /tmp/iccQqtM99.o
IPO: performing single-file optimizations
mole.c(132) : (col. 5) remark: LOOP WAS VECTORIZED.
[dberlin@dberlin dberlin]$ time ./a.out

real    0m6.010s
user    0m5.950s
sys     0m0.020s


Making it stop using the SSE sincos and instead call out to libm sin + cos gives us:


[dberlin@dberlin dberlin]$ icc mole.c -O3 -xN  -ipo -nolib_inline
IPO: using IR for /tmp/iccb0fk4w.o
IPO: performing single-file optimizations
mole.c(132) : (col. 5) remark: LOOP WAS VECTORIZED.
[dberlin@dberlin dberlin]$ time ./a.out

real    0m20.016s
user    0m19.930s
sys     0m0.020s

Telling it not to use the vectorized math library, but still optimizing for P4, gives us the following (this is the version that uses the x87 FPU only):

[dberlin@dberlin dberlin]$ icc mole.c -O3   -ipo
IPO: using IR for /tmp/icckW7Va1.o
IPO: performing single-file optimizations
[dberlin@dberlin dberlin]$ time ./a.out

real    0m11.084s
user    0m10.210s
sys     0m0.040s
[dberlin@dberlin dberlin]$


So the vectorized intrinsic is worth at least half its performance: 6.0s with it, against 11.1s for the x87-only build and 20.0s when calling out to libm.
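
For anyone who wants the ratios spelled out, here is a minimal sketch that turns the three "real" times above into speedup factors. The constant and function names are made up for illustration; only the three timings come from the runs above.

```c
#include <assert.h>
#include <stdio.h>

/* Wall-clock ("real") times in seconds, copied from the three runs above.
   The names are illustrative, not taken from mole.c or from icc. */
static const double T_SSE_SINCOS = 6.010;   /* -O3 -xN -ipo */
static const double T_LIBM_CALLS = 20.016;  /* -O3 -xN -ipo -nolib_inline */
static const double T_X87_ONLY   = 11.084;  /* -O3 -ipo */

/* How many times faster the `fast` run is than the `slow` run. */
double speedup(double slow, double fast)
{
    assert(fast > 0.0);  /* guard against dividing by zero */
    return slow / fast;
}

/* Print both comparisons against the SSE sincos build. */
void print_speedups(void)
{
    printf("vs x87-only build: %.2fx\n", speedup(T_X87_ONLY, T_SSE_SINCOS));
    printf("vs libm sin + cos: %.2fx\n", speedup(T_LIBM_CALLS, T_SSE_SINCOS));
}
```

Calling print_speedups() from a main() gives roughly 1.8x over the x87-only build and roughly 3.3x over the build that calls out to libm.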


