This is the mail archive of the mailing list for the GCC project.
Re: Pentium 4 Optimization Complimenting icc 5.0
> I have been benchmarking Dual 1GHz Pentium 3 vs. Dual 1.7GHz Pentium 4 Xeon
> systems with known UNIX/LINUX benchmark suites such as "ubench" and
> "unixbenchmark". My findings are almost unbelievable. The P3 system
> outperforms the P4 system by 20%.
This is, unfortunately, the common case. The P4 is very different and requires
very different optimization strategies. The latest gcc release (3.0) is older
than the P4 itself, so it is no surprise that the support is missing. Gcc 3.1
(scheduled for April) already does have some optimizations: support for SSE
arithmetic, developed in the AMD x86-64 porting project, which has been reported
to be really successful for FP-intensive applications; support for branch
prediction hints (I don't know how successful they are without profile
feedback); and limited support for P4 tuning.
The last part is especially weak. Tuning for the P4 needs lots of
experimentation, and I didn't have much time for that while implementing it. If
you want to get good performance, perhaps it is a good idea to download the
latest snapshot and try tuning the knobs of -march=pentium4, so that the release
will perform well. I can try to tune the settings if I see testcases.
Unfortunately, otherwise I really can't do much due to lack of time :(
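For readers who want to see what a branch prediction hint looks like at the
source level, here is a minimal C sketch. It relies on GCC's __builtin_expect
builtin; the likely/unlikely macros and the sum_nonnegative function are
hypothetical names chosen for illustration. Without profile feedback the hint
only feeds GCC's static branch prediction, and what it becomes in the generated
code (block layout, a hint prefix, or nothing) depends on the compiler version
and the -march setting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convenience macros around the real GCC builtin.
   __builtin_expect(expr, c) returns expr and tells GCC that expr
   is expected to equal c. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sum the non-negative elements of v and count the negative ones.
   Negative entries are assumed to be rare "error" values, so that
   branch is marked unlikely. */
long sum_nonnegative(const int *v, size_t n, size_t *errors)
{
    long sum = 0;
    size_t i;

    assert(v != NULL || n == 0);
    assert(errors != NULL);

    *errors = 0;
    for (i = 0; i < n; i++) {
        if (unlikely(v[i] < 0)) {
            (*errors)++;      /* rare path: just count it */
            continue;
        }
        sum += v[i];          /* common path */
    }
    return sum;
}
```

Loop variables are declared at the top of the block so the sketch also builds
with the older gnu89 default of gcc 3.x.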
Some disappointing results have been reported for the -march=pentium4 switch
(i.e. even slower code than with -march=pentium3). -march=athlon can be an
interesting counterpart to try. In some respects the architectures are similar:
both prefer small code and suffer from partial register dependency stalls.
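A testcase of the kind asked for above might look like the following minimal C
sketch: an FP-heavy loop (a hypothetical axpy_dot kernel) plus a timer based on
the standard clock() function. The intended use is an assumption, not a
prescribed procedure: build the same file several times on a 32-bit x86 target,
for example with -O2 -march=pentium3, with -O2 -march=pentium4 -mfpmath=sse,
and with -O2 -march=athlon, and compare the timings reported by a small driver.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical FP-intensive kernel: y[i] += a * x[i], and return the
   dot product of x with the updated y. */
double axpy_dot(double a, const double *x, double *y, size_t n)
{
    double dot = 0.0;
    size_t i;

    assert((x != NULL && y != NULL) || n == 0);
    for (i = 0; i < n; i++) {
        y[i] += a * x[i];
        dot += x[i] * y[i];
    }
    return dot;
}

/* Run the kernel reps times, store the last result in *result, and
   return the elapsed processor time in seconds as measured by clock(). */
double time_axpy_dot(double a, const double *x, double *y, size_t n,
                     int reps, double *result)
{
    clock_t start;
    double r = 0.0;
    int k;

    assert(result != NULL);
    start = clock();
    for (k = 0; k < reps; k++)
        r = axpy_dot(a, x, y, n);
    *result = r;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

clock() measures processor time, which suits a single-threaded kernel like this
one; reps and n should be large enough that the run lasts well above the
resolution of clock(), otherwise the differences between builds get lost in
the noise.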