This is the mail archive of the
gcc-help@gcc.gnu.org
mailing list for the GCC project.
Poor pow() / floating point performance on x86_64
- From: Ralf Lübben <ralfluebben at gmx dot de>
- To: gcc-help at gcc dot gnu dot org
- Date: Wed, 26 Sep 2007 10:35:20 +0200
- Subject: Poor pow() / floating point performance on x86_64
Hello,
Over the last few days I ran a simulation on an x86_64 machine:
###################
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Genuine Intel(R) CPU 3.20GHz
stepping : 8
cpu MHz : 3192.081
cache size : 8192 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
constant_tsc pni monitor ds_cpl vmx est tm2 cid cx16 xtpr lahf_lm
bogomips : 6390.34
clflush size : 64
cache_alignment : 128
address sizes : 40 bits physical, 48 bits virtual
power management:
#####################
The performance was very poor.
I ran the same simulation on my notebook:
######################
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : mobile AMD Athlon(tm) XP 2000+
stepping : 1
cpu MHz : 797.820
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat
pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow ts fid vid
bogomips : 1596.37
clflush size : 32
#######################
The same simulation runs about 10 times faster on my notebook.
The simulation was compiled with "-O3 -ffast-math"; without "-ffast-math" the
performance on the x86_64 machine is much worse.
I used gcc 4.1.2 on Ubuntu; the simulator is OMNeT++.
There was already a post about this topic, concerning AMD machines:
http://gcc.gnu.org/ml/gcc-help/2006-05/msg00185.html
I also found that one problem is the pow() function; there may be other
functions that perform poorly on x86_64 machines.
Does anyone have an idea about the reasons, or how to improve the performance
on x86_64 machines?
Thanks.
Regards,
Ralf