This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/31021] gfortran 20% slower than ifort on CP2K computational kernel
- From: "burnus at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 2 Mar 2007 09:39:00 -0000
- Subject: [Bug rtl-optimization/31021] gfortran 20% slower than ifort on CP2K computational kernel
- References: <bug-31021-6642@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #3 from burnus at gcc dot gnu dot org 2007-03-02 09:38 -------
On my "AMD Athlon(tm) 64 X2 Dual Core Processor 4800+", gfortran in x86_64
mode is only 13% slower:
gfortran: Kernel time 5.872366, real 0m33.121s, user 0m32.898s, sys 0m0.088s.
ifort:    Kernel time 5.244328, real 0m28.893s, user 0m28.758s, sys 0m0.076s.
Options: "ifort -xP -O3 -xW -free" and "gfortran -O3 -march=native -ffast-math
-ffree-form -ftree-vectorize -funroll-loops".
For grid_fast.F, one difference is which loops are vectorized: ifort vectorizes
the loops at lines 44, 469, 483 and 496, whereas gfortran vectorizes only the
loops at lines 469 and 496. For the other two, gfortran reports:
grid_fast.F:44: note: not vectorized: complicated access pattern.
    DO lz=1,lz_max(lxy)
       lxyz=lxyz+1
       pyx(1,lxy)=pyx(1,lxy)+pzyx(lxyz)*polz(lxyz,kg)
       pyx(2,lxy)=pyx(2,lxy)+pzyx(lxyz)*polz(lxyz,kg2)
    ENDDO
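One possible reading of this note (an assumption, not something the vectorizer
output states) is that pyx(1,lxy) and pyx(2,lxy) are read and written at the
same address in every iteration, so the loop is a reduction carried through
memory. The sketch below accumulates in scalars and stores once after the
loop. The array sizes, the fill values and the names s1, s2, pyx_ref and
pyx_new are made up for illustration, and whether gfortran then vectorizes the
loop has not been checked here.

```fortran
! Sketch only: shapes and values are hypothetical, not taken from grid_fast.F.
program reduction_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8                   ! stand-in for lz_max(lxy)
  integer, parameter :: kg = 1, kg2 = 2, lxy = 1
  real(dp) :: pzyx(n), polz(n,2)
  real(dp) :: pyx_ref(2,1), pyx_new(2,1)
  real(dp) :: s1, s2                            ! scalar accumulators (assumed names)
  integer :: lz, lxyz

  ! Fill the inputs with arbitrary but reproducible values.
  do lz = 1, n
     pzyx(lz)   = 0.5_dp*real(lz, dp)
     polz(lz,1) = 1.0_dp/real(lz, dp)
     polz(lz,2) = real(lz, dp) - 3.0_dp
  end do
  pyx_ref = 1.0_dp
  pyx_new = 1.0_dp

  ! Loop as quoted in the report: pyx is loaded and stored in every iteration.
  lxyz = 0
  do lz = 1, n
     lxyz = lxyz + 1
     pyx_ref(1,lxy) = pyx_ref(1,lxy) + pzyx(lxyz)*polz(lxyz,kg)
     pyx_ref(2,lxy) = pyx_ref(2,lxy) + pzyx(lxyz)*polz(lxyz,kg2)
  end do

  ! Variant: accumulate in scalars, write pyx back once after the loop.
  lxyz = 0
  s1 = pyx_new(1,lxy)
  s2 = pyx_new(2,lxy)
  do lz = 1, n
     lxyz = lxyz + 1
     s1 = s1 + pzyx(lxyz)*polz(lxyz,kg)
     s2 = s2 + pzyx(lxyz)*polz(lxyz,kg2)
  end do
  pyx_new(1,lxy) = s1
  pyx_new(2,lxy) = s2

  ! Both variants add the same terms in the same order, so they should agree.
  if (maxval(abs(pyx_new - pyx_ref)) > 1.0e-12_dp) error stop "variants disagree"
  print *, "variants agree"
end program reduction_sketch
```

The summation order is unchanged, so the rewrite does not depend on
-ffast-math to give the same result as the original loop.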
grid_fast.F:483: note: not vectorized: can't determine dependence between
(*coef_447)[D.1967_2320] and (*coef_447)[D.1967_2320]
    DO icoef=1,coef_max
       coef(icoef,1)=coef(icoef,1)+alpha(icoef,lx)*g1
       coef(icoef,2)=coef(icoef,2)+alpha(icoef,lx)*g2
       coef(icoef,3)=coef(icoef,3)+alpha(icoef,lx)*g1k
       coef(icoef,4)=coef(icoef,4)+alpha(icoef,lx)*g2k
    ENDDO
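The four statements update four different columns of coef, so no iteration
reads a value that another iteration writes. A sketch of one way to state that
more explicitly is to write each column update as its own array-section
assignment. The sizes, the values of g1, g2, g1k and g2k, and the names
coef_ref and coef_new are hypothetical, and it is an untested assumption that
this form helps gfortran's dependence analysis.

```fortran
! Sketch only: shapes and values are hypothetical, not taken from grid_fast.F.
program column_update_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: coef_max = 6, lx = 1    ! made-up sizes
  real(dp) :: alpha(coef_max,1)
  real(dp) :: coef_ref(coef_max,4), coef_new(coef_max,4)
  real(dp) :: g1, g2, g1k, g2k
  integer :: icoef

  ! Arbitrary but reproducible inputs.
  do icoef = 1, coef_max
     alpha(icoef,lx) = real(icoef, dp)
  end do
  g1 = 0.5_dp;  g2 = -1.5_dp;  g1k = 2.0_dp;  g2k = 0.25_dp
  coef_ref = 0.25_dp
  coef_new = coef_ref

  ! Loop as quoted in the report.
  do icoef = 1, coef_max
     coef_ref(icoef,1) = coef_ref(icoef,1) + alpha(icoef,lx)*g1
     coef_ref(icoef,2) = coef_ref(icoef,2) + alpha(icoef,lx)*g2
     coef_ref(icoef,3) = coef_ref(icoef,3) + alpha(icoef,lx)*g1k
     coef_ref(icoef,4) = coef_ref(icoef,4) + alpha(icoef,lx)*g2k
  end do

  ! Variant: one array-section assignment per column of coef.
  coef_new(1:coef_max,1) = coef_new(1:coef_max,1) + alpha(1:coef_max,lx)*g1
  coef_new(1:coef_max,2) = coef_new(1:coef_max,2) + alpha(1:coef_max,lx)*g2
  coef_new(1:coef_max,3) = coef_new(1:coef_max,3) + alpha(1:coef_max,lx)*g1k
  coef_new(1:coef_max,4) = coef_new(1:coef_max,4) + alpha(1:coef_max,lx)*g2k

  ! Each element is computed from the same operands in both variants.
  if (maxval(abs(coef_new - coef_ref)) > 1.0e-12_dp) error stop "variants disagree"
  print *, "variants agree"
end program column_update_sketch
```

The trade-off is that alpha(1:coef_max,lx) is traversed four times instead of
once, which may or may not matter depending on coef_max.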
--
burnus at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu dot org
           Keywords|                            |missed-optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31021