This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/47657] missed vectorization
- From: "Joost.VandeVondele at pci dot uzh.ch" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 9 Feb 2011 11:25:48 +0000
- Subject: [Bug tree-optimization/47657] missed vectorization
- Auto-submitted: auto-generated
- References: <bug-47657-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47657
--- Comment #2 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-02-09 11:25:42 UTC ---
Created attachment 23283
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23283
Testcase including a timing routine; the last number printed is the flop rate.
The Cray compiler is supposed to *not* interchange loops, as I'm using:
ftn -O3,ipa0,nointerchange,vector3 testcase.f90
to compile. This gives about 5.6 Gflops.
Unrolling still seems to happen (there are 16 multiplies in the inner loop), and
ftn -O3,ipa0,nointerchange,vector3,unroll0 testcase.f90
yields poor performance (2.3 Gflops).
gfortran 4.5 yields 3.424 Gflops with:
gfortran -O3 -ffast-math -funroll-loops -march=native testcase.f90