This is on an AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ (using openSUSE Factory in x86-64 mode). When compiling the Polyhedron "induct.f90" test case with and without vectorization, the run time with vectorization is about 30% longer. I think the vectorization cost model needs to be tuned for this processor. (By comparison, with a Core2Duo, the run time doubles without vectorization.)

gfortran -march=native -ffast-math -O3 -ftree-vectorize -fvect-cost-model induct.f90
  user 0m35.626s

gfortran -march=opteron -ffast-math -funroll-loops -ftree-vectorize -ftree-loop-linear -msse3 -O3 induct.f90; time ./a.out
  real 0m36.676s, user 0m36.390s

gfortran -march=opteron -ffast-math -funroll-loops -fno-tree-vectorize -ftree-loop-linear -msse3 -O3 induct.f90; time ./a.out
  real 0m28.000s, user 0m27.830s

(If you don't have the benchmark, it is available from http://www.polyhedron.co.uk/MFL6VW74649 )

The problem was detected when applying the patch http://gcc.gnu.org/ml/fortran/2009-08/msg00208.html. With that patch one gets:

induct.f90:5062: note: LOOP VECTORIZED.
induct.f90:5061: note: LOOP VECTORIZED.
induct.f90:5060: note: LOOP VECTORIZED.
induct.f90:5059: note: LOOP VECTORIZED.
induct.f90:5058: note: LOOP VECTORIZED.
induct.f90:5057: note: LOOP VECTORIZED.
induct.f90:4893: note: LOOP VECTORIZED.

and without the patch (30% slower):

induct.f90:1772: note: LOOP VECTORIZED.
induct.f90:1660: note: LOOP VECTORIZED.
induct.f90:2220: note: LOOP VECTORIZED.
induct.f90:2077: note: LOOP VECTORIZED.
induct.f90:3060: note: LOOP VECTORIZED.
induct.f90:2918: note: LOOP VECTORIZED.
induct.f90:2724: note: LOOP VECTORIZED.
induct.f90:2582: note: LOOP VECTORIZED.
induct.f90:5062: note: LOOP VECTORIZED.
induct.f90:5061: note: LOOP VECTORIZED.
induct.f90:5060: note: LOOP VECTORIZED.
induct.f90:5059: note: LOOP VECTORIZED.
induct.f90:5058: note: LOOP VECTORIZED.
induct.f90:5057: note: LOOP VECTORIZED.
induct.f90:4893: note: LOOP VECTORIZED.
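As a quick sanity check (a small Python sketch, not part of the original report), the "about 30% longer" figure can be recomputed from the two -march=opteron user times quoted in this comment; no new measurements are involved, and the helper name percent_longer is just an illustrative choice.

```python
# User times (seconds) of the two -march=opteron runs quoted in this comment.
vectorized = 36.390      # built with -ftree-vectorize
not_vectorized = 27.830  # built with -fno-tree-vectorize


def percent_longer(slow: float, fast: float) -> float:
    """Return how much longer `slow` takes than `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


slowdown = percent_longer(vectorized, not_vectorized)
print(f"vectorized build runs {slowdown:.1f}% longer")
```

Using user rather than real time keeps system noise out of the comparison; either way the vectorized binary is roughly 31% slower on this Athlon 64 X2.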
Link to vectorizer missed-optimization meta-bug.
Adding CC.
It would be nice to see where we are today with respect to the cost model / vectorizing / not vectorizing.
(In reply to comment #3)
> It would be nice to see where we are today with respect to the cost model /
> vectorizing / not vectorizing.

Answer: It became much worse (compared to GCC 4.5 of comment 0). Using gcc version 4.8.0 20130308 [trunk revision 196547], the induct run times are:

gfortran -march=native -ffast-math -O3 -ftree-vectorize -fvect-cost-model induct.f90
  real 0m47.142s / user 0m47.072s / sys 0m0.020s

gfortran-4.8 -march=native -ffast-math -O3 -ftree-vectorize -fno-vect-cost-model induct.f90
  real 0m35.713s / user 0m35.236s / sys 0m0.052s

time gfortran-4.8 -march=native -ffast-math -O3 -fno-tree-vectorize induct.f90
  real 0m47.837s / user 0m47.388s / sys 0m0.028s
  real 0m47.514s / user 0m47.428s / sys 0m0.044s

gfortran -march=opteron -ffast-math -funroll-loops -fno-tree-vectorize -ftree-loop-linear -msse3 -O3 induct.f90
  real 0m44.676s / user 0m44.640s / sys 0m0.032s

gfortran-4.5 -march=opteron -ffast-math -funroll-loops -fno-tree-vectorize -ftree-loop-linear -msse3 -O3 induct.f90; time ./a.out
  real 0m34.591s / user 0m34.524s / sys 0m0.020s
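To make the comparison easier to read, here is a small Python sketch (not from the original thread) that ranks the GCC 4.8 user times quoted above and computes two ratios: default cost model versus -fno-vect-cost-model, and the -march=opteron non-vectorized build under GCC 4.8 versus GCC 4.5. The dictionary keys are shorthand labels for the command lines, not real option strings, and only the first of the two -fno-tree-vectorize runs is used.

```python
# GCC 4.8 user times (seconds) copied from the runs above.
# Keys are shorthand labels, not actual gfortran options.
gcc48 = {
    "cost model on": 47.072,     # -ftree-vectorize -fvect-cost-model
    "cost model off": 35.236,    # -ftree-vectorize -fno-vect-cost-model
    "no vectorization": 47.388,  # -fno-tree-vectorize (first of two runs)
}

# How much slower the cost-model build is than vectorizing unconditionally.
cost_model_ratio = gcc48["cost model on"] / gcc48["cost model off"]

# -march=opteron ... -fno-tree-vectorize, GCC 4.8 versus GCC 4.5.
opteron_gcc48 = 44.640
opteron_gcc45 = 34.524
regression = opteron_gcc48 / opteron_gcc45

# Print the GCC 4.8 variants from fastest to slowest, then the two ratios.
for label, user in sorted(gcc48.items(), key=lambda kv: kv[1]):
    print(f"{label:17s} {user:7.3f} s")
print(f"cost model on / off:      {cost_model_ratio:.2f}x")
print(f"opteron build, 4.8 / 4.5: {regression:.2f}x")
```

Read this way, with GCC 4.8 the cost-model build is no faster than not vectorizing at all, while disabling the cost model is about 1.34x faster; separately, the non-vectorized opteron build itself regressed by about 1.29x between 4.5 and 4.8.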