This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/32084] gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 28 Jun 2007 08:36:10 -0000
- Subject: [Bug rtl-optimization/32084] gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor
- References: <bug-32084-13404@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #9 from ubizjak at gmail dot com 2007-06-28 08:36 -------
(In reply to comment #7)
> This is what I get without -ftree-vectorize, with -ftree-vectorize (default
> cost model off) and with -ftree-vectorize -fvect-cost-model respectively on an
> AMD x86-64 (with trunk plus the patch posted by Dorit at
> http://gcc.gnu.org/ml/gcc-patches/2007-06/txt00156.txt )
>
> Case 1: (no vectorization)
> gfortran -static -march=opteron -msse3 -O3 -ffast-math -funroll-loops
> pr32084.f90 -o 4.3.novect.out
> time ./4.3.novect.out
> real 0m4.414s
> user 0m4.312s
> sys 0m0.000s
>
> Case 2: (vectorization without cost model)
> gfortran -static -ftree-vectorize -march=opteron -msse3 -O3 -ffast-math
> -funroll-loops -fdump-tree-vect-details -fno-show-column pr32084.f90 -o
> 4.3.nocost.out
> time ./4.3.nocost.out
> real 0m4.776s
> user 0m4.668s
> sys 0m0.004s
>
> In short, the 8% advantage that the scalar version has over the vector version
> disappears with the cost model.
>
> Unless I am missing something, the inner loops at lines 207 and 319 (do k = 1,
> 9) don?t get vectorized (irrespective of the cost model).
No, it is OK (but for core2 and nocona -ftree-vectorize has 50% disadvantage
compared to scalar versions). The problem is that vectorized loop is not
unrolled anymore in the RTL unroller. My speculation is, that by unrolling the
vectorized loop, the runtimes of vectorized version will be _faster_ than
scalar versions.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084