This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[Bug rtl-optimization/32084] gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor

From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 28 Jun 2007 08:36:10 -0000
Subject: [Bug rtl-optimization/32084] gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor
References: <bug-32084-13404@http.gcc.gnu.org/bugzilla/>
Reply-to: gcc-bugzilla at gcc dot gnu dot org


------- Comment #9 from ubizjak at gmail dot com  2007-06-28 08:36 -------
(In reply to comment #7)
> This is what I get without -ftree-vectorize, with -ftree-vectorize (default
> cost model off) and with -ftree-vectorize -fvect-cost-model respectively on an
> AMD x86-64 (with trunk plus the patch posted by Dorit at
> http://gcc.gnu.org/ml/gcc-patches/2007-06/txt00156.txt )
> 
> Case 1: (no vectorization)
> gfortran -static -march=opteron -msse3 -O3 -ffast-math -funroll-loops
> pr32084.f90 -o 4.3.novect.out
> time ./4.3.novect.out
> real    0m4.414s
> user    0m4.312s
> sys     0m0.000s
> 
> Case 2: (vectorization without cost model)
> gfortran -static -ftree-vectorize -march=opteron -msse3 -O3 -ffast-math
> -funroll-loops -fdump-tree-vect-details -fno-show-column pr32084.f90 -o
> 4.3.nocost.out
> time ./4.3.nocost.out
> real    0m4.776s
> user    0m4.668s
> sys     0m0.004s
>
> In short, the 8% advantage that the scalar version has over the vector version
> disappears with the cost model.
> 
> Unless I am missing something, the inner loops at lines 207 and 319 (do k = 1,
> 9) don?t get vectorized (irrespective of the cost model).

No, it is OK (but for core2 and nocona -ftree-vectorize has 50% disadvantage
compared to scalar versions). The problem is that vectorized loop is not
unrolled anymore in the RTL unroller. My speculation is, that by unrolling the
vectorized loop, the runtimes of vectorized version will be _faster_ than
scalar versions.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]