Improvement of vectorization on loops generated by Graphite
Jack Howarth
howarth@bromo.med.uc.edu
Wed Jul 28 02:09:00 GMT 2010
On Tue, Jul 27, 2010 at 06:47:53PM -0500, Sebastian Pop wrote:
> Hi,
>
> I ran the following script to gather data with trunk (from 20100615)
> and Graphite branch (today).
>
> for i in `ls -1 *.f90`; do
> echo -n $i
> $FC $OPT -c ./$i &> out
> grep "LOOP VECTORIZED" out | wc
> done
>
> The following columns correspond to the number of lines reported by wc.
>
> Trunk0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
> Trunk1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math -fgraphite-identity"
> Gr0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
> Gr1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math
> -fgraphite-identity -fno-loop-strip-mine -fno-loop-interchange
> -fno-loop-block"
>
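For anyone repeating the measurement, the counting step in the quoted script can also be done with grep -c, which prints the number of matching lines directly instead of the three-column wc output. This is only a minimal sketch, not the script that produced the numbers below: the function names count_vectorized and report_counts are hypothetical, and FC and OPT are assumed to be set by the caller as in the quoted script.

```shell
# Count "LOOP VECTORIZED" lines in a saved compiler log (path in $1).
# grep exits non-zero when nothing matches but still prints 0,
# so the exit status is deliberately ignored.
count_vectorized() {
    grep -c "LOOP VECTORIZED" "$1" || true
}

# Compile every .f90 file in the current directory with $FC $OPT and
# print "<file> <count>" per file. FC and OPT are assumed to be set by
# the caller; they are left unquoted so that OPT splits into flags.
report_counts() {
    log=$(mktemp)
    for f in *.f90; do
        [ -e "$f" ] || continue    # no .f90 files: skip the literal pattern
        $FC $OPT -c "./$f" > "$log" 2>&1
        printf '%s %s\n' "$f" "$(count_vectorized "$log")"
    done
    rm -f "$log"
}
```

Calling report_counts once per OPT setting gives one column of the table at a time.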
> File          Trunk0 Trunk1  Gr0  Gr1
> ac.f90            30     30   29   29
> aermod.f90       151    110  147  147
> air.f90            4      3    4    4
> capacita.f90      17     11   13   13
> channel.f90       15     14   14   14
> doduc.f90        155    146  155  155
> fatigue.f90       15     15   15   15
> gas_dyn.f90       44     42   41   41
> induct.f90         9      5    5    5
> linpk.f90         14      3   14   14
> mdbx.f90          12      8   12   12
> nf.f90            51     34   50   50
> protein.f90       31     31   31   31
> rnflow.f90        87     75   85   85
> test_fpu.f90      80     65   78   78
> tfft.f90           4      3    4    4
>
> Overall, with the recent changes that I pushed to the Graphite branch
> and that should be stable by now, we improved the vectorization of
> loops generated by Graphite.
>
> The improvement of today's Graphite branch (Gr1) over Trunk1, that
> is, trunk with -fgraphite-identity, is the difference Gr1 - Trunk1
> (a higher number means more loops vectorized by Gr1):
>
> ac.f90 -1
> aermod.f90 37
> air.f90 1
> capacita.f90 2
> channel.f90 0
> doduc.f90 9
> fatigue.f90 0
> gas_dyn.f90 -1
> induct.f90 0
> linpk.f90 11
> mdbx.f90 4
> nf.f90 16
> protein.f90 0
> rnflow.f90 10
> test_fpu.f90 13
> tfft.f90 1
>
> There are still some missed vectorization cases; see the difference
> Trunk0 - Gr0 (a higher number means more loops missed by Gr0):
>
> ac.f90 1
> aermod.f90 4
> air.f90 0
> capacita.f90 4
> channel.f90 1
> doduc.f90 0
> fatigue.f90 0
> gas_dyn.f90 3
> induct.f90 4
> linpk.f90 0
> mdbx.f90 0
> nf.f90 1
> protein.f90 0
> rnflow.f90 2
> test_fpu.f90 2
> tfft.f90 0
>
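The two difference lists quoted above follow directly from the count table (Gr1 - Trunk1 and Trunk0 - Gr0), so they can be rechecked mechanically. A minimal sketch, assuming the table has been saved as whitespace-separated rows of the form "file Trunk0 Trunk1 Gr0 Gr1"; the function name count_diffs and the file name counts.txt are hypothetical.

```shell
# Read rows of "file Trunk0 Trunk1 Gr0 Gr1" on stdin and print, per file,
# the two differences discussed above: Gr1 - Trunk1 and Trunk0 - Gr0.
# Rows whose second field is not a number (such as a header) are skipped.
count_diffs() {
    awk 'NF == 5 && $2 ~ /^[0-9]+$/ {
        printf "%s %d %d\n", $1, $5 - $3, $2 - $4
    }'
}
```

For example, `count_diffs < counts.txt` prints one line per benchmark with the file name followed by the two differences.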
Sebastian,
When do you think we may start to see the vectorizations in
Gr1 exceed those from Gr0? Will that require upgrading to the
newer cloog?
Jack
P.S. If the vectorizations using -fgraphite-identity eventually reach
parity with those without that option, would -fgraphite-identity
be enabled by default for gcc builds with graphite support
(assuming minimal compile time increases)?
> After these changes are merged to trunk, we should revisit the
> following PRs:
>
> http://gcc.gnu.org/PR38846: 35% slower using -floop* than without graphite
> http://gcc.gnu.org/PR40979: induct benchmark 60% slower when compiled
> with -fgraphite
> http://gcc.gnu.org/PR43359: gas_dyn benchmark exhibits missed
> vectorization with graphite
>
> Sebastian Pop
> --
> AMD / Open Source Compiler Engineering / GNU Tools