This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Improvement of vectorization on loops generated by Graphite
- From: Sebastian Pop <sebpop at gmail dot com>
- To: gcc-graphite <gcc-graphite at googlegroups dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 27 Jul 2010 18:47:53 -0500
- Subject: Improvement of vectorization on loops generated by Graphite
Hi,
I ran the following script to gather data with trunk (from 20100615)
and with the Graphite branch (today):
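for i in *.f90; do
  echo -n "$i "
  # Compile one file; stdout and stderr (the vectorizer report) go to "out".
  $FC $OPT -c "./$i" > out 2>&1
  # Count the loops reported as vectorized.
  grep "LOOP VECTORIZED" out | wc -l
done
The columns below give the line count reported by wc, that is, the number
of loops reported as vectorized.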
Trunk0 and Trunk1 use trunk; Gr0 and Gr1 use the Graphite branch:
Trunk0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
Trunk1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math -fgraphite-identity"
Gr0:    OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
Gr1:    OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math -fgraphite-identity -fno-loop-strip-mine -fno-loop-interchange -fno-loop-block"
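The measurement above can be packaged so that one run produces a full table row for a file. This is a minimal sketch assuming bash; `vec_count` and `print_row` are hypothetical helper names, and `TRUNK_FC` and `GR_FC` are hypothetical variables naming the trunk and Graphite-branch compilers (for example two gfortran builds). It only counts "LOOP VECTORIZED" lines, exactly as the script above does.

```bash
#!/usr/bin/env bash
# Sketch only: reproduce one row of the table for a single source file.
# TRUNK_FC and GR_FC are hypothetical variables naming the trunk and
# Graphite-branch compilers; set them before calling print_row.

BASE="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
GR_ID="-fgraphite-identity"
NO_XFORM="-fno-loop-strip-mine -fno-loop-interchange -fno-loop-block"

# vec_count COMPILER FILE [OPTION...]
# Compiles FILE and prints how many "LOOP VECTORIZED" lines the
# compiler wrote to stdout or stderr (0 if none).
vec_count() {
  local fc=$1 file=$2
  shift 2
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
  # the printed count and hides that exit status.
  "$fc" "$@" -c "$file" 2>&1 | grep -c "LOOP VECTORIZED" || true
}

# print_row FILE
# Prints FILE followed by its Trunk0, Trunk1, Gr0 and Gr1 counts.
# The option variables are deliberately unquoted so that each one
# splits into separate flags.
print_row() {
  local f=$1
  printf '%-14s %6s %6s %6s %6s\n' "$f" \
    "$(vec_count "$TRUNK_FC" "$f" $BASE)" \
    "$(vec_count "$TRUNK_FC" "$f" $BASE $GR_ID)" \
    "$(vec_count "$GR_FC" "$f" $BASE)" \
    "$(vec_count "$GR_FC" "$f" $BASE $GR_ID $NO_XFORM)"
}
```

After setting the two compiler variables, `for f in *.f90; do print_row "$f"; done` would print the whole table. Passing the compiler as an argument, rather than using a single `$FC`, is what lets the trunk and branch columns come from one loop.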
File           Trunk0 Trunk1    Gr0    Gr1
ac.f90             30     30     29     29
aermod.f90        151    110    147    147
air.f90             4      3      4      4
capacita.f90       17     11     13     13
channel.f90        15     14     14     14
doduc.f90         155    146    155    155
fatigue.f90        15     15     15     15
gas_dyn.f90        44     42     41     41
induct.f90          9      5      5      5
linpk.f90          14      3     14     14
mdbx.f90           12      8     12     12
nf.f90             51     34     50     50
protein.f90        31     31     31     31
rnflow.f90         87     75     85     85
test_fpu.f90       80     65     78     78
tfft.f90            4      3      4      4
Overall, with the recent changes that I pushed to the Graphite branch,
which should be stable by now, we improved the vectorization of loops
generated by Graphite: summed over these 16 files, Gr1 vectorizes 697
loops against 595 for Trunk1.
The improvement of today's Graphite branch (Gr1) over Trunk1, that is,
trunk with -fgraphite-identity, is the difference Gr1 - Trunk1 (a
positive value means Gr1 vectorizes more loops):
ac.f90 -1
aermod.f90 37
air.f90 1
capacita.f90 2
channel.f90 0
doduc.f90 9
fatigue.f90 0
gas_dyn.f90 -1
induct.f90 0
linpk.f90 11
mdbx.f90 4
nf.f90 16
protein.f90 0
rnflow.f90 10
test_fpu.f90 13
tfft.f90 1
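The list above, like the Trunk0 - Gr0 list that follows, can be derived mechanically from the table. This is a minimal sketch assuming the table is saved as plain text with one "file Trunk0 Trunk1 Gr0 Gr1" row per line; `column_diff` is a hypothetical helper name, and the columns are addressed by their awk field numbers.

```bash
# column_diff A B
# Reads rows "file Trunk0 Trunk1 Gr0 Gr1" on stdin and prints, for each
# file, column A minus column B, where the columns are awk fields:
# 2 = Trunk0, 3 = Trunk1, 4 = Gr0, 5 = Gr1.
# Lines without a numeric second field (such as a header) are skipped.
column_diff() {
  awk -v a="$1" -v b="$2" '
    NF == 5 && $2 ~ /^[0-9]+$/ { printf "%s %d\n", $1, $a - $b }
  '
}
```

With the table in a file (say, a hypothetical `table.txt`), `column_diff 5 3 < table.txt` gives Gr1 - Trunk1 and `column_diff 2 4 < table.txt` gives Trunk0 - Gr0.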
There are still some missed vectorization cases; see the difference
Trunk0 - Gr0 (a positive value means trunk vectorizes more loops; 22
loops in total):
ac.f90 1
aermod.f90 4
air.f90 0
capacita.f90 4
channel.f90 1
doduc.f90 0
fatigue.f90 0
gas_dyn.f90 3
induct.f90 4
linpk.f90 0
mdbx.f90 0
nf.f90 1
protein.f90 0
rnflow.f90 2
test_fpu.f90 2
tfft.f90 0
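To decide which of these missed cases to look at first, the rows can be ranked by the size of the gap. This is a sketch under the same plain-text table assumption; `missed_cases` is a hypothetical helper name, and it keeps only the files where Trunk0 vectorizes strictly more loops than Gr0.

```bash
# missed_cases
# Reads rows "file Trunk0 Trunk1 Gr0 Gr1" on stdin and prints
# "file gap" for every file where Trunk0 vectorizes more loops than
# Gr0, sorted by decreasing gap (ties broken by file name).
missed_cases() {
  awk 'NF == 5 && $2 ~ /^[0-9]+$/ && $2 > $4 { print $1, $2 - $4 }' |
    sort -k2,2nr -k1,1
}
```

Applied to the full table, it ranks aermod.f90, capacita.f90 and induct.f90 (gap 4) ahead of gas_dyn.f90 (gap 3); induct and gas_dyn are also the benchmarks in PR40979 and PR43359.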
After these changes are merged to trunk, we should revisit the
following PRs:
http://gcc.gnu.org/PR38846: 35% slower using -floop* than without graphite
http://gcc.gnu.org/PR40979: induct benchmark 60% slower when compiled with -fgraphite
http://gcc.gnu.org/PR43359: gas_dyn benchmark exhibits missed vectorization with graphite
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools