This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Improvement of vectorization on loops generated by Graphite


Hi,

I ran the following script to gather data with trunk (from 20100615)
and with the Graphite branch (today).

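# Count the vectorized loops reported for each Fortran source file.
for i in *.f90; do
    printf '%s\t' "$i"
    # $OPT is left unquoted on purpose so that it splits into separate flags.
    $FC $OPT -c "./$i" > out 2>&1
    grep "LOOP VECTORIZED" out | wc -l
done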
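
The counting step of the loop above can be pulled out into a small
helper, so that a saved log can be rechecked without rerunning the
compiler. This is only a minimal sketch: it assumes, as the script
above does, that each vectorized loop produces one line containing
"LOOP VECTORIZED", and the function name count_vectorized is made up
for illustration.

```shell
#!/bin/bash
# count_vectorized LOGFILE
# Hypothetical helper (not part of GCC): print the number of lines in
# LOGFILE that contain "LOOP VECTORIZED", the same number that
# grep "LOOP VECTORIZED" LOGFILE | wc -l reports.
count_vectorized() {
    # grep -c prints 0 but exits with status 1 when nothing matches;
    # "|| true" keeps that case from looking like a failure.
    grep -c "LOOP VECTORIZED" "$1" || true
}
```

Inside the loop, the last line would then become: count_vectorized out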

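Each column below gives the number of lines reported by wc, that is,
the number of loops reported as vectorized, for one configuration.
Trunk0 and Trunk1 were built with trunk, Gr0 and Gr1 with the Graphite
branch.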

Trunk0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
Trunk1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math -fgraphite-identity"
Gr0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
Gr1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math -fgraphite-identity -fno-loop-strip-mine -fno-loop-interchange -fno-loop-block"

		Trunk0	Trunk1	Gr0	Gr1
ac.f90		30	30	29	29
aermod.f90	151	110	147	147
air.f90		4	3	4	4
capacita.f90	17	11	13	13
channel.f90	15	14	14	14
doduc.f90	155	146	155	155
fatigue.f90	15	15	15	15
gas_dyn.f90	44	42	41	41
induct.f90	9	5	5	5
linpk.f90	14	3	14	14
mdbx.f90	12	8	12	12
nf.f90		51	34	50	50
protein.f90	31	31	31	31
rnflow.f90	87	75	85	85
test_fpu.f90	80	65	78	78
tfft.f90	4	3	4	4

Overall, the recent changes that I pushed to the Graphite branch,
which should be stable by now, improve the vectorization of loops
generated by Graphite.

The improvement of today's Graphite branch (Gr1) over trunk with
-fgraphite-identity (Trunk1) is the difference Gr1 - Trunk1 (higher
means more loops vectorized by Gr1):

ac.f90		-1
aermod.f90	37
air.f90		1
capacita.f90	2
channel.f90	0
doduc.f90	9
fatigue.f90	0
gas_dyn.f90	-1
induct.f90	0
linpk.f90	11
mdbx.f90	4
nf.f90		16
protein.f90	0
rnflow.f90	10
test_fpu.f90	13
tfft.f90	1
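
The per-file differences can be recomputed from the table of counts
with a short awk filter. The following is a sketch under some
assumptions: the table is fed on standard input with the five
whitespace-separated fields shown above (file name, then Trunk0,
Trunk1, Gr0, Gr1), and the function name col_diff is hypothetical.

```shell
#!/bin/bash
# col_diff A B
# Read rows "name Trunk0 Trunk1 Gr0 Gr1" on stdin and print
# "name<TAB>(column A - column B)" for each of them, where A and B are
# 1-based indices into the count columns (1=Trunk0 ... 4=Gr1).
# Lines with fewer than five fields (header, blank lines) are skipped.
col_diff() {
    awk -v a="$1" -v b="$2" \
        'NF >= 5 { printf "%s\t%d\n", $1, $(a + 1) - $(b + 1) }'
}
```

With the table saved to a file (here hypothetically counts.txt),
"col_diff 4 2 < counts.txt" gives Gr1 - Trunk1 and
"col_diff 1 3 < counts.txt" gives Trunk0 - Gr0.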

There are still some missed vectorization cases; see the difference
Trunk0 - Gr0 (higher means more loops vectorized by trunk):

ac.f90		1
aermod.f90	4
air.f90		0
capacita.f90	4
channel.f90	1
doduc.f90	0
fatigue.f90	0
gas_dyn.f90	3
induct.f90	4
linpk.f90	0
mdbx.f90	0
nf.f90		1
protein.f90	0
rnflow.f90	2
test_fpu.f90	2
tfft.f90	0

After these changes are merged to trunk, we should revisit the
following PRs:

http://gcc.gnu.org/PR38846: 35% slower using -floop* than without graphite
http://gcc.gnu.org/PR40979: induct benchmark 60% slower when compiled with -fgraphite
http://gcc.gnu.org/PR43359: gas_dyn benchmark exhibits missed vectorization with graphite

Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

