This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Improvement of vectorization on loops generated by Graphite


On Tue, Jul 27, 2010 at 06:47:53PM -0500, Sebastian Pop wrote:
> Hi,
> 
> I ran the following script to gather data with trunk (from 20100615)
> and Graphite branch (today).
> 
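> for i in *.f90; do
>     echo -n "$i "
>     # $OPT is left unquoted on purpose so it splits into separate flags.
>     $FC $OPT -c "./$i" &> out
>     # Count lines only, so one number is printed per file.
>     grep "LOOP VECTORIZED" out | wc -l
> done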
> 
> The following columns correspond to the number of lines reported by wc.
> 
> Trunk0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
> Trunk1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math -fgraphite-identity"
> Gr0: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math"
> Gr1: OPT="-ftree-vectorizer-verbose=2 -O3 -ffast-math
> -fgraphite-identity -fno-loop-strip-mine -fno-loop-interchange
> -fno-loop-block"
> 
>                 Trunk0  Trunk1  Gr0     Gr1
> ac.f90          30      30      29      29
> aermod.f90      151     110     147     147
> air.f90         4       3       4       4
> capacita.f90    17      11      13      13
> channel.f90     15      14      14      14
> doduc.f90       155     146     155     155
> fatigue.f90     15      15      15      15
> gas_dyn.f90     44      42      41      41
> induct.f90      9       5       5       5
> linpk.f90       14      3       14      14
> mdbx.f90        12      8       12      12
> nf.f90          51      34      50      50
> protein.f90     31      31      31      31
> rnflow.f90      87      75      85      85
> test_fpu.f90    80      65      78      78
> tfft.f90        4       3       4       4
> 
> Overall, with the recent changes that I pushed to the Graphite branch
> and that should be stable by now, we improved the vectorization of
> loops generated by Graphite.
> 
> The improvements in today's Graphite branch (Gr1) with respect to
> trunk with -fgraphite-identity (Trunk1) are the differences
> Gr1 - Trunk1 (higher means more loops vectorized by Gr1):
> 
> ac.f90		-1
> aermod.f90	37
> air.f90		1
> capacita.f90	2
> channel.f90	0
> doduc.f90	9
> fatigue.f90	0
> gas_dyn.f90	-1
> induct.f90	0
> linpk.f90	11
> mdbx.f90	4
> nf.f90		16
> protein.f90	0
> rnflow.f90	10
> test_fpu.f90	13
> tfft.f90	1
> 
> There are still some missed vectorization cases; see the differences
> Trunk0 - Gr0 (higher means more loops vectorized by trunk only):
> 
> ac.f90		1
> aermod.f90	4
> air.f90		0
> capacita.f90	4
> channel.f90	1
> doduc.f90	0
> fatigue.f90	0
> gas_dyn.f90	3
> induct.f90	4
> linpk.f90	0
> mdbx.f90	0
> nf.f90		1
> protein.f90	0
> rnflow.f90	2
> test_fpu.f90	2
> tfft.f90	0
> 
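
For anyone who wants to re-derive the two difference lists quoted above from the four-column table, here is a minimal sketch. It is not part of Sebastian's script: the `counts` variable and the `diff_cols` helper are hypothetical names introduced only for illustration, and the numbers are copied by hand from the quoted table.

```shell
#!/bin/sh
# Counts copied from the quoted table.
# Fields: 1=file 2=Trunk0 3=Trunk1 4=Gr0 5=Gr1
counts='ac.f90 30 30 29 29
aermod.f90 151 110 147 147
air.f90 4 3 4 4
capacita.f90 17 11 13 13
channel.f90 15 14 14 14
doduc.f90 155 146 155 155
fatigue.f90 15 15 15 15
gas_dyn.f90 44 42 41 41
induct.f90 9 5 5 5
linpk.f90 14 3 14 14
mdbx.f90 12 8 12 12
nf.f90 51 34 50 50
protein.f90 31 31 31 31
rnflow.f90 87 75 85 85
test_fpu.f90 80 65 78 78
tfft.f90 4 3 4 4'

# diff_cols A B (hypothetical helper): for every benchmark, print the file
# name and the value of field A minus field B, separated by a tab.
diff_cols() {
    printf '%s\n' "$counts" |
        awk -v a="$1" -v b="$2" '{ printf "%s\t%d\n", $1, $a - $b }'
}

diff_cols 5 3   # Gr1 - Trunk1
diff_cols 2 4   # Trunk0 - Gr0
```

Run as is, it prints the Gr1 - Trunk1 list followed by the Trunk0 - Gr0 list, one benchmark per line.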

Sebastian,
    When do you think we may start to see the vectorizations in
Gr1 exceed those from Gr0? Will that require upgrading to the
newer cloog?
            Jack
ps If the vectorizations using -fgraphite-identity eventually reach
parity with those without that option, would -fgraphite-identity
be enabled by default for gcc builds with graphite support
(assuming minimal compile time increases)?

> After these changes are merged to trunk, we should revisit the
> following PRs:
> 
> http://gcc.gnu.org/PR38846: 35% slower using -floop* than without graphite
> http://gcc.gnu.org/PR40979: induct benchmark 60% slower when compiled
> with -fgraphite
> http://gcc.gnu.org/PR43359: gas_dyn benchmark exhibits missed
> vectorization with graphite
> 
> Sebastian Pop
> --
> AMD / Open Source Compiler Engineering / GNU Tools

