This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Proposal for the coarse grain unrolling heuristics and renaming for the enablement of better fine grain Loop transformation.
- From: Mikhail Maltsev <maltsevm at gmail dot com>
- To: Ajit Kumar Agarwal <ajit dot kumar dot agarwal at xilinx dot com>, Jeff Law <law at redhat dot com>, Richard Biener <richard dot guenther at gmail dot com>, Jan Hubicka <hubicka at ucw dot cz>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Cc: Vinod Kathail <vinodk at xilinx dot com>, Shail Aditya Gupta <shailadi at xilinx dot com>, Vidhumouli Hunsigida <vidhum at xilinx dot com>, Nagaraju Mekala <nmekala at xilinx dot com>
- Date: Sun, 15 Mar 2015 21:56:47 +0300
- Subject: Re: Proposal for the coarse grain unrolling heuristics and renaming for the enablement of better fine grain Loop transformation.
- Authentication-results: sourceware.org; auth=none
- References: <BL2FFO11FD015F9334CEA71679E1EC3E1AE050 at BL2FFO11FD015 dot protection dot gbl>
15.03.2015 17:44, Ajit Kumar Agarwal writes:
>
> Hello All:
>
> The below example is taken from the article by Albert Cohen et.al.
>
> "Deep Jam : Conversion of Coarse grain Parallelism to Fine grain vector parallelism"
It is not clear from this article, whether such optimization can be
implemented in practice (i.e. automatically and within reasonable time).
Authors implemented some proof-of-concept, but from their description it
looks like they had to tweak their algorithm in each particular case
(and they hope, that later this could be automated using FDO). Also
nothing definite is said about compilation time.
> I would like to propose the above heuristics for unroll and jam and renaming which enables the loop fusion and the IF-merging
> to achieve the optimize code.
AFAIK, most of loop transformations in GCC are applicable to rather
simple loops, which, at least have an induction variable. So, loops like
while (q)
{
...
}
for some arbitrary condition q, e.g.
while (*a++ = *b++)
;
are unlikely to be optimized well. The proposed transformation
> For ( I = 0 ; I < 10; i++)
> a1 = phi(a1,a2);
> If ( p1 && p2)
> a1 = ...;
> a2 = ...;
> Else
> If (p1)
> a1= ....;
> Else
> If (p2)
> a2 = ...;
>
> While (q1 && q2)
> {
> ... = a1 + ...;
> ... = a2 + .....;
> }
> While (q1)
> ....= a1 + ...;
> While ( q2)
> ..... = a2 + ..;
involves some sort of unroll-and-jam, nontrival code motion, loop
fusion. And it obviously requires analysis of loop-carried data
dependence, that can deal with nested loops having complex conditions
inside them, which GCC does not yet provide.
GCC includes Graphite framework - it allows to perform rather complex
transformations of loops (perhaps similar to what you described), but it
only works for loops which have canonical induction variables and
involve only affine memory access (i.e. addresses must be represented as
linear combinations of induction variables and constants) - no if's, no
while's with non-trivial exit conditions.
So, I think this idea is, unfortunately, not very realistic in terms of
required efforts.
P.S. I'm new in GCC community, but I hope that in general terms my
description is close to reality. Please correct me, if something is wrong.
--
Regards,
Mikhail Maltsev