This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch] Ping: loop distribution for single nested loops
- From: "Sebastian Pop" <sebpop at gmail dot com>
- To: "Andi Kleen" <andi at firstfloor dot org>
- Cc: "GCC Patches" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 5 Mar 2008 06:01:50 -0600
- Subject: Re: [patch] Ping: loop distribution for single nested loops
- References: <cb9d34b20802271112t790e6f5cx7543712dfff7b4e5@mail.gmail.com> <87mypigepl.fsf@basil.nowhere.org>
On 02 Mar 2008 01:19:18 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> "Sebastian Pop" <sebpop@gmail.com> writes:
>
> > +@item -ftree-loop-distribution
> > +Perform loop distribution. This flag can improve cache performance on
> > +big loop bodies and allow further loop optimizations, like
> > +parallelization or vectorization, to take place.
> > +
>
> Very brief
>
> > +/* This pass performs loop distribution: for example, the loop
> > +
> > + |DO I = 2, N
> > + | A(I) = B(I) + C
> > + | D(I) = A(I-1)*E
> > + |ENDDO
> > +
> > + is transformed to
> > +
> > + |DOALL I = 2, N
> > + | A(I) = B(I) + C
> > + |ENDDO
> > + |
> > + |DOALL I = 2, N
> > + | D(I) = A(I-1)*E
> > + |ENDDO
>
> It would be nice if this example was in the info file as part of the
> flag description so that normal users can figure out what the
> flag actually does.
>
I don't like this example: I had to tune the ldist code for
sequential machines, and this particular loop is no longer
distributed. The reason is that it is better to keep the data for A in
the cache, so it is better to keep both statements in the same loop.
Here is a patch that improves the documentation with an example that
should still be distributed:
Index: invoke.texi
===================================================================
--- invoke.texi (revision 132834)
+++ invoke.texi (working copy)
@@ -5932,7 +5932,22 @@ is used for debugging the data dependenc
@item -ftree-loop-distribution
Perform loop distribution. This flag can improve cache performance on
big loop bodies and allow further loop optimizations, like
-parallelization or vectorization, to take place.
+parallelization or vectorization, to take place. For example, the loop
+@smallexample
+DO I = 1, N
+ A(I) = B(I) + C
+ D(I) = E(I) * F
+ENDDO
+@end smallexample
+is transformed to
+@smallexample
+DO I = 1, N
+ A(I) = B(I) + C
+ENDDO
+DO I = 1, N
+ D(I) = E(I) * F
+ENDDO
+@end smallexample
@item -ftree-loop-im
@opindex ftree-loop-im
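For readers more at home in C than in the Fortran-style notation, here is a rough sketch of the same documentation example. It only illustrates the source-level shape of the transformation and is not what the pass emits; the function names (fused, distributed, forms_agree), the array size, and the restrict qualifiers are my own assumptions for the illustration. The two statements touch disjoint arrays, so splitting them into two loops cannot change the results.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 8 /* hypothetical array size, chosen only for this sketch */

/* Original form: both statements share one loop body. */
void fused(size_t n, double *restrict a, const double *restrict b, double c,
           double *restrict d, const double *restrict e, double f)
{
    assert(a && b && d && e);
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + c;
        d[i] = e[i] * f;
    }
}

/* Distributed form: one loop per statement. This is valid here because
   neither statement reads anything the other one writes. */
void distributed(size_t n, double *restrict a, const double *restrict b, double c,
                 double *restrict d, const double *restrict e, double f)
{
    assert(a && b && d && e);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c;
    for (size_t i = 0; i < n; i++)
        d[i] = e[i] * f;
}

/* Returns 1 when both forms produce identical arrays on a small input. */
int forms_agree(void)
{
    double b[N], e[N], a1[N], d1[N], a2[N], d2[N];
    for (size_t i = 0; i < N; i++) {
        b[i] = (double)i;
        e[i] = 2.0 * (double)i;
    }
    fused(N, a1, b, 1.5, d1, e, 3.0);
    distributed(N, a2, b, 1.5, d2, e, 3.0);
    return memcmp(a1, a2, sizeof a1) == 0 && memcmp(d1, d2, sizeof d1) == 0;
}
```

Calling forms_agree() from a small driver should return 1, since each element is computed by the same single operation in both forms.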
Thanks for reviewing,
Sebastian
--
AMD - GNU Tools