This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: Graphite middle-end parts review

From: "Sebastian Pop" <sebastian dot pop at amd dot com>
To: "Richard Guenther" <rguenther at suse dot de>
Cc: "Mark Mitchell" <mark at codesourcery dot com>, gcc-patches at gcc dot gnu dot org, "Jan Sjodin" <jan dot sjodin at amd dot com>, "Jagasia, Harsha" <harsha dot jagasia at amd dot com>
Date: Thu, 14 Aug 2008 12:22:32 -0500
Subject: Re: Graphite middle-end parts review
References: <alpine.LNX.1.10.0808071411180.3427@zhemvz.fhfr.qr> <cb9d34b20808131157t34753219l635efb646c0a69b@mail.gmail.com> <48A33239.6020301@codesourcery.com> <alpine.LNX.1.10.0808132307420.3427@zhemvz.fhfr.qr>

On Wed, Aug 13, 2008 at 4:08 PM, Richard Guenther <rguenther@suse.de> wrote:
> Sure, an example would be nice - but the suggested change is a step
> forward, thanks for doing it.
>

Here is the patch for updating the documentation with examples for
these flags.

Sebastian Pop
--
AMD - GNU Tools

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 139047)
+++ doc/invoke.texi	(working copy)
@@ -5942,20 +5942,80 @@ at @option{-O} and higher.
 Perform linear loop transformations on tree.  This flag can improve cache
 performance and allow further loop optimizations to take place.
 
-@item -floop-block
-Perform loop blocking transformations on loops.  Blocking strip mines
-each loop in the loop nest such that the memory accesses of the
-element loops fit inside the L1 cache.
+@item -floop-interchange
+Perform loop interchange transformations on loops.  Interchanging two
+nested loops switches the inner and outer loops.  For example, given a
+loop like:
+@smallexample
+DO J = 1, M, 1
+  DO I = 1, N, 1
+    A(J, I) = A(J, I) * C
+  ENDDO
+ENDDO
+@end smallexample
+loop interchange will transform the loop as if the user had written:
+@smallexample
+DO I = 1, N, 1
+  DO J = 1, M, 1
+    A(J, I) = A(J, I) * C
+  ENDDO
+ENDDO
+@end smallexample
+which can be beneficial when @code{N} is large enough that the accessed
+data does not fit in the caches: in Fortran, the elements of an array
+are stored in memory contiguously by column, and the original loop
+iterates over rows, potentially causing a cache miss at each access.
+This optimization applies to all the languages supported by GCC and is
+not limited to Fortran.
 
 @item -floop-strip-mine
 Perform loop strip mining transformations on loops.  Strip mining
 splits a loop into two nested loops.  The outer loop has strides 
 equal to the strip size and the inner loop has strides of the 
-original loop within a strip.
+original loop within a strip.  For example, given a loop like:
+@smallexample
+DO I = 1, N, 1
+  A(I) = A(I) + C
+ENDDO
+@end smallexample
+loop strip mining will transform the loop as if the user had written:
+@smallexample
+DO II = 1, N, 4
+  DO I = II, min (II + 3, N), 1
+    A(I) = A(I) + C
+  ENDDO
+ENDDO
+@end smallexample
+This optimization applies to all the languages supported by GCC and is
+not limited to Fortran.
 
-@item -floop-interchange
-Perform loop interchange transformations on loops.  Interchanging
-two nested loops switches the inner and outer loops. 
+@item -floop-block
+Perform loop blocking transformations on loops.  Blocking strip mines
+each loop in the loop nest such that the memory accesses of the
+element loops fit inside the caches.  For example, given a loop like:
+@smallexample
+DO I = 1, N, 1
+  DO J = 1, M, 1
+    A(J, I) = B(I) + C(J)
+  ENDDO
+ENDDO
+@end smallexample
+loop blocking will transform the loop as if the user had written:
+@smallexample
+DO II = 1, N, 64
+  DO JJ = 1, M, 64
+    DO I = II, min (II + 63, N), 1
+      DO J = JJ, min (JJ + 63, M), 1
+        A(J, I) = B(I) + C(J)
+      ENDDO
+    ENDDO
+  ENDDO
+ENDDO
+@end smallexample
+which can be beneficial when @code{M} is too large for the data to fit
+in the caches, because the innermost loop then iterates over a smaller
+block that can be kept in the caches.  This optimization applies to all
+the languages supported by GCC and is not limited to Fortran.
 
 @item -fcheck-data-deps
 @opindex fcheck-data-deps
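
As a rough C counterpart to the strip mining example in the patch (not part of the patch itself; the function names and the strip size of 4 are assumptions made for illustration), the sketch below shows the loop before and after the transformation. A Fortran DO loop has an inclusive upper bound, so a strip of size 4 starting at II ends at II + 3; the C version expresses the same thing with an exclusive bound of ii + STRIP.

```c
#include <stddef.h>

enum { STRIP = 4 };  /* illustrative strip size, not a GCC parameter */

/* Original loop: add c to each of the n elements of a. */
void add_scalar(double *a, size_t n, double c)
{
    for (size_t i = 0; i < n; i++)
        a[i] += c;
}

/* Strip-mined form: the outer loop steps by STRIP and the inner loop
   covers one strip, clamped to n so the last strip may be shorter. */
void add_scalar_strip_mined(double *a, size_t n, double c)
{
    for (size_t ii = 0; ii < n; ii += STRIP) {
        size_t end = (ii + STRIP < n) ? ii + STRIP : n;  /* exclusive */
        for (size_t i = ii; i < end; i++)
            a[i] += c;
    }
}
```

Both functions update every element exactly once, so they should produce the same array. Blocking applies the same splitting to each loop of a nest.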

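Since the documentation says the transformations are not limited to Fortran, here is a minimal C sketch of loop interchange (again not part of the patch; the function names are hypothetical). C stores arrays contiguously by row, the opposite of Fortran, so in C it is the column-wise traversal that strides through memory and the row-wise one that is contiguous.

```c
#include <stddef.h>

/* Hypothetical helper: scales an nrows x ncols row-major matrix by c,
   walking down each column.  Consecutive inner iterations are ncols
   elements apart, so each access may touch a different cache line. */
void scale_by_columns(double *a, size_t nrows, size_t ncols, double c)
{
    for (size_t j = 0; j < ncols; j++)
        for (size_t i = 0; i < nrows; i++)
            a[i * ncols + j] *= c;
}

/* The same computation after interchanging the two loops: the inner
   loop now walks along a row, which is contiguous in C. */
void scale_by_rows(double *a, size_t nrows, size_t ncols, double c)
{
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            a[i * ncols + j] *= c;
}
```

The two functions compute the same result; only the order in which memory is visited differs.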