This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Graphite middle-end parts review


On Wed, Aug 13, 2008 at 4:08 PM, Richard Guenther <rguenther@suse.de> wrote:
> Sure, an example would be nice - but the suggested change is a step
> forward, thanks for doing it.
>

Here is the patch for updating the documentation with examples for
these flags.

Sebastian Pop
--
AMD - GNU Tools
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 139047)
+++ doc/invoke.texi	(working copy)
@@ -5942,20 +5942,80 @@ at @option{-O} and higher.
 Perform linear loop transformations on tree.  This flag can improve cache
 performance and allow further loop optimizations to take place.
 
-@item -floop-block
-Perform loop blocking transformations on loops.  Blocking strip mines
-each loop in the loop nest such that the memory accesses of the
-element loops fit inside the L1 cache.
+@item -floop-interchange
+Perform loop interchange transformations on loops.  Interchanging two
+nested loops switches the inner and outer loops.  For example, given a
+loop like:
+@smallexample
+DO J = 1, M, 1
+  DO I = 1, N, 1
+    A(J, I) = A(J, I) * C
+  ENDDO
+ENDDO
+@end smallexample
+loop interchange will transform the loop as if the user had written:
+@smallexample
+DO I = 1, N, 1
+  DO J = 1, M, 1
+    A(J, I) = A(J, I) * C
+  ENDDO
+ENDDO
+@end smallexample
+which can be beneficial when @code{N} is larger than the caches,
+because in Fortran, the elements of an array are stored in memory
+contiguously by column, and the original loop iterates over rows,
+potentially creating at each access a cache miss.  This optimization
+applies to all the languages supported by GCC and is not limited to
+Fortran.
 
 @item -floop-strip-mine
 Perform loop strip mining transformations on loops.  Strip mining
 splits a loop into two nested loops.  The outer loop has strides 
 equal to the strip size and the inner loop has strides of the 
-original loop within a strip.
+original loop within a strip.  For example, given a loop like:
+@smallexample
+DO I = 1, N, 1
+  A(I) = A(I) + C
+ENDDO
+@end smallexample
+loop strip mining will transform the loop as if the user had written:
+@smallexample
+DO II = 1, N, 4
+  DO I = II, min (II + 4, N), 1
+    A(I) = A(I) + C
+  ENDDO
+ENDDO
+@end smallexample
+This optimization applies to all the languages supported by GCC and is
+not limited to Fortran.
 
-@item -floop-interchange
-Perform loop interchange transformations on loops.  Interchanging
-two nested loops switches the inner and outer loops. 
+@item -floop-block
+Perform loop blocking transformations on loops.  Blocking strip mines
+each loop in the loop nest such that the memory accesses of the
+element loops fit inside caches.  For example, given a loop like:
+@smallexample
+DO I = 1, N, 1
+  DO J = 1, M, 1
+    A(J, I) = B(I) + C(J)
+  ENDDO
+ENDDO
+@end smallexample
+loop blocking will transform the loop as if the user had written:
+@smallexample
+DO II = 1, N, 64
+  DO JJ = 1, M, 64
+    DO I = II, min (II + 64, N), 1
+      DO J = JJ, min (JJ + 64, M), 1
+        A(J, I) = B(I) + C(J)
+      ENDDO
+    ENDDO
+  ENDDO
+ENDDO
+@end smallexample
+which can be beneficial when @code{M} is larger than the caches,
+because the innermost loop will iterate over a smaller amount of data
+that can be kept in the caches.  This optimization applies to all the
+languages supported by GCC and is not limited to Fortran.
 
 @item -fcheck-data-deps
 @opindex fcheck-data-deps

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]