This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
For after egcs-2.0, when everything works and we get bored
- To: egcs at cygnus dot com
- Subject: For after egcs-2.0, when everything works and we get bored
- From: Toon Moene <toon at moene dot indiv dot nluug dot nl>
- Date: Fri, 31 Jul 98 12:17:04 +0200
- Organization: Moene Computational Physics, Maartensdijk, The Netherlands
Lectoribus Salutem,
Consider the following "perfect" loop nest:
subroutine sub(a, b, c, n, m)
real a(n, m), b(n, m), c(n, m)
do j = 1, m
do i = 1, n
c(i, j) = a(i, j) + b(i, j)
enddo
enddo
end
The loop nest is called "perfect" because all code resides in the
inner loop.
As it is now, egcs does a good job optimising it - even on the
Alpha all induction variables are recognised and reduced or
eliminated - for the *inner* loop.
However, as all Real Programmers know, the fastest way to do the
above is:
do k = 1, m*n
c(k, 1) = a(k, 1) + b(k, 1)
enddo
because the loop nest does nothing else than traverse the arrays in
memory layout order.
Unfortunately, there are several problems with the second rendition
of the loop nest:
1. The Standard doesn't guarantee that it'll work on all
implementations (think distributed memory systems with
- invisible - gather and scatter).
2. It defeats bounds checking (I personnally repaired several
thousands of lines in our own code to enable bounds checking
on it).
3. It gets messy when the loop bounds are not simply 1 and n.
Nevertheless, on the architectures supported by egcs, one would
like the loop nest being transformed by the compiler into the single
loop, because:
1. loop_optimise performs the best work on the innermost loop - hence
by corollary, you want to have only one loop.
2. You get rid of an extraneous set of loop test and branch code.
So, the 64K question is: How do we transform the loop nest into
the single loop *in the optimisation passes* [and not in the Fortran
source code]. I will not deal with this question in its entirety,
because I do not know all steps necessary, but I will outline the
checks that are necessary on such a loop nest to be able to tell
whether it *can* be done.
First, I'll transform the loop nest to its "RTL" equivalent (but
coded in Fortran):
jc = m ! Outer loop count
j = 1 ! Outer loop induction var j
if (jc .eq. 0) goto 10 ! End of outer loop test
1 continue ! Outer loop entry point
ic = n ! Inner loop count
i = 1 ! Inner loop induction var i
if (ic .eq. 0) goto 20 ! End of inner loop test
2 continue ! Inner loop entry point
a_a = addr_of_a+4*i+4*(j-1)*dim_n ! Get address of a element
a_b = addr_of_b+4*i+4*(j-1)*dim_n ! Get address of b element
a_c = addr_of_c+4*i+4*(j-1)*dim_n ! Get address of c element
a_c = a_a + a_b ! Compute sum in c element
i=i+1 ! Up induction var i
ic=ic-1 ! One fewer for inner loop
if (ic .ne. 0) goto 2 ! Repeat inner if not done
20 continue
j=j+1 ! Up induction var j
jc=jc-1 ! One fewer for outer loop
if (jc .ne. 0) goto 1 ! Repeat outer if not done
10 continue
Why can we collapse these loops ?
The answer is simple: Because all occurrences of inner and outer
loop induction variables are in expressions of the form
4 * (i + (j-1) * dim_n) ! Where dim_n = n
The reason this works is that 1 + n + (j-1) * dim_n = 1 + (j-1 + 1)
* n, or - in words - such an expression will form in itself an
induction expression k which runs from 1 to n*m.
In general, if you have two perfectly nested loops and the only
expressions in which induction variables from the inner loop (i) and
the outer loop (j) enter are of the form:
inv1 * i + inv1 * (j + inv2) * n + inv3
(where n is the loop count of the inner loop) then this expression
will form an induction expression iff the increment of i is equal to
that of j:
ib + is * n + (j+inv2) * n = ib + (j+inv2 + js) * n
iff is = js.
([ij]b is begin value of i orj, [ij]s its "step", inv[123] are loop
invariants)
In other words, two perfectly nested loops can be collapsed iff
induction variables i of the inner loop and j of the outer loop are
only encountered in expressions that combine them in the following
way:
inv1 * i + inv1 * (j + inv2) * n + inv3
which can be collapsed into an induction variable of the single
loop (with loop count n*m) k with kb = ib + (jb + inv2) * n and ks =
is = js.
There are a few caveats:
1. It is probably a lot of work to gather all the compound
expressions of all induction variables involved; on the
other hand, there is no bound to the number of k's that
can be "created" to stand in for those expressions.
2. "Checking" that all expressions of i and j are of the
necessary kind is probably non-trivial either; don't
forget that the single increment of both induction
variables shouldn't be subjected to this test !
3. I used "dim_n" in the above expansion advisedly, because
- although in the Fortran text of the loop it's perfectly
clear that the inner loop count is n as well as the extent
of the first dimension of the arrays - by the time you
hit the RTL representation of said loops this won't be
that clear anymore. If it's indeterminable, we're back
at square one.
4. Obviously, you need to know a lot about loop invariants and
induction variables - things loop_optimise will find out for
you. Unfortunately, the transformation has the most effect
if done *before* loop optimisation ;-)
5. I wouldn't be surprised if this transformation was already
known in the literature and another word than "collapse"
was choosen for it ...
I wish you a lot of looong and cozy winter evenings playing with
this puzzle :-)
Regards,
--
Toon Moene (mailto:toon@moene.indiv.nluug.nl)
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Phone: +31 346 214290; Fax: +31 346 214286
g77 Support: mailto:fortran@gnu.org; NWP: http://www.knmi.nl/hirlam
"As all Real Programmers know, the only useful data structure is
the Array. Strings, lists, structures, sets - these are all special
cases of arrays and can be treated that way just as easily without
messing up your programming language with all sorts of
complications."