This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

For after egcs-2.0, when everything works and we get bored


Lectoribus Salutem,

Consider the following "perfect" loop nest:

      subroutine sub(a, b, c, n, m)
      real a(n, m), b(n, m), c(n, m)
      do j = 1, m
         do i = 1, n
            c(i, j) = a(i, j) + b(i, j)
         enddo
      enddo
      end

The loop nest is called "perfect" because all code resides in the  
inner loop.

As it is now, egcs does a good job optimising it - even on the  
Alpha all induction variables are recognised and reduced or  
eliminated - for the *inner* loop.

However, as all Real Programmers know, the fastest way to do the  
above is:

      do k = 1, m*n
         c(k, 1) = a(k, 1) + b(k, 1)
      enddo

because the loop nest does nothing else than traverse the arrays in  
memory layout order.

Unfortunately, there are several problems with the second rendition  
of the loop nest:

1. The Standard doesn't guarantee that it'll work on all
   implementations (think distributed memory systems with
   - invisible - gather and scatter).

2. It defeats bounds checking (I personnally repaired several
   thousands of lines in our own code to enable bounds checking
   on it).

3. It gets messy when the loop bounds are not simply 1 and n.

Nevertheless, on the architectures supported by egcs, one would  
like the loop nest being transformed by the compiler into the single  
loop, because:

1. loop_optimise performs the best work on the innermost loop - hence
   by corollary, you want to have only one loop.

2. You get rid of an extraneous set of loop test and branch code.

So, the 64K question is:  How do we transform the loop nest into  
the single loop *in the optimisation passes* [and not in the Fortran  
source code].  I will not deal with this question in its entirety,  
because I do not know all steps necessary, but I will outline the  
checks that are necessary on such a loop nest to be able to tell  
whether it *can* be done.

First, I'll transform the loop nest to its "RTL" equivalent (but  
coded in Fortran):

      jc = m				! Outer loop count
      j  = 1				! Outer loop induction var j
      if (jc .eq. 0) goto 10		! End of outer loop test
    1 continue				! Outer loop entry point
      ic = n				! Inner loop count
      i  = 1				! Inner loop induction var i
      if (ic .eq. 0) goto 20		! End of inner loop test
    2 continue				! Inner loop entry point
      a_a = addr_of_a+4*i+4*(j-1)*dim_n	! Get address of a element
      a_b = addr_of_b+4*i+4*(j-1)*dim_n	! Get address of b element
      a_c = addr_of_c+4*i+4*(j-1)*dim_n	! Get address of c element
      a_c = a_a + a_b			! Compute sum in c element
      i=i+1				! Up induction var i
      ic=ic-1				! One fewer for inner loop
      if (ic .ne. 0) goto 2		! Repeat inner if not done
   20 continue
      j=j+1				! Up induction var j
      jc=jc-1				! One fewer for outer loop
      if (jc .ne. 0) goto 1		! Repeat outer if not done
   10 continue

Why can we collapse these loops ?

The answer is simple:  Because all occurrences of inner and outer  
loop induction variables are in expressions of the form

4 * (i + (j-1) * dim_n)		! Where dim_n = n

The reason this works is that 1 + n + (j-1) * dim_n = 1 + (j-1 + 1)  
* n, or - in words - such an expression will form in itself an  
induction expression k which runs from 1 to n*m.

In general, if you have two perfectly nested loops and the only  
expressions in which induction variables from the inner loop (i) and  
the outer loop (j) enter are of the form:

inv1 * i + inv1 * (j + inv2) * n + inv3

(where n is the loop count of the inner loop) then this expression  
will form an induction expression iff the increment of i is equal to  
that of j:

ib + is * n + (j+inv2) * n = ib + (j+inv2 + js) * n

iff is = js.

([ij]b is begin value of i orj, [ij]s its "step", inv[123] are loop  
invariants)

In other words, two perfectly nested loops can be collapsed iff  
induction variables i of the inner loop and j of the outer loop are  
only encountered in expressions that combine them in the following  
way:

inv1 * i + inv1 * (j + inv2) * n + inv3

which can be collapsed into an induction variable of the single  
loop (with loop count n*m) k with kb = ib + (jb + inv2) * n and ks =  
is = js.

There are a few caveats:

1. It is probably a lot of work to gather all the compound
   expressions of all induction variables involved; on the
   other hand, there is no bound to the number of k's that
   can be "created" to stand in for those expressions.

2. "Checking" that all expressions of i and j are of the
   necessary kind is probably non-trivial either; don't
   forget that the single increment of both induction
   variables shouldn't be subjected to this test !

3. I used "dim_n" in the above expansion advisedly, because
   - although in the Fortran text of the loop it's perfectly
   clear that the inner loop count is n as well as the extent
   of the first dimension of the arrays - by the time you
   hit the RTL representation of said loops this won't be
   that clear anymore.  If it's indeterminable, we're back
   at square one.

4. Obviously, you need to know a lot about loop invariants and
   induction variables - things loop_optimise will find out for
   you.  Unfortunately, the transformation has the most effect
   if done *before* loop optimisation ;-)

5. I wouldn't be surprised if this transformation was already
   known in the literature and another word than "collapse"
   was choosen for it ...

I wish you a lot of looong and cozy winter evenings playing with  
this puzzle :-)

Regards,
--
Toon Moene (mailto:toon@moene.indiv.nluug.nl)
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Phone: +31 346 214290; Fax: +31 346 214286
g77 Support: mailto:fortran@gnu.org; NWP: http://www.knmi.nl/hirlam

"As all Real Programmers know, the only useful data structure is  
the Array.  Strings, lists, structures, sets - these are all special  
cases of arrays and can be treated that way just as easily without  
messing up your programming language with all sorts of  
complications."


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]