This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: GVN-PRE sucks -- well, for SPECCFP2000 mgrid anyway


Steven Bosscher wrote:

Hi,

SPEC2000's mgrid benchmark has typical code fortran that looks
like this:

      DO I3=2,N-1
        DO I2=2,N-1
          DO I1=2,N-1
            R(I1,I2,I3) = V(I1,I2,I3)
     >      -A(0)*( U(I1,  I2,  I3  ) )
     >      -A(1)*( U(I1-1,I2,  I3  ) + U(I1+1,I2,  I3  )
     >                 +  U(I1,  I2-1,I3  ) + U(I1,  I2+1,I3  )
     >                 +  U(I1,  I2,  I3-1) + U(I1,  I2,  I3+1) )
     >      -A(2)*( U(I1-1,I2-1,I3  ) + U(I1+1,I2-1,I3  )
     >                 +  U(I1-1,I2+1,I3  ) + U(I1+1,I2+1,I3  )
     >                 +  U(I1,  I2-1,I3-1) + U(I1,  I2+1,I3-1)
     >                 +  U(I1,  I2-1,I3+1) + U(I1,  I2+1,I3+1)
     >                 +  U(I1-1,I2,  I3-1) + U(I1-1,I2,  I3+1)
     >                 +  U(I1+1,I2,  I3-1) + U(I1+1,I2,  I3+1) )
     >      -A(3)*( U(I1-1,I2-1,I3-1) + U(I1+1,I2-1,I3-1)
     >                 +  U(I1-1,I2+1,I3-1) + U(I1+1,I2+1,I3-1)
     >                 +  U(I1-1,I2-1,I3+1) + U(I1+1,I2-1,I3+1)
     >                 +  U(I1-1,I2+1,I3+1) + U(I1+1,I2+1,I3+1) )
          END DO
        END DO
      END DO

where R,V,A, and U are function arguments and I[123] are loop
counters.

When gfortran flattens the arrays it exposes a lot of redundancies
for address arithmetic, most of which is loop invariant.  So PRE
gets to work, moves everything it can out of the loop, and we get:

[ A huge mess. ]


Note that g77 doesn't flatten arrays and we still move a whole lot of the address computations out of the (inner) loop.

No, that's not the problem. The problem is that the tree-ssa loop optimizer doesn't have induction variable strength reduction and elimination (it's present on the lno branch, but not integrated in mainline yet).

Note the following analysis:

On ix86 variants, which have, among others (reg+offset) addressing, only eleven (integer) registers are necessary to hold addresses and the loop counter in the above inner loop:

 1: A
 2: U(n, I2  , I3  )
 3: U(n, I2+1, I3+1)
 4: U(n, I2  , I3+1)
 5: U(n, I2+1, I3  )
 6: U(n, I2-1, I3  )
 7: U(n, I2-1, I3+1)
 8: U(n, I2-1, I3-1)
 9: V(n, I2,   I3  )
10: R(n, I2,   I3  )
11: I1

The AMD64 has 16 integer registers, enough to cause no spillage on this loop.

Hope this helps,

--
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
A maintainer of GNU Fortran 95: http://gcc.gnu.org/fortran/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]