This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: IVOPT improvement patch


On 05/11/2010 07:49 PM, Xinliang David Li wrote:

On Tue, May 11, 2010 at 10:18 AM, Toon Moene<toon@moene.org> wrote:

On 05/11/2010 08:35 AM, Xinliang David Li wrote:

Hi, IVOPT has been one of the main area of complaints from gcc users
and it is often shutdown or user is forced to use inline assembly to
write key kernel loops. The following (resulting from the
investigation of many user complaints) summarize some of the key
problems:

6) IN MEM_REF creation, loop variant and invariants may be assigned to
the same part -- which is essentially a re-association blocking LIM

On the other hand, some recombination of induction variables is necessary to prevent excessive register pressure (and the resulting spills).

From my slides at the May, 1999 Linux Expo:

"Let's turn our attention to the kinetic energy loop again:
\begin{verbatim}
      DO 810 I=ILONP2,ILNLT
         ZEK(I) = 0.25 *
     +        ( ( PUZ(I-1   ,K)*PUZ(I-1   ,K)
     +                 *HYU(I-1   )
     +          + PUZ(I     ,K)*PUZ(I     ,K)
     +                 *HYU(I     ))*RHYV (I)
     +        + ( PVZ(I-ILON,K)*PVZ(I-ILON,K)
     +                 *HXV(I-ILON)
     +          + PVZ(I     ,K)*PVZ(I     ,K)
     +                 *HXV(I     ))*RHXU (I) )
  810  CONTINUE
\end{verbatim}
If we strength reduce all induction variables and move
all loop invariant code out of the loop, we need
11 registers to hold the addresses needed to step through
the arrays.
\end{slide}
\begin{slide}{}
We can do better, by noting that
\begin{verbatim}
{ PUZ(I-1   ,K), PUZ(I     ,K) }
{ PVZ(I-ILON,K), PVZ(I     ,K) }
{ HYU(I-1   )  , HYU(I     )   }
{ HXV(I-ILON)  , HXV(I     )   }
\end{verbatim}
form 4 {\em equivalence classes} of induction variables
that differ only by a constant - which means they
can be written in the form of address-register-with-offset.


Finding the optimal partition and iv selection/assignment is the core part of the IVOPT. GCC IVOPT uses a cost based approach to achieve that, however there are some issues that prevent the above optimal solution above. One of the intention of this patch is to address it.

Thanks !


The problem described in 6 is different. Assuming we have the
following linear address expressions:

base + iv + inv*coeff,  where iv is the induction variable, and inv is
loop invariant and coeff is constant 2/4/8, the synthesized mem ref
(without the fix) can be:

MEM_REF(base + iv, inv, coeff)

With the fix,

MEM_REF(base + inv*coeff, iv, 1)

Where base+inv*coeff can be later hoisted out of the loop.

Of course the added cost is the increased register pressure -- but
this is modelled by the so call pseudo invariant.

Does this also take care of induction variables of the form of:


inv * iv + inv + const

?

This was one we had to "deal with" in the second half of the '90s, as a result of Fortran arrays being (by default) 1-based instead of 0-based.

However, it might be that the gfortran front end presents this problem differently to the middle end than g77 presented it to the RTL passes.

Kind regards,

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]