This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: IVOPT improvement patch


On 05/11/2010 08:35 AM, Xinliang David Li wrote:

Hi, IVOPT has been one of the main areas of complaint from gcc users,
and it is often shut off, or the user is forced to use inline assembly to
write key kernel loops. The following (resulting from the
investigation of many user complaints) summarizes some of the key
problems:

6) In MEM_REF creation, loop-variant and loop-invariant terms may be
assigned to the same part, which is essentially a re-association that
blocks LIM

On the other hand, some recombination of induction variables is necessary to prevent excessive register pressure (and the resulting spills).


From my slides at the May, 1999 Linux Expo:

"Let's turn our attention to the kinetic energy loop again:
\begin{verbatim}
      DO 810 I=ILONP2,ILNLT
         ZEK(I) = 0.25 *
     +        ( ( PUZ(I-1   ,K)*PUZ(I-1   ,K)
     +                 *HYU(I-1   )
     +          + PUZ(I     ,K)*PUZ(I     ,K)
     +                 *HYU(I     ))*RHYV (I)
     +        + ( PVZ(I-ILON,K)*PVZ(I-ILON,K)
     +                 *HXV(I-ILON)
     +          + PVZ(I     ,K)*PVZ(I     ,K)
     +                 *HXV(I     ))*RHXU (I) )
 810  CONTINUE
\end{verbatim}
If we strength reduce all induction variables and move
all loop invariant code out of the loop, we need
11 registers to hold the addresses needed to step through
the arrays.
\end{slide}
\begin{slide}{}
We can do better, by noting that
\begin{verbatim}
{ PUZ(I-1   ,K), PUZ(I     ,K) }
{ PVZ(I-ILON,K), PVZ(I     ,K) }
{ HYU(I-1   )  , HYU(I     )   }
{ HXV(I-ILON)  , HXV(I     )   }
\end{verbatim}
form 4 {\em equivalence classes} of induction variables
that differ only by a constant - which means they
can be written in the form of address-register-with-offset.

In doing so, we save 4 registers and need only 7 registers
for addressing.

Richard Henderson implemented this optimization, which
will be part of egcs-1.2 ... I mean gcc-2.95."

That was '99, so it was discussing code compiled by g77 and using RTL optimization passes only - but the idea is the same.
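
To make the register count concrete, here is a rough C sketch of the same loop written both ways. It is only an illustration of the source-level shape, not what g77 or IVOPT actually emits, and a real compiler remains free to allocate registers differently. The names kinetic_11 and kinetic_7 are made up for this example, ILON is assumed to be a compile-time constant (otherwise I-ILON and I would not differ by a constant), and the K subscript is assumed to be folded into the base pointers passed in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row length standing in for ILON; assumed to be a constant. */
enum { ILON = 8 };

/* One strength-reduced pointer per array reference: 11 address registers. */
void kinetic_11(double *zek, const double *puz, const double *pvz,
                const double *hyu, const double *hxv,
                const double *rhyv, const double *rhxu,
                size_t first, size_t last)
{
    assert(first >= (size_t)ILON);      /* keeps i-1 and i-ILON in bounds */
    double *p_zek = zek + first;
    const double *p_puz_m1 = puz + first - 1,    *p_puz = puz + first;
    const double *p_hyu_m1 = hyu + first - 1,    *p_hyu = hyu + first;
    const double *p_pvz_ml = pvz + first - ILON, *p_pvz = pvz + first;
    const double *p_hxv_ml = hxv + first - ILON, *p_hxv = hxv + first;
    const double *p_rhyv = rhyv + first,         *p_rhxu = rhxu + first;

    for (size_t i = first; i <= last; i++) {
        *p_zek = 0.25 *
            ((*p_puz_m1 * *p_puz_m1 * *p_hyu_m1
              + *p_puz * *p_puz * *p_hyu) * *p_rhyv
             + (*p_pvz_ml * *p_pvz_ml * *p_hxv_ml
                + *p_pvz * *p_pvz * *p_hxv) * *p_rhxu);
        /* Eleven separate increments, one per pointer. */
        p_zek++; p_puz_m1++; p_puz++; p_hyu_m1++; p_hyu++; p_rhyv++;
        p_pvz_ml++; p_pvz++; p_hxv_ml++; p_hxv++; p_rhxu++;
    }
}

/* One pointer per equivalence class, with the other member of each class
   reached through a constant offset: 7 address registers. */
void kinetic_7(double *zek, const double *puz, const double *pvz,
               const double *hyu, const double *hxv,
               const double *rhyv, const double *rhxu,
               size_t first, size_t last)
{
    assert(first >= (size_t)ILON);      /* keeps i-1 and i-ILON in bounds */
    double *p_zek = zek + first;
    const double *p_puz = puz + first, *p_pvz = pvz + first;
    const double *p_hyu = hyu + first, *p_hxv = hxv + first;
    const double *p_rhyv = rhyv + first, *p_rhxu = rhxu + first;

    for (size_t i = first; i <= last; i++) {
        *p_zek = 0.25 *
            ((p_puz[-1] * p_puz[-1] * p_hyu[-1]
              + p_puz[0] * p_puz[0] * p_hyu[0]) * *p_rhyv
             + (p_pvz[-ILON] * p_pvz[-ILON] * p_hxv[-ILON]
                + p_pvz[0] * p_pvz[0] * p_hxv[0]) * *p_rhxu);
        /* Only seven increments are left. */
        p_zek++; p_puz++; p_pvz++; p_hyu++; p_hxv++; p_rhyv++; p_rhxu++;
    }
}
```

Both functions evaluate the same expression in the same order; the second simply trades four of the pointers for the constant offsets -1 and -ILON, which is the address-register-with-offset form the slide refers to.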

Kind regards,

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran

