This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Tue, May 11, 2010 at 10:18 AM, Toon Moene<toon@moene.org> wrote:
On 05/11/2010 08:35 AM, Xinliang David Li wrote:
Hi, IVOPT has been one of the main area of complaints from gcc users and it is often shutdown or user is forced to use inline assembly to write key kernel loops. The following (resulting from the investigation of many user complaints) summarize some of the key problems:
6) IN MEM_REF creation, loop variant and invariants may be assigned to the same part -- which is essentially a re-association blocking LIM
On the other hand, some recombination of induction variables is necessary to prevent excessive register pressure (and the resulting spills).
From my slides at the May, 1999 Linux Expo:
"Let's turn our attention to the kinetic energy loop again: \begin{verbatim} DO 810 I=ILONP2,ILNLT ZEK(I) = 0.25 * + ( ( PUZ(I-1 ,K)*PUZ(I-1 ,K) + *HYU(I-1 ) + + PUZ(I ,K)*PUZ(I ,K) + *HYU(I ))*RHYV (I) + + ( PVZ(I-ILON,K)*PVZ(I-ILON,K) + *HXV(I-ILON) + + PVZ(I ,K)*PVZ(I ,K) + *HXV(I ))*RHXU (I) ) 810 CONTINUE \end{verbatim} If we strength reduce all induction variables and move all loop invariant code out of the loop, we need 11 registers to hold the addresses needed to step through the arrays. \end{slide} \begin{slide}{} We can do better, by noting that \begin{verbatim} { PUZ(I-1 ,K), PUZ(I ,K) } { PVZ(I-ILON,K), PVZ(I ,K) } { HYU(I-1 ) , HYU(I ) } { HXV(I-ILON) , HXV(I ) } \end{verbatim} form 4 {\em equivalence classes} of induction variables that differ only by a constant - which means they can be written in the form of address-register-with-offset.
Finding the optimal partition and iv selection/assignment is the core part of the IVOPT. GCC IVOPT uses a cost based approach to achieve that, however there are some issues that prevent the above optimal solution above. One of the intention of this patch is to address it.
The problem described in 6 is different. Assuming we have the following linear address expressions:
base + iv + inv*coeff, where iv is the induction variable, and inv is loop invariant and coeff is constant 2/4/8, the synthesized mem ref (without the fix) can be:
MEM_REF(base + iv, inv, coeff)
With the fix,
MEM_REF(base + inv*coeff, iv, 1)
Where base+inv*coeff can be later hoisted out of the loop.
Of course the added cost is the increased register pressure -- but this is modelled by the so call pseudo invariant.
-- Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |