This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [testcase] simplify_binary_operation test (was Re: -freduce-all-givs and fortran)


Richard Henderson wrote:

> On Fri, Sep 07, 2001 at 08:55:25PM +0200, Toon Moene wrote:

> > I can foresee what you want to conclude from that, but g77 will keep
> > -freduce-all-givs.  This loop simply is too small to show the
> > difference.

> Well, more data for loops that do better with -freduce-all-givs
> would be helpful.

Hmmm, I thought I already explained why limiting the number of (general)
induction variables to be reduced is not optimal w.r.t. the average
Fortran code (by virtue of the way such code uses multi-rank array
offsets in loops).

Perhaps it's enlightning to explain that this "discussion" is already
six years old, as far as I am concerned.

Maybe it's instructive to attach the e-mail that got me into this (sent
to gcc2@cygnus.com d.d. 22nd of November 1995).

-- 
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)
>From - Fri Sep  7 23:18:43 2001
Return-Path: <sun4nl!cygnus.com!gcc2-request>
Received: by moene.indiv.nluug.nl (NX5.67e/NeXT-2.0)
	id AA06974; Wed, 22 Nov 95 18:46:00 +0100
Received: from cygnus.com by sun4nl.NL.net with SMTP
	id AA21339 (5.65b/CWI-3.3); Wed, 22 Nov 1995 17:27:47 +0100
Received: from sun4nl.NL.net (sun4nl.NL.net [193.78.240.1]) by cygnus.com (8.6.12/8.6.9) with SMTP id GAA28440 for <gcc2@cygnus.com>; Wed, 22 Nov 1995 06:34:48 -0800
Received: from moene by sun4nl.NL.net via EUnet
	id AA09110 (5.65b/CWI-3.3); Wed, 22 Nov 1995 15:34:36 +0100
Received: by moene.indiv.nluug.nl (NX5.67e/NeXT-2.0)
	id AA05237; Wed, 22 Nov 95 15:33:25 +0100
Message-Id: <9511221433.AA05237@moene.indiv.nluug.nl>
Content-Type: text/plain
Mime-Version: 1.0 (NeXT Mail 3.3 v118.2)
Received: by NeXT.Mailer (1.118.2)
From: Toon Moene <toon@moene.indiv.nluug.nl>
Date: Wed, 22 Nov 95 15:33:21 +0100
To: gcc2@cygnus.com
Subject: Loop optimisation: GIV strength reduction and Fortran

Hi All,

Last week I presented a piece of rather run-of-the-mill Fortran  
loop code with arrays that gets relatively poorly optimised because  
Fortran array lower bounds are by default 1 and not 0.  In the  
lower-bound==0 case, the address expressions for array elements are  
recognised (by simplify_giv_expr in loop.c) as being general  
induction variables.  The strength reduction performed on these  
registers results in the generation of post-increment addressing -  
hence the indexing in these arrays is essentially free.  When the  
lower bound is not 0, simplify_giv_expr doesn't recognise the GIV;  
here is why (thanks to Jim Wilson for putting me on this track):

        (mem/s:DF (plus:SI (plus:SI (mult:SI (reg/v:SI 32)
                        (const_int 8))
                    (reg/v/u:SI 28))
                (const_int -8)))) -1 (nil)

This is of the form (plus (plus (mult BIV const) INVARIANT) const);  
in the case of lower-bound==0 arrays the expression is of the form:
(plus (mult BIV const) INVARIANT) which _is_ recognised by  
simplify_giv_expr.  Now, of course, a human programmer would see  
that: (plus (plus (mult BIV const) INVARIANT) const) equals (plus  
(mult BIV const) (plus INVARIANT const)) or, viewed from a different  
angle: (plus (mult BIV const) INVARIANT) which is of the form  
recognised (an invariant register plus a constant is again an  
invariant).

So far, so good.  The next question is:  Is it feasible to stuff  
this knowledge into simplify_giv_expr ?  Note that if you'll change  
the contents of the INVARIANT register by moving the addition (plus  
INVARIANT const) out of the loop, you'll have to check and change  
all other appearances of INVARIANT (it might even be best to perform  
this optimisation only in case INVARIANT is used exactly once, and  
on all paths through the loop).  Questionable enough for me to let  
this rest for a while and try a different path.

I'm doing all this stuff on my venerable, four year old, slow but  
durable NeXTStation Color ('powered' by a 25 Mhz 68040), using  
g77-0.5.17 and gcc-2.7.1.  It already occurred to me that not much  
advantage is taken of the extra addressing modes of the 680{2|3|4}0  
in config/m68k/m68k.h, although a preliminary test removing the  
limits on constant offsets in one addressing mode didn't bring much  
speed improvement.  This time I decided to investigate all  
addressing modes listed to see where restrictions on offsets could  
be removed.  The result is the patch below for m68k.h.  Furthermore,  
studying the output of -dL compilations, it occurred to me that  
rather a lot of GIVs that _were_ found were deemed 'not worth  
while'.  This, plus the fact that a lot of GIVs are _known_ not to  
be found (due to the limitations outlined above) made me decide to  
simply bypass the test on the 'worthwhileness' of GIVs in the  
routine strength_reduce in loop.c.  Aside from that, I also decided  
to remove the restriction in move_movables (also in loop.c) on  
whether it is beneficial to move certain insn's out of loops (second  
patch below).

The net result of all these changes is that the elapsed CPU time  
for my application (a limited area weather forecasting code - see  
URL below) goes from 91000 seconds to 76000 seconds, an improvement  
of more than 16 % _over the whole program_.

Now for a first shot on the explanation of this speedup.  A  
preliminary observation is that the 680x0, x>=2, with the  
restrictions on constant offsets in base-offset addressing removed,  
will support collapsing the above memory reference into one address:

	an@(offset,dm:l:4)

(assuming four byte reals, with the base of the array in address  
register 'an' and the integer index in 'dm').  The savings is larger  
than is apparent from this single example, because (and I deduced  
this tentatively from studying the generated assembler code) common  
sub-expression elimination also sweeps all local arrays together,  
i.e. all local array references will be addressed using a6 as 'an'  
(the frame pointer) and _one_ free register 'dm' (or even 'am').  
This means that, in principle, the loop can contain an indefinite  
number of local array references (as long as they are traversed with  
the same stride) and still need only _one_ free register for the  
indexing.  No wonder the 'benefit' heuristics calculation in loop.c  
is off here !  The same is unfortunately not true for arrays passed  
as dummy arguments to the subroutine (I haven't checked for common  
block based arrays - in principle, the same optimisation is possible  
there).

The net result, however, is rather impressive !

It might therefore be useful (in analogy to -funroll-all-loops) to  
introduce two new optimisation flags: -fmove-all-movables and  
-freduce-all-givs.

Cheers,

--
Toon Moene (toon@moene.indiv.nluug.nl)
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Phone: +31 346 214290; Fax: +31 346 214286
URL: http://www.knmi.nl/hirlam

The two patches:

% /usr/gnu/bin/diff -rc2P gcc-2.7.1/config/m68k/m68k.h.orig  
gcc-2.7.1/config/m68k/m68k.h
*** gcc-2.7.1/config/m68k/m68k.h.orig   Sat Oct 21 00:10:26 1995
--- gcc-2.7.1/config/m68k/m68k.h        Wed Nov 22 14:20:27 1995
***************
*** 1151,1155 ****
         && LEGITIMATE_BASE_REG_P (XEXP (X, 0))                         \
         && GET_CODE (XEXP (X, 1)) == CONST_INT                         \
!        && ((unsigned) INTVAL (XEXP (X, 1)) + 0x8000) < 0x10000)    
            \
     || (GET_CODE (X) == PLUS && XEXP (X, 0) == pic_offset_table_rtx    \
         && flag_pic && GET_CODE (XEXP (X, 1)) == SYMBOL_REF)           \
--- 1151,1155 ----
         && LEGITIMATE_BASE_REG_P (XEXP (X, 0))                         \
         && GET_CODE (XEXP (X, 1)) == CONST_INT                         \
!        && (TARGET_68020 || (unsigned) INTVAL (XEXP (X, 1)) +  
0x8000) < 0x10000)       \
     || (GET_CODE (X) == PLUS && XEXP (X, 0) == pic_offset_table_rtx    \
         && flag_pic && GET_CODE (XEXP (X, 1)) == SYMBOL_REF)           \
***************
*** 1157,1169 ****
         && flag_pic && GET_CODE (XEXP (X, 1)) == LABEL_REF))           \

- #if 0
- /* This should replace the last two (non-pic) lines
-    except that Sun's assembler does not seem to handle such  
operands.  */
-        && (TARGET_68020 ? CONSTANT_ADDRESS_P (XEXP (X, 1))            \
-          : (GET_CODE (XEXP (X, 1)) == CONST_INT                       \
-             && ((unsigned) INTVAL (XEXP (X, 1)) + 0x8000) < 0x10000))))
- #endif
-
-
  #define GO_IF_NONINDEXED_ADDRESS(X, ADDR)  \
  { if (INDIRECTABLE_1_ADDRESS_P (X)) goto ADDR; }
--- 1157,1160 ----
***************
*** 1190,1197 ****
    if (GET_CODE (X) == PLUS)                                           \
      { if (GET_CODE (XEXP (X, 1)) == CONST_INT                         \
!         && (unsigned) INTVAL (XEXP (X, 1)) + 0x80 < 0x100)            \
        { rtx go_temp = XEXP (X, 0); GO_IF_INDEXING (go_temp, ADDR); }  \
        if (GET_CODE (XEXP (X, 0)) == CONST_INT                         \
!         && (unsigned) INTVAL (XEXP (X, 0)) + 0x80 < 0x100)            \
        { rtx go_temp = XEXP (X, 1); GO_IF_INDEXING (go_temp,  
ADDR); } } }

--- 1181,1188 ----
    if (GET_CODE (X) == PLUS)                                           \
      { if (GET_CODE (XEXP (X, 1)) == CONST_INT                         \
!         && (TARGET_68020 || (unsigned) INTVAL (XEXP (X, 1)) +  
0x80 < 0x100))  \
        { rtx go_temp = XEXP (X, 0); GO_IF_INDEXING (go_temp, ADDR); }  \
        if (GET_CODE (XEXP (X, 0)) == CONST_INT                         \
!         && (TARGET_68020 || (unsigned) INTVAL (XEXP (X, 0)) +  
0x80 < 0x100))  \
        { rtx go_temp = XEXP (X, 1); GO_IF_INDEXING (go_temp,  
ADDR); } } }


% /usr/gnu/bin/diff -rc2P gcc-2.7.1/loop.c.orig gcc-2.7.1/loop.c
*** gcc-2.7.1/loop.c.orig       Tue Oct  3 17:17:16 1995
--- gcc-2.7.1/loop.c    Wed Nov 22 14:19:49 1995
***************
*** 1629,1633 ****

          if (already_moved[regno]
!             || (threshold * savings * m->lifetime) >= insn_count
              || (m->forces && m->forces->done
                  && n_times_used[m->forces->regno] == 1))
--- 1629,1633 ----

          if (already_moved[regno]
!             || 1 /* (threshold * savings * m->lifetime) >=  
insn_count */
              || (m->forces && m->forces->done
                  && n_times_used[m->forces->regno] == 1))
***************
*** 3821,3826 ****
             exit.  */

!         if (v->lifetime * threshold * benefit < insn_count
!             && ! bl->reversed)
            {
              if (loop_dump_stream)
--- 3821,3826 ----
             exit.  */

!         if ( 0 /* v->lifetime * threshold * benefit < insn_count
!             && ! bl->reversed */ )
            {
              if (loop_dump_stream)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]