This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [testcase] simplify_binary_operation test (was Re: -freduce-all-givs and fortran)
Richard Henderson wrote:
> On Fri, Sep 07, 2001 at 08:55:25PM +0200, Toon Moene wrote:
> > I can foresee what you want to conclude from that, but g77 will keep
> > -freduce-all-givs. This loop simply is too small to show the
> > difference.
> Well, more data for loops that do better with -freduce-all-givs
> would be helpful.
Hmmm, I thought I already explained why limiting the number of (general)
induction variables to be reduced is not optimal w.r.t. the average
Fortran code (by virtue of the way such code uses multi-rank array
offsets in loops).
Perhaps it's enlightning to explain that this "discussion" is already
six years old, as far as I am concerned.
Maybe it's instructive to attach the e-mail that got me into this (sent
to gcc2@cygnus.com d.d. 22nd of November 1995).
--
Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)
>From - Fri Sep 7 23:18:43 2001
Return-Path: <sun4nl!cygnus.com!gcc2-request>
Received: by moene.indiv.nluug.nl (NX5.67e/NeXT-2.0)
id AA06974; Wed, 22 Nov 95 18:46:00 +0100
Received: from cygnus.com by sun4nl.NL.net with SMTP
id AA21339 (5.65b/CWI-3.3); Wed, 22 Nov 1995 17:27:47 +0100
Received: from sun4nl.NL.net (sun4nl.NL.net [193.78.240.1]) by cygnus.com (8.6.12/8.6.9) with SMTP id GAA28440 for <gcc2@cygnus.com>; Wed, 22 Nov 1995 06:34:48 -0800
Received: from moene by sun4nl.NL.net via EUnet
id AA09110 (5.65b/CWI-3.3); Wed, 22 Nov 1995 15:34:36 +0100
Received: by moene.indiv.nluug.nl (NX5.67e/NeXT-2.0)
id AA05237; Wed, 22 Nov 95 15:33:25 +0100
Message-Id: <9511221433.AA05237@moene.indiv.nluug.nl>
Content-Type: text/plain
Mime-Version: 1.0 (NeXT Mail 3.3 v118.2)
Received: by NeXT.Mailer (1.118.2)
From: Toon Moene <toon@moene.indiv.nluug.nl>
Date: Wed, 22 Nov 95 15:33:21 +0100
To: gcc2@cygnus.com
Subject: Loop optimisation: GIV strength reduction and Fortran
Hi All,
Last week I presented a piece of rather run-of-the-mill Fortran
loop code with arrays that gets relatively poorly optimised because
Fortran array lower bounds are by default 1 and not 0. In the
lower-bound==0 case, the address expressions for array elements are
recognised (by simplify_giv_expr in loop.c) as being general
induction variables. The strength reduction performed on these
registers results in the generation of post-increment addressing -
hence the indexing in these arrays is essentially free. When the
lower bound is not 0, simplify_giv_expr doesn't recognise the GIV;
here is why (thanks to Jim Wilson for putting me on this track):
(mem/s:DF (plus:SI (plus:SI (mult:SI (reg/v:SI 32)
(const_int 8))
(reg/v/u:SI 28))
(const_int -8)))) -1 (nil)
This is of the form (plus (plus (mult BIV const) INVARIANT) const);
in the case of lower-bound==0 arrays the expression is of the form:
(plus (mult BIV const) INVARIANT) which _is_ recognised by
simplify_giv_expr. Now, of course, a human programmer would see
that: (plus (plus (mult BIV const) INVARIANT) const) equals (plus
(mult BIV const) (plus INVARIANT const)) or, viewed from a different
angle: (plus (mult BIV const) INVARIANT) which is of the form
recognised (an invariant register plus a constant is again an
invariant).
So far, so good. The next question is: Is it feasible to stuff
this knowledge into simplify_giv_expr ? Note that if you'll change
the contents of the INVARIANT register by moving the addition (plus
INVARIANT const) out of the loop, you'll have to check and change
all other appearances of INVARIANT (it might even be best to perform
this optimisation only in case INVARIANT is used exactly once, and
on all paths through the loop). Questionable enough for me to let
this rest for a while and try a different path.
I'm doing all this stuff on my venerable, four year old, slow but
durable NeXTStation Color ('powered' by a 25 Mhz 68040), using
g77-0.5.17 and gcc-2.7.1. It already occurred to me that not much
advantage is taken of the extra addressing modes of the 680{2|3|4}0
in config/m68k/m68k.h, although a preliminary test removing the
limits on constant offsets in one addressing mode didn't bring much
speed improvement. This time I decided to investigate all
addressing modes listed to see where restrictions on offsets could
be removed. The result is the patch below for m68k.h. Furthermore,
studying the output of -dL compilations, it occurred to me that
rather a lot of GIVs that _were_ found were deemed 'not worth
while'. This, plus the fact that a lot of GIVs are _known_ not to
be found (due to the limitations outlined above) made me decide to
simply bypass the test on the 'worthwhileness' of GIVs in the
routine strength_reduce in loop.c. Aside from that, I also decided
to remove the restriction in move_movables (also in loop.c) on
whether it is beneficial to move certain insn's out of loops (second
patch below).
The net result of all these changes is that the elapsed CPU time
for my application (a limited area weather forecasting code - see
URL below) goes from 91000 seconds to 76000 seconds, an improvement
of more than 16 % _over the whole program_.
Now for a first shot on the explanation of this speedup. A
preliminary observation is that the 680x0, x>=2, with the
restrictions on constant offsets in base-offset addressing removed,
will support collapsing the above memory reference into one address:
an@(offset,dm:l:4)
(assuming four byte reals, with the base of the array in address
register 'an' and the integer index in 'dm'). The savings is larger
than is apparent from this single example, because (and I deduced
this tentatively from studying the generated assembler code) common
sub-expression elimination also sweeps all local arrays together,
i.e. all local array references will be addressed using a6 as 'an'
(the frame pointer) and _one_ free register 'dm' (or even 'am').
This means that, in principle, the loop can contain an indefinite
number of local array references (as long as they are traversed with
the same stride) and still need only _one_ free register for the
indexing. No wonder the 'benefit' heuristics calculation in loop.c
is off here ! The same is unfortunately not true for arrays passed
as dummy arguments to the subroutine (I haven't checked for common
block based arrays - in principle, the same optimisation is possible
there).
The net result, however, is rather impressive !
It might therefore be useful (in analogy to -funroll-all-loops) to
introduce two new optimisation flags: -fmove-all-movables and
-freduce-all-givs.
Cheers,
--
Toon Moene (toon@moene.indiv.nluug.nl)
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Phone: +31 346 214290; Fax: +31 346 214286
URL: http://www.knmi.nl/hirlam
The two patches:
% /usr/gnu/bin/diff -rc2P gcc-2.7.1/config/m68k/m68k.h.orig
gcc-2.7.1/config/m68k/m68k.h
*** gcc-2.7.1/config/m68k/m68k.h.orig Sat Oct 21 00:10:26 1995
--- gcc-2.7.1/config/m68k/m68k.h Wed Nov 22 14:20:27 1995
***************
*** 1151,1155 ****
&& LEGITIMATE_BASE_REG_P (XEXP (X, 0)) \
&& GET_CODE (XEXP (X, 1)) == CONST_INT \
! && ((unsigned) INTVAL (XEXP (X, 1)) + 0x8000) < 0x10000)
\
|| (GET_CODE (X) == PLUS && XEXP (X, 0) == pic_offset_table_rtx \
&& flag_pic && GET_CODE (XEXP (X, 1)) == SYMBOL_REF) \
--- 1151,1155 ----
&& LEGITIMATE_BASE_REG_P (XEXP (X, 0)) \
&& GET_CODE (XEXP (X, 1)) == CONST_INT \
! && (TARGET_68020 || (unsigned) INTVAL (XEXP (X, 1)) +
0x8000) < 0x10000) \
|| (GET_CODE (X) == PLUS && XEXP (X, 0) == pic_offset_table_rtx \
&& flag_pic && GET_CODE (XEXP (X, 1)) == SYMBOL_REF) \
***************
*** 1157,1169 ****
&& flag_pic && GET_CODE (XEXP (X, 1)) == LABEL_REF)) \
- #if 0
- /* This should replace the last two (non-pic) lines
- except that Sun's assembler does not seem to handle such
operands. */
- && (TARGET_68020 ? CONSTANT_ADDRESS_P (XEXP (X, 1)) \
- : (GET_CODE (XEXP (X, 1)) == CONST_INT \
- && ((unsigned) INTVAL (XEXP (X, 1)) + 0x8000) < 0x10000))))
- #endif
-
-
#define GO_IF_NONINDEXED_ADDRESS(X, ADDR) \
{ if (INDIRECTABLE_1_ADDRESS_P (X)) goto ADDR; }
--- 1157,1160 ----
***************
*** 1190,1197 ****
if (GET_CODE (X) == PLUS) \
{ if (GET_CODE (XEXP (X, 1)) == CONST_INT \
! && (unsigned) INTVAL (XEXP (X, 1)) + 0x80 < 0x100) \
{ rtx go_temp = XEXP (X, 0); GO_IF_INDEXING (go_temp, ADDR); } \
if (GET_CODE (XEXP (X, 0)) == CONST_INT \
! && (unsigned) INTVAL (XEXP (X, 0)) + 0x80 < 0x100) \
{ rtx go_temp = XEXP (X, 1); GO_IF_INDEXING (go_temp,
ADDR); } } }
--- 1181,1188 ----
if (GET_CODE (X) == PLUS) \
{ if (GET_CODE (XEXP (X, 1)) == CONST_INT \
! && (TARGET_68020 || (unsigned) INTVAL (XEXP (X, 1)) +
0x80 < 0x100)) \
{ rtx go_temp = XEXP (X, 0); GO_IF_INDEXING (go_temp, ADDR); } \
if (GET_CODE (XEXP (X, 0)) == CONST_INT \
! && (TARGET_68020 || (unsigned) INTVAL (XEXP (X, 0)) +
0x80 < 0x100)) \
{ rtx go_temp = XEXP (X, 1); GO_IF_INDEXING (go_temp,
ADDR); } } }
% /usr/gnu/bin/diff -rc2P gcc-2.7.1/loop.c.orig gcc-2.7.1/loop.c
*** gcc-2.7.1/loop.c.orig Tue Oct 3 17:17:16 1995
--- gcc-2.7.1/loop.c Wed Nov 22 14:19:49 1995
***************
*** 1629,1633 ****
if (already_moved[regno]
! || (threshold * savings * m->lifetime) >= insn_count
|| (m->forces && m->forces->done
&& n_times_used[m->forces->regno] == 1))
--- 1629,1633 ----
if (already_moved[regno]
! || 1 /* (threshold * savings * m->lifetime) >=
insn_count */
|| (m->forces && m->forces->done
&& n_times_used[m->forces->regno] == 1))
***************
*** 3821,3826 ****
exit. */
! if (v->lifetime * threshold * benefit < insn_count
! && ! bl->reversed)
{
if (loop_dump_stream)
--- 3821,3826 ----
exit. */
! if ( 0 /* v->lifetime * threshold * benefit < insn_count
! && ! bl->reversed */ )
{
if (loop_dump_stream)