This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Two other optimization questions.

To: law at cygnus dot com
Subject: Re: Two other optimization questions.
From: Toon Moene <toon at moene dot indiv dot nluug dot nl>
Date: Wed, 07 Apr 1999 22:01:25 +0200
CC: egcs at egcs dot cygnus dot com
Organization: Moene Computational Physics, Maartensdijk, The Netherlands
References: <29563.923475347@upchuck>

Jeffrey A Law wrote:

>   I wrote:

>   >    Loop unrolling, like function inlining, is an optimization that
>   >    favors running time over code size; so they more or less "belong
>   >    together".  Wouldn't it be intuitive to *both* enable them by
>   >    default when -O3 is specified ?
> Actually I want to see a few things happen in this area:
> 
>         1. Allow some loops to be unrolled at -O2/-O3.  Specifically those
>            which don't bloat the code too much and which we are confident
>            will win when unrolled.
> 
>            Basically I'm thinking about single block loops which can be
>            trivially unrolled.

That would certainly cover most of the "interesting" (for loop
unrolling) loops in Fortran code.

However, this is not the last word on loop unrolling.  Just to quote a
(bilateral) discussion I had with Jim Wilson in February last year:

<QUOTE MY_COMMENTS_NOW=IN_BRACKETS>

Jim:

...two months ago you asked...

> me:

[ about limiting loop unrolling to 4 times, quoting unroll.c: ]

>          /* Limit loop unrolling to 4, since this will make 7 copies of
>             the loop body.  */
>          if (unroll_number > 4)
>            unroll_number = 4;

>        I do not completely understand the comment - probably it  
>       means to say that unrolling 8 times will lead to 7 extra copies of  
>       the loop body (because mod(number-of-iterations, number-of-unrolls)  
>       is maximally 7).

Jim again:

We get 2N-1 copies of the loop when preconditioning is performed.  Given
a loop like this:
        for (; i < 100; i++)
          <body>
we can unroll it only if we do preconditioning.  We then get code
something like this (not checked for correctness):
        tmp = i % 4;
        if (tmp <= 1) {
          if (tmp == 1) goto three;
          else goto loop;
        } else {
          if (tmp == 2) goto two;
          else goto one;
        }
        three:
          <body>
        two:
          <body>
        one:
          <body>
        loop:
        for (; i < 100; i += 4)
          {
            <body>
            <body>
            <body>
            <body>
          }
Note that this has 7 copies of the original loop body.
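
For readers who want something compilable, the scheme above can be sketched in C as follows.  This is only an illustration, not what unroll.c emits: the function names, the concrete body `sum += a[i]`, and the use of `(n - i) % 4` for a general bound `n` are assumptions made for the example, and a `switch` with fall-through stands in for the branch tree of gotos.

```c
#include <assert.h>

/* Reference version: the original rolled loop (hypothetical body: sum += a[i]). */
long sum_rolled(const int *a, int i, int n)
{
    long sum = 0;
    for (; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled 4 times with straight-line preconditioning.
   There are 3 body copies before the loop and 4 inside it: 2*4 - 1 = 7. */
long sum_preconditioned(const int *a, int i, int n)
{
    long sum = 0;

    assert(i <= n);
    switch ((n - i) % 4) {      /* iterations left over after unrolling by 4 */
    case 3: sum += a[i++];      /* "three": falls through */
    case 2: sum += a[i++];      /* "two":   falls through */
    case 1: sum += a[i++];      /* "one" */
    case 0: break;              /* straight to "loop" */
    }

    /* The remaining trip count is now a multiple of 4. */
    for (; i < n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    return sum;
}
```

Both functions should return the same value for any start index `i <= n`; the only difference is how many copies of the body appear in the code.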

>         It is not clear to me why this isn't implemented  
>       as:

>       do i = 1, mod(number-of-iterations, number-of-unrolls)
>          <body>
>       enddo
>       do i = mod(number-of-iterations, number-of-unrolls) + 1,
>    x         number-of-iterations, number-of-unrolls
>          <unrolled-body>
>       enddo

>       especially since, when unrolling naively, the number-of-unrolls is  
>       2, 4 or 8, so that mod(number-of-iterations, number-of-unrolls)  
>       effectively is iand(number-of-iterations, number-of-unrolls - 1),  
>       which is a very cheap computation compared to mod(...).

[I was pointing out that one only needed N+1 loop bodies here, because
 the "preconditioning" could also be done by a loop. ]
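
A minimal C sketch of the two-loop variant I had in mind, with the same caveats: the name `sum_two_loops`, the macro `UNROLL` and the body `sum += a[i]` are made up for the illustration, and the mask only replaces the modulus because the unroll factor is assumed to be a power of two and the trip count non-negative.

```c
#include <assert.h>

#define UNROLL 4   /* assumed to be a power of two so the mask below is valid */

/* One rolled remainder loop plus one unrolled loop:
   UNROLL + 1 copies of the body instead of 2*UNROLL - 1. */
long sum_two_loops(const int *a, int n)
{
    long sum = 0;
    int i = 0;
    int rem;

    assert(n >= 0);
    rem = n & (UNROLL - 1);     /* equals n % UNROLL for non-negative n */

    /* Preconditioning done by a loop: runs 0 .. UNROLL-1 times. */
    for (; i < rem; i++)
        sum += a[i];

    /* Main loop: the remaining trip count is a multiple of UNROLL. */
    for (; i < n; i += UNROLL) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    return sum;
}
```

The cost is the loop overhead of the remainder loop; the gain is that code size grows by one body copy per extra unroll step rather than two.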

Jim:

I am not sure what you are asking for here, because the loop unroller
already does this, as I mentioned above.  The only difference is that it
is emitting multiple copies of the loop body instead of adding a second
loop.  I did it this way because I thought it would give better code,
though I may not have thought through all of the implications of this.
This makes sense for small loop bodies, but is a disadvantage for
large loop bodies.  Using a second loop also works better for larger
number-of-unrolls, since it avoids needing log2(number-of-unrolls)
branches at the beginning.  Hmm, maybe this should be done for certain
cases.

[ Well, another thing I thought I pointed out was that the current
  code contains too many taken forward branches.  The Alpha hates them,
  and so, probably, do other architectures. ]

</QUOTE>

[ errno setting not useful for Fortran ]

>   >    Would it be possible to implement a test settable by a Front End
>   >    that no errno setting is called for by its language ?
> Hmmm.  Yes, we're trying to be ANSI/ISO compliant.  Note it's not just errno,
> but the matherr handling by ANSI/ISO which mandates this behavior.
> 
> This probably isn't appropriate for Fortran ;-)  I've got no objection to
> some way for the backend to influence this code.  I do wonder what the Fortran
> standards say about the behavior of sqrt (-1).

Craig already wrote you the answer to that.  The full expression is: 
"An undefined operation allows the compiler to do anything, up to and
including starting WW III, if the appropriate optional hardware is
installed".

Needless to say, that's a paraphrase of what's in the Standard.

-- 
Toon Moene (toon@moene.indiv.nluug.nl)
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Phone: +31 346 214290; Fax: +31 346 214286
g77 Support: fortran@gnu.org; egcs: egcs-bugs@cygnus.com

