This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Patch: fix SPEC regressions


On Tuesday, October 29, 2002, at 10:38  AM, Richard Henderson wrote:

On Tue, Oct 29, 2002 at 09:40:32AM -0800, Dale Johannesen wrote:
Sure, I left it this way to make reading it easier.  Not that's
it's easy anyway.
Err, but why?  Why are we computing it this way?
Ah, that's a different question. I deliberately didn't explain, in the
hope that somebody would derive it independently, thus giving us more
assurance we've finally gotten this right. However:

We have a loop running from 'initial' to 'final' with increment 'abs_inc',
and we're going to make 'unroll_number' copies of it. Typically the
number of iterations of the original loop will not be exactly divisible
by unroll_number, and in that case we'll need to add an extra iteration
of the unrolled loop which is only partially executed. As I said already,
preconditioning is expected to (and does) increment 'initial' appropriately
if any preiterations were done, so we don't have to worry about that.
Also, we assume that a runtime check for zero iterations was generated elsewhere
if needed, so we don't have to do anything special for that.

Note that 'final' is the expected value of the index after the loop; for the
common C form of loop for(i=0; i<final; i+=abs_inc) the loop body is not to be
executed with i=='final', the max value is final-abs_inc. This form is used
in examples.

The increment for the unrolled loop will obviously be abs_inc*unroll_number,
which we define as t1. For a given value of 'initial', it should be evident
that there are t1 different values of 'final' for which the number of
fully executed iterations are the same. For example, suppose abs_inc=3,
initial=3, unroll_number=4. Any value of 'final' between 13 and 24 inclusive
requires 1 full iteration. To round up the edge cases correctly we need to adjust
final-initial by abs_inc-1; thus,
(final-initial+abs_inc-1)/t1
or ((13..24-3) + 2)/(3*4) in the example.

Of the t1 different values of 'final' for a given number of fully executed
iterations, there are from 0..unroll_number-1 possible extra iterations (of
the original loop) necessary (call this t2); we only care whether t2 is 0 or not.
Evidently abs_inc different values of 'final' correspond to each t2 (in particular
to t2==0), and t2 is computed straightforwardly as
((final-initial+abs_inc-1)%t1)/abs_inc,
reducing the range first to 0..t1-1 then to 0..unroll_loops-1. We only care
that t2!=0, which happens when
((final-initial+abs_inc-1)%t1) >= abs_inc
or for 15..24 in the example. That is, the bottom abs_inc values of the t1 'final'
values do not require a second iteration, the rest do.

As you point out this simplifies to
(abs(final-initial) + abs_inc*unroll_number - 1) / t1


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]