This is the mail archive of the mailing list for the GCC project.


Re: Unroller with branch and count patch


> > > I don't think it is a good idea to include this in mainline (for one
> > > reason, it does not apply any more -- simple loop analysis was
> > > recently moved to loop-iv.c); tomorrow I am going to send the
> > > rewrite of the doloop optimization pass, thus making this completely
> > > useless.
> > 
> > You imply that the doloop optimization will be performed after
> > the unrolling. But would it not be preferable to do it before the
> > unrolling?
> >
> > Performing the unrolling after the doloop optimization will give
> > better code, as the doloop optimization is then performed also on the
> > iterations generated before the unrolled loop, so for this region you
> > also get the doloop optimization improvements. The register pressure is
> > decreased if the count register is a special register (think of the case
> > of a loop with the exit condition i < N where N is no longer needed

> this won't work.  You still must be able to determine the number of
> iterations for doloop optimization, so you won't spare anything,
> especially since

No. With our patch, the initialization of the count register is done
before the peeled copies. Afterward, the number of iterations is no longer
needed (the branch and count is used at the end of the peeled copies). But
indeed, without our patch, the number of iterations will be needed at the
end of the peeled copies. So with our patch there is no need for an
additional register to keep the number of iterations across the peeled
copies (assuming that the count register is a special register).
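To make the count-register argument concrete, here is a minimal C model, assuming a 4x unroll factor and treating a local variable `ctr` as a stand-in for a dedicated count register (such as CTR on POWER). The function names `sum_plain` and `sum_counted` are hypothetical; this is a source-level sketch of the idea, not the RTL that the patch or GCC's unroller actually emits.

```c
#include <stddef.h>

/* Hypothetical reference loop: sum a[0..n-1] with an ordinary i < n test. */
long sum_plain(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical model of the same loop after peeling and 4x unrolling.
   `ctr` models a special count register: it is initialised from n once,
   before the peeled copies, and n is never read again afterwards. */
long sum_counted(const long *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t ctr = n;              /* count register set up once */

    /* Peeled copies (no exit checks): run ctr % 4 iterations so that the
       remaining count is a multiple of the unroll factor. */
    switch (ctr % 4) {
    case 3: s += a[i++]; ctr--;  /* fall through */
    case 2: s += a[i++]; ctr--;  /* fall through */
    case 1: s += a[i++]; ctr--; break;
    default: break;              /* ctr is already a multiple of 4 */
    }

    /* Unrolled body: four copies per trip; the exit is decided by ctr
       alone (the branch-and-count), with no compare against n. */
    while (ctr != 0) {
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        i += 4;
        ctr -= 4;
    }
    return s;
}
```

In this model `n` is live only until `ctr` is initialised; the peeled copies and the unrolled body test nothing but `ctr`, which is the property described above.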

> > this region). Also a compare is discarded and the count register
> > the execution of the loop, so you get better scheduling (think of the case
> > i = i + 1; cmp cond = i < N; if-then-else cond; which in our case is
> > i = i + 1; branch-and-count, and can be executed in a single cycle).

> ... the exit checks are eliminated from the peeled copies; so you
> do not gain anything by this, either.

I am speaking about the exit check that still remains at the end of 

> > Performing the doloop optimization before the unrolling gives you a
> > cleaner design.

> I do not think so.  As your own patch proves, you need to clutter the
> unrolling code by a lot of strange (and basically unrelated) junk.

Indeed, it complicates the unroller code in order to handle branch and
count. On the good side, the unroller doesn't need to update loop
information solely for the sake of a following doloop optimization. Doing
so would create a dependency between the doloop code and the unroller: if
the doloop optimization is changed and needs more loop information, you
have to go to the unroller and update this information after the unrolling.

This is what was meant by a cleaner design: no dependency between the loop
unroller code and the doloop optimization code.

>> Usually the unrolling invalidates much of the loop information.

> No, it does not -- we still know everything we have known before (in some
> cases even more, since by peeling some of the iterations we already know
> that the number of iterations is not "negative").

After unrolling, the induction variables are no longer induction
variables (as they have multiple definitions). You may still have all the
information, but the loop is no longer suitable for loop optimization.
And what if someone comes up with an optimization that needs both branch
and count and induction variables?

> > > Considering 3.4, could you please send some performance numbers? I
> > > would be especially interested in seeing differences between
> > >
> > > -funroll-loops -fbranch-count-reg without the patch
> > > -funroll-loops -fno-branch-count-reg without the patch
> > >
> > > and
> > >
> > > -funroll-loops -fbranch-count-reg with the patch
> > >
> > > on some benchmark.
> >
> > For the option -funroll-loops -fbranch-count-reg, the patch gains more
> > than 4% overall on SPEC CFP2000 (f77, C) on Power4, with 3
> > benchmarks showing around 10% improvement (wupwise, swim, art).

> and against -funroll-loops -fno-branch-count-reg?  It is quite possible
> the gains are mostly due to unrolling (which is prevented by the doloop
> optimization), and that the gains obtained by the doloop optimization
> are mostly negligible, so it would be nice to have numbers to either
> prove or disprove this.

Our patch makes the unrolling of branch and count loops possible, so the
unrolling gains are now also available for such loops. This is what we
have measured. For the moment, we don't have any other results available.

Mircea Namolaru
