This is the mail archive of the mailing list for the GCC project.

Re: Unroller with branch and count patch


> > I don't think it is a good idea to include this in mainline (for one
> > reason, it does not apply any more -- simple loop analysis was rewritten
> > recently and moved to loop-iv.c); tomorrow I am going to send the
> > rewrite of the doloop optimization pass, thus making this completely
> > useless.
> You imply that the doloop optimization will be performed after the
> unrolling. But would it not be preferable to do it before the
> unrolling?
> Performing the unrolling after the doloop optimization will give slightly
> better code, as the doloop optimization is also performed on the
> iterations generated before the unrolled loop. So for this region you
> have the usual doloop optimization improvements. The register pressure
> is decreased if the count register is a special register (think of the
> case of a loop with the exit condition i < N where N is no longer needed
> across

This won't work.  You must still be able to determine the number of
iterations for the doloop optimization, so you won't save anything,
especially since

> this region). Also a compare is discarded and the count register controls
> the execution of the loop so you get better scheduling (think of the case
> i = i + 1; cmp cond = i < N; if-then-else cond; which in our case is
> i = i + 1; branch-and-count and can be executed in a single cycle).

... the exit checks are eliminated from the peeled copies; so you
do not gain anything by this, either.
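
As an illustration of the loop shapes discussed above, here is a minimal C
sketch. It is hand-written source, not compiler output, and whether a given
target really uses a branch-and-count instruction for the countdown form
depends on the backend. The function names (sum_plain, sum_doloop,
sum_unrolled) and the unroll factor of 4 are assumptions made for the
example, not anything taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Plain form: increment, compare against n, conditional branch. */
long sum_plain(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Countdown form: one counter decremented and tested against zero,
   the shape that a branch-and-count instruction can implement. */
long sum_doloop(const int *a, size_t n)
{
    long s = 0;
    size_t count = n;
    if (count == 0)
        return s;
    do {
        s += *a++;
    } while (--count != 0);
    return s;
}

/* Unrolled by 4 (assumed factor). The n % 4 leftover iterations are
   peeled in front as straight-line copies: one dispatch on the
   remainder, and no exit test between the copies. */
long sum_unrolled(const int *a, size_t n)
{
    long s = 0;
    assert(a != NULL || n == 0);
    switch (n % 4) {
    case 3: s += *a++; /* fall through */
    case 2: s += *a++; /* fall through */
    case 1: s += *a++; /* fall through */
    default: break;
    }
    /* The unrolled loop itself is again a simple countdown. */
    for (size_t count = n / 4; count != 0; count--) {
        s += (long)a[0] + a[1] + a[2] + a[3];
        a += 4;
    }
    return s;
}
```

In this sketch the peeled copies contain no compare at all, so converting
them to a countdown form would have nothing left to remove; only the
unrolled loop keeps a counter.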

> Performing the doloop optimization before the unrolling gives you a
> cleaner design.

I do not think so.  As your own patch proves, you need to clutter the
unrolling code with a lot of strange (and basically unrelated) junk.

> Usually the unrolling invalidates much of the loop information.

No, it does not -- we still know everything we knew before (in some
cases even more, since by peeling some of the iterations we already know
that the number of iterations is not "negative").
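
To make the point about surviving iteration information concrete, here is
a small C sketch under simplifying assumptions: a loop of the form
for (i = lo; i < hi; i++) and an unroll factor chosen by the caller. The
helpers trip_count and plan_unroll and the struct unroll_plan are
hypothetical names for this example only; they are not functions from
GCC's loop code.

```c
#include <assert.h>

/* Hypothetical helper: trip count of `for (i = lo; i < hi; i++)`.
   A "negative" distance (hi <= lo) means the body never runs. */
unsigned long trip_count(long lo, long hi)
{
    return hi > lo ? (unsigned long)hi - (unsigned long)lo : 0UL;
}

struct unroll_plan {
    unsigned long peeled;   /* iterations executed by the peeled copies */
    unsigned long unrolled; /* trips of the unrolled loop body */
};

/* Split a known trip count for unrolling by `factor`.  Both parts are
   exact, so the count needed by a later countdown loop is still known. */
struct unroll_plan plan_unroll(unsigned long niter, unsigned factor)
{
    assert(factor > 0);
    struct unroll_plan p = { niter % factor, niter / factor };
    return p;
}
```

For example, a loop from 3 to 13 has 10 iterations; with a factor of 4
this gives 2 peeled iterations and 2 trips of the unrolled body, and
peeled + 4 * unrolled is still 10.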

> If doloop optimization is performed first, the iv information is still
> correct and you could exploit this for other optimizations if wanted. The
> doloop optimization is independent from unrolling, you don't need to care
> about what loop information is invalidated by unrolling and ways to update
> it. And it gives you more freedom of where to place the doloop
> optimization.
> Considering that with our patch the unrolling is able to work with
> branch and count, why do you think that performing doloop before unrolling
> is preferable?

It is easier, the code to handle it is cleaner and I do not see a reason
why not to.

> > Considering 3.4, could you please send some performance numbers? I would
> > be especially interested in seeing differences between
> > 
> > -funroll-loops -fbranch-count-reg without the patch
> > -funroll-loops -fno-branch-count-reg without the patch
> > 
> > and
> >
> > -funroll-loops -fbranch-count-reg with the patch.
> >
> > on some benchmark.
> For the option -funroll-loops -fbranch-count-reg, the patch gains more
> than 4% improvement overall on CFSPEC2000 (f77, c) on Power4, with 3
> benchmarks showing around 10% improvement (wupwise, swim, art).

And against -funroll-loops -fno-branch-count-reg?  It is quite possible
that the gains are mostly due to unrolling (which is prevented by the
doloop optimization), and that the gains obtained by the doloop
optimization are mostly negligible, so it would be nice to have numbers
to either prove or disprove this.

