This is the mail archive of the mailing list for the GCC project.

Re: Unroller with branch and count patch

> I don't think it is a good idea to include this in mainline (for one
> reason, it does not apply any more -- simple loop analysis was rewritten
> recently and moved to loop-iv.c); tomorrow I am going to send the
> rewrite of the doloop optimization pass, thus making this completely
> useless.

You imply that the doloop optimization will be performed after
the unrolling. But would it not be preferable to do it before the
unrolling?

Performing the unrolling after the doloop optimization will give slightly
better code, as the doloop optimization is then also performed on the code
generated before the unrolled loop. So for this region you get the usual
doloop optimization improvements. The register pressure is decreased if the
count register is a special register (think of the case of a loop with the
exit condition i < N, where N is no longer needed across this region). Also,
a compare is discarded and the count register controls execution of the
loop, so you get better scheduling (think of the case
i = i + 1; cmp cond = i < N; if-then-else cond; which in our case is
i = i + 1; branch-and-count and can be executed in a single cycle).
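
As a rough source-level illustration of the point above (a sketch only: the
real transformation happens on RTL, and the function names sum_compare and
sum_countdown are made up for this example), the two loop shapes look
roughly like this:

```c
#include <stddef.h>

/* Usual form: increment, compare against n, conditional branch.
   Both i and n stay live across the loop body. */
long sum_compare(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Doloop-style form: a single counter that only counts down to zero.
   On a target with a count register, the decrement and branch can map
   to one branch-and-count instruction, and n is no longer needed once
   the counter has been initialised. */
long sum_countdown(const int *a, size_t n)
{
    long s = 0;
    size_t count = n;
    if (count != 0) {
        do {
            s += *a++;
        } while (--count != 0);
    }
    return s;
}
```

Both functions compute the same sum; only the loop control differs.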

Performing the doloop optimization before the unrolling gives you a cleaner
design. Usually the unrolling invalidates much of the loop information.
If the doloop optimization is performed first, the iv information is still
correct and you could exploit this for other optimizations if wanted. The
doloop optimization is independent of unrolling, so you don't need to care
about what loop information is invalidated by unrolling and how to update
it. And it gives you more freedom in where to place the doloop
optimization.

Considering that with our patch the unrolling is able to work with
branch and count, why do you think that performing doloop after unrolling
is preferable?
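
To make the shape concrete, here is a hand-written sketch of what unrolling
a countdown loop by 4 might look like at source level. It is not the output
of the patch; UNROLL and sum_unrolled are names invented for this example.
The short loop in front handles the n % UNROLL leftover iterations and is
itself a countdown loop, so it could use branch-and-count as well.

```c
#include <stddef.h>

#define UNROLL 4  /* illustrative unroll factor, an assumption */

long sum_unrolled(const int *a, size_t n)
{
    long s = 0;

    /* Loop placed before the unrolled loop: runs n % UNROLL times. */
    size_t pre = n % UNROLL;
    while (pre != 0) {
        s += *a++;
        pre--;
    }

    /* Unrolled loop: one counter decrement and branch per UNROLL elements. */
    size_t count = n / UNROLL;
    while (count != 0) {
        s += a[0];
        s += a[1];
        s += a[2];
        s += a[3];
        a += UNROLL;
        count--;
    }
    return s;
}
```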

> Considering 3.4, could you please send some performance numbers? I would
> be especially interested in seeing differences between
> -funroll-loops -fbranch-count-reg without the patch
> -funroll-loops -fno-branch-count-reg without the patch
> and
> -funroll-loops -fbranch-count-reg with the patch.
> on some benchmark.

For the options -funroll-loops -fbranch-count-reg, the patch gains more
than 4% improvement overall on CFSPEC2000 (f77, c) on Power4, with 3
benchmarks showing around 10% improvement (wupwise, swim, art).

For CINTSPEC2000 overall there are no significant changes.

Mircea Namolaru
