This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Unroller with branch and count patch
- From: Mircea Namolaru <NAMOLARU at il dot ibm dot com>
- To: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- Cc: Dale Johannesen <dalej at apple dot com>, David Edelsohn <dje at makai dot watson dot ibm dot com>, gcc-patches at gcc dot gnu dot org, Andrew Pinski <pinskia at physics dot uc dot edu>, Ulrich Weigand <weigand at i1 dot informatik dot uni-erlangen dot de>
- Date: Thu, 19 Feb 2004 14:33:53 +0200
- Subject: Re: Unroller with branch and count patch
> I don't think it is a good idea to include this in mainline (for one
> reason, it does not apply any more -- simple loop analysis was rewritten
> recently and moved to loop-iv.c); tomorrow I am going to send the
> rewrite of the doloop optimization pass, thus making this completely
> useless.
You imply that the doloop optimization will be performed after
the unrolling. But wouldn't it be preferable to do it before the
unrolling?
Performing the unrolling after the doloop optimization gives slightly
better code, because the doloop optimization is then also applied to the
iterations generated before the unrolled loop, so this region gets the
usual doloop improvements as well. Register pressure decreases if the
count register is a special register (think of a loop with the exit
condition i < N, where N is no longer needed across this region). Also,
a compare is eliminated and the count register controls the execution of
the loop, so you get better scheduling (think of the sequence
i = i + 1; cmp cond = i < N; if-then-else cond; which in our case becomes
i = i + 1; branch-and-count and can be executed in a single cycle).
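
To make the compare-versus-count point concrete, here is a minimal
source-level sketch in C. It is only an illustration, not what the patch
or GCC actually emits: the function names are hypothetical, and the
mapping of the counted form onto a single decrement-and-branch
instruction (for example bdnz with the CTR register on PowerPC) is an
assumption about how a doloop pass would typically treat such a loop.

```c
#include <stddef.h>

/* Compare-based form: each iteration increments i, compares it
   against n, and branches on the result. n stays live in the loop. */
long sum_compare(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Counted (doloop-style) form: the iteration count is computed once
   before the loop and then only counted down to zero. On a target with
   a dedicated count register, the decrement and branch can be a single
   branch-and-count instruction, and n is no longer needed inside. */
long sum_counted(const int *a, size_t n)
{
    long s = 0;
    size_t count = n;      /* hypothetical stand-in for the count register */

    if (count == 0)        /* a do-while body would run once even for n == 0 */
        return s;
    do {
        s += *a++;
    } while (--count != 0);
    return s;
}
```

Both functions return the same sum for any n; the difference is only in
how the loop exit is controlled.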
Performing the doloop optimization before the unrolling also gives you a
cleaner design. Unrolling usually invalidates much of the loop information.
If the doloop optimization is performed first, the iv information is still
correct and you could exploit it for other optimizations if wanted. The
doloop optimization is then independent of unrolling: you don't need to
care about what loop information is invalidated by unrolling or how to
update it. And it gives you more freedom in where to place the doloop
optimization.
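
As a rough picture of the ordering argued for above (doloop first, then
unrolling), the following C sketch shows a loop that is already in
counted form being unrolled, with the leftover iterations run before the
unrolled body. The unroll factor UNROLL, the function name, and the use
of a small counted loop for the leftover iterations are all assumptions
made for illustration; the real unroller may lay these iterations out
differently.

```c
#include <stddef.h>

#define UNROLL 4   /* hypothetical unroll factor */

long sum_unrolled_counted(const int *a, size_t n)
{
    long s = 0;
    size_t pre_count  = n % UNROLL;   /* iterations run before the unrolled loop */
    size_t main_count = n / UNROLL;   /* trips of the unrolled loop */

    /* Leftover iterations: also controlled by a plain countdown, so
       they can use branch-and-count just like the main loop. */
    for (size_t c = pre_count; c != 0; --c)
        s += *a++;

    /* Unrolled body: one countdown step per UNROLL elements. */
    for (size_t c = main_count; c != 0; --c) {
        s += a[0];
        s += a[1];
        s += a[2];
        s += a[3];
        a += UNROLL;
    }
    return s;
}
```

Neither loop compares an induction variable against n; both are driven
purely by a count computed before entry.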
Considering that with our patch the unrolling is able to work with
branch and count, why do you think that performing doloop after unrolling
is preferable?
> Considering 3.4, could you please send some performance numbers? I would
> be especially interested in seeing differences between
>
> -funroll-loops -fbranch-count-reg without the patch
> -funroll-loops -fno-branch-count-reg without the patch
>
> and
>
> -funroll-loops -fbranch-count-reg with the patch.
>
> on some benchmark.
For the option -funroll-loops -fbranch-count-reg, the patch gives more
than 4% overall improvement on SPEC CFP2000 (f77, C) on Power4, with 3
benchmarks showing around 10% improvement (wupwise, swim, art).
For SPEC CINT2000 overall, there are no significant changes.
Mircea Namolaru