This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Unroller with branch and count patch
- From: Mircea Namolaru <NAMOLARU at il dot ibm dot com>
- To: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- Cc: Dale Johannesen <dalej at apple dot com>, David Edelsohn <dje at makai dot watson dot ibm dot com>, gcc-patches at gcc dot gnu dot org, Andrew Pinski <pinskia at physics dot uc dot edu>, Ulrich Weigand <weigand at i1 dot informatik dot uni-erlangen dot de>
- Date: Thu, 19 Feb 2004 18:46:17 +0200
- Subject: Re: Unroller with branch and count patch
Hello,
> > > I don't think it is a good idea to include this in mainline (for one
> > > reason, it does not apply any more -- simple loop analysis was
> > > rewritten recently and moved to loop-iv.c); tomorrow I am going to
> > > send the rewrite of the doloop optimization pass, thus making this
> > > completely useless.
> >
> > You imply that the doloop optimization will be performed after
> > the unrolling. But would it not be preferable to do it before the
> > unrolling?
> >
> > Performing the unrolling after the doloop optimization will give
> > slightly better code, as the doloop optimization is performed also on
> > the iterations generated before the unrolled loop. So for this region
> > you have the usual doloop optimization improvements. The register
> > pressure is decreased if the count register is a special register
> > (think of the case of a loop with the exit condition i < N where N is
> > no longer needed across
> This won't work. You still must be able to determine the number of
> iterations for doloop optimization, so you won't spare anything,
> especially since
No. With our patch, the initialization of the count register is done
before the peeled copies. Afterward the number of iterations is no longer
needed (the branch and count is used at the end of the peeled copies). But
indeed, without our patch, the number of iterations will be needed at the
end of the peeled copies. So with our patch there is no need for an
additional register to keep the number of iterations across the peeled
copies (assuming that the count register is a special register).
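
As an illustration only (this is not code from the patch, and the function
name, the unroll factor of 4 and the variable ctr are made up for the
example), the shape described above might be sketched in C as follows. The
variable ctr stands in for the special count register: it is set once,
before the peeled copies, and n is not read again after that point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: sum n elements with the loop unrolled by 4. */
long sum_unrolled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t peel = n % 4;     /* iterations executed as peeled copies */
    size_t ctr = n / 4 + 1;  /* stand-in for the count register: unrolled
                                trips, plus one for the check that follows
                                the peeled copies */

    assert(a != NULL || n == 0);

    /* Peeled copies: no exit checks between them, only fall-through. */
    switch (peel) {
    case 3: s += a[i++]; /* fall through */
    case 2: s += a[i++]; /* fall through */
    case 1: s += a[i++]; /* fall through */
    default: break;
    }

    /* Branch and count: decrement ctr and continue while it is non-zero.
       The first evaluation is the check at the end of the peeled copies. */
    while (--ctr != 0) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
        i += 4;
    }
    return s;
}
```

In this sketch only ctr and i are live across the peeled copies; n is dead
once ctr and peel have been computed.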
>> this region). Also a compare is discarded and the count register
>> controls the execution of the loop so you get better scheduling (think
>> of the case
>> i = i + 1; cmp cond = i < N; if-then-else cond; which in our case is
>> i = i + 1; branch-and-count and can be executed in a single cycle).
> ... the exit checks are eliminated from the peeled copies; so you
> do not gain anything by this, either.
I am speaking about the exit check that still remains at the end of the
peeled copies.
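
To make the two forms of that exit check concrete, here is a small C sketch
(the function names are hypothetical, and this is source-level C, not what
the compiler emits). The first function ends each trip with an increment, a
compare against n and a branch; the second ends it with a decrement of a
counter and a branch on non-zero, which is the pattern a branch-and-count
instruction can implement without a separate compare.

```c
#include <assert.h>
#include <stddef.h>

/* Exit test as increment, compare, branch: n stays live in the loop. */
long sum_compare(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    assert(a != NULL || n == 0);
    if (n == 0)
        return 0;
    do {
        s += a[i];
        i = i + 1;
    } while (i < n);        /* cmp cond = i < n; branch on cond */
    return s;
}

/* Exit test as decrement and branch if non-zero: n is read only once. */
long sum_count(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t ctr = n;         /* stand-in for the count register */

    assert(a != NULL || n == 0);
    if (ctr == 0)
        return 0;
    do {
        s += a[i];
        i = i + 1;
    } while (--ctr != 0);   /* branch and count */
    return s;
}
```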
>> Performing the doloop optimization before the unrolling gives you a
>> cleaner design.
> I do not think so. As your own patch proves, you need to clutter the
> unrolling code by a lot of strange (and basically unrelated) junk.
Indeed, it complicates the unroller code in order to handle
branch and count. On the good side, the unroller doesn't need to update
some loop information for the sake of the (following) doloop optimization
alone. Such updating creates a dependency between the doloop code and the
unroller: if the doloop optimization is changed and some more loop
information is needed, you have to go to the unroller and update this
information after the unrolling.
This is what I meant by a cleaner design: no dependency between the loop
unroller code and the doloop optimization code.
>> Usually the unrolling invalidates much of the loop information.
> No it does not -- we still know everything we have known before (in some
> cases even more, since by peeling some of the iterations we already know
> that the number of iterations is not "negative").
After the unrolling the induction variables are no longer induction
variables (as they have multiple definitions). Maybe you have all the
information but you have a loop not suitable for loop optimization.
And what if someone comes up with an optimization that needs both branch
and count and induction variables?
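
A source-level C sketch of the multiple-definitions point (function names
are hypothetical; a real unroller works on the compiler's intermediate
representation, and whether an analysis copes with this depends on the
analysis). In the rolled loop, i has one definition per trip; after
unrolling by 2 without further cleanup, the body contains two definitions
of i, so an analysis that expects a single i = i + c per loop no longer
matches it directly.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: a single definition of i in the body. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    assert(a != NULL || n == 0);
    while (i < n) {
        s += a[i];
        i = i + 1;          /* the only definition of i in the body */
    }
    return s;
}

/* Unrolled by 2, with one peeled copy for odd n. */
long sum_unrolled2(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    assert(a != NULL || n == 0);
    if (n % 2 != 0) {       /* peeled copy */
        s += a[i];
        i = i + 1;
    }
    while (i < n) {
        s += a[i];
        i = i + 1;          /* first definition of i in the body */
        s += a[i];
        i = i + 1;          /* second definition of i in the body */
    }
    return s;
}
```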
> > > Considering 3.4, could you please send some performance numbers? I
> > > would be especially interested in seeing differences between
> > >
> > > -funroll-loops -fbranch-count-reg without the patch
> > > -funroll-loops -fno-branch-count-reg without the patch
> > >
> > > and
> > >
> > > -funroll-loops -fbranch-count-reg with the patch.
> > >
> > > on some benchmark.
> >
> > For the option -funroll-loops -fbranch-count-reg, the patch gains more
> > than 4% improvement overall on CFSPEC2000 (f77, c) on Power4, with 3
> > benchmarks showing around 10% improvement (wupwise, swim, art).
> and against -funroll-loops -fno-branch-count-reg? It is quite possible
> the gains are mostly due to unrolling (that is prevented by the doloop
> optimization), and that the gains obtained by the doloop optimization
> are mostly negligible, so it would be nice to have numbers to either
> prove or disprove this.
Our patch makes it possible to unroll branch and count loops, so the
unrolling gains are now available for such loops as well. This is what we
have measured. For the moment, we don't have any other results available.
Mircea Namolaru