This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: New loop unroller broken?
- From: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- To: David Edelsohn <dje at watson dot ibm dot com>
- Cc: Dale Johannesen <dalej at apple dot com>, Andrew Pinski <pinskia at physics dot uc dot edu>, Ulrich Weigand <weigand at i1 dot informatik dot uni-erlangen dot de>, Mircea Namolau <namolaru at il dot ibm dot com>, gcc at gcc dot gnu dot org
- Date: Fri, 23 Jan 2004 20:46:06 +0100
- Subject: Re: New loop unroller broken?
- References: <20040122225123.GA7014@atrey.karlin.mff.cuni.cz> <200401231914.i0NJExT26132@makai.watson.ibm.com>
Hello,
> Mircea Namolaru asked me to forward the appended reply.
>
> David
>
> ------- Forwarded Message
>
> 1. We have another patch that enables the new unroller to handle loops
> already transformed by the doloop optimization.
>
> I haven't tried your patch yet, but from the code it seems that it removes
> only the branch at the end of the unrolled copies, while preserving the
> increment of the count register. On PowerPC, the effect will generally be
> to undo the doloop optimization for the unrolled loop, because the count
> register is a special register.
The code produced is of course not optimal, but this is just a temporary
solution that should be safe enough for inclusion in the 3.4 branch.
IMHO the best way to solve the problem is not to add some overly
clever hack to the unroller, but simply to run the doloop optimization
after unrolling, which will take care of it.
> If some conditions are met (the count register has no other uses in the
> loop besides its increment, and it is not live on exit from the loop),
> its increment can also be removed from the unrolled copies. This requires
> adjusting its initialization and changing how the copies executed before
> the unrolled loop is entered are generated. Our patch does this.
>
> We are evaluating the performance impact of this patch on PowerPC. Before
> submitting it, we need to bring the code into a more suitable form (adding
> comments, removing some duplicated code, and enabling the case where the
> branches can be discarded but the increments cannot). I've attached our
> changes below. Comments are welcome.
>
> 2. We have also worked on two other things that can easily be done during
> unrolling (almost finished, but not part of the above-mentioned patch).
> BTW, the first one is already done by the old unroller.
>
> The first one concerns basic induction variables. After unrolling we
> will have:
>
> i = i + 1 (copy 1)
> ....
> i = i + 1 (copy 2)
> ....
> i = i + 1 (copy 3)
>
> This can be rewritten as:
>
> j = i + 1
> ...
> k = i + 2
> ...
> l = i + 3
>
> This gives the scheduler more opportunities, since there are no longer
> data dependencies between these instructions.
-fweb achieves this (which is why I did not worry about it much), but of
course doing it in the unroller as well does not hurt.
Zdenek