This is the mail archive of the mailing list for the GCC project.


Re: New loop unroller broken?


> 	Mircea Namolaru asked me to forward the appended reply.
> David
> ------- Forwarded Message
> 1. We have another patch for enabling the new unroller to handle loops
> previously optimized by doloop optimizations.
> I still haven't tried your patch, but from the code it seems that it removes
> only the branch at the end of unrolled copies, while preserving the
> increment of the count register. For PowerPC the effect will generally be
> the undoing of the doloop optimization for the unrolled loop because the
> count register is a special register.

The code produced is of course not optimal, but this is just a temporary
solution that should be safe enough for inclusion in the 3.4 branch.

IMHO the best way to solve the problem is not to add some overly
clever hack to the unroller, but just to run the doloop optimization
after unrolling, which will do the job.

> If some conditions are met (no other uses of the count register in the
> loop beside its increment and the count register not live on exit from the
> loop), its increment can also be discarded from the unrolled copies. This
> requires the adjustment of its initialization and some changes in the
> generation of copies before the unrolled loop is entered. Our patch does
> this.
> We are evaluating the performance impact of this patch on PowerPC. Before
> submitting it the code needs to be brought to a more suitable form (adding
> comments, removal of some duplicated code, enabling the case when the
> branches can be discarded but not the increments). I've attached our
> changes below. Comments welcome.
> 2. We have worked (almost finished, but not part of the above-mentioned
> patch) on two other things that can easily be done during the
> unrolling. BTW, the first one is done by the old unroller.
> The first one regards basic induction variables. After the unrolling we
> will have:
> i = i + 1 (copy 1)
> ....
> i = i + 1 (copy 2)
> ....
> i = i + 1 (copy 3)
> This can be rewritten as:
> j = i + 1
> ...
> k = i + 2
> ...
> l = i + 3
> This creates opportunities for scheduling, as there are now no data
> dependencies between these instructions.

-fweb achieves this (that's why I did not worry about it much); but of course
doing it also in the unroller does not spoil anything.
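
To make the induction variable rewrite concrete, here is a rough source-level
sketch in C. It is only an illustration, not code from either patch: the
function names sum_chained and sum_split are made up, the unroll factor of 3
matches the example above, and both functions assume n is a multiple of 3 so
that no remainder loop is needed. The real transformation works on RTL, not
on C source.

```c
#include <stddef.h>

/* Hypothetical example: loop body unrolled 3 times, keeping the original
   chain of increments.  Each "i = i + 1" depends on the previous one.
   Assumes n is a multiple of 3. */
long sum_chained(const long *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i]; i = i + 1;   /* copy 1 */
        s += a[i]; i = i + 1;   /* copy 2 */
        s += a[i]; i = i + 1;   /* copy 3 */
    }
    return s;
}

/* Same loop after splitting the induction variable: j, k and l are all
   computed directly from the value of i at the top of the iteration, so
   the three additions no longer depend on each other.
   Assumes n is a multiple of 3. */
long sum_split(const long *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    while (i < n) {
        size_t j = i + 1;
        size_t k = i + 2;
        size_t l = i + 3;
        s += a[i];              /* copy 1 */
        s += a[j];              /* copy 2 */
        s += a[k];              /* copy 3 */
        i = l;                  /* single update carried to the next iteration */
    }
    return s;
}
```

Both functions return the same result; the second form just leaves the
scheduler free to place the three address computations wherever it likes.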

