This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: New loop unroller broken?


Hello,

> >> 1. We have another patch for enabling the new unroller to handle loops
> >> previously optimized by doloop optimizations.
> >>
> >> I still haven't tried your patch, but from the code it seems that it
> >> removes only the branch at the end of unrolled copies, while preserving
> >> the increment of the count register. For PowerPC the effect will
> >> generally be the undoing of the doloop optimization for the unrolled
> >> loop because the count register is a special register.
> 
> >The code produced of course is not optimal, but this is just a temporary
> >solution that should be safe enough for inclusion into 3.4 branch.
> 
> >IMHO the best way to solve the problem is not to add some overly
> >clever hack to the unroller, but just to run the doloop optimization
> >after unrolling, which will do the job.
> 
> The problem with your patch is that for PowerPC it undoes the doloop
> optimization: in the unrolled loop the register used by the branch-on-count
> will also appear in an increment instruction, but this imposes the
> following restrictions on this register:
> 
> reg = reg - 1        -> reg should be a GPR,
> 
> branch-on-count reg  -> reg should be a (the) Count Register.
> 
> The result is that a GPR will be allocated to reg, and the branch-on-count
> later expanded into a decrement and a compare instruction (as prior to the
> doloop optimization). The doloop optimization in general introduces a new
> induction variable; with your patch, this iv requires a GPR, so the only
> effect is to increase register pressure. It would probably be better to use
> -fno-branch-count-reg when using -funroll-loops, than using the patch.
> 
> I tend to agree with you that running the doloop optimization after
> unrolling is the best solution, but this is not for GCC 3.4. We would like
> to address the regressions in GCC 3.4 on PowerPC by applying both doloop
> and unrolling; this is what our patch does. Do you think it is suitable?

Yes, provided that it is doable in a suitably simple way (I haven't checked
your patch yet).
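
To make the "doloop after unrolling" idea concrete, here is a rough
source-level sketch in C. It is only an illustration, not what GCC emits:
the function names, the unroll factor of 4 and the remainder loop are all
assumptions made for this example. The point is that the down-counter is
created for the already unrolled loop, so it is decremented and tested once
per unrolled body and appears in no other instruction.

```c
#include <stddef.h>

/* Hypothetical rolled loop in "doloop" shape: a dedicated down-counter
   controls the exit, separate from the index i. */
long sum_rolled(const long *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (size_t count = n; count != 0; count--) {
        s += a[i];
        i++;
    }
    return s;
}

/* Hypothetical result of unrolling by 4 first and introducing the counter
   afterwards: one decrement-and-branch per four elements. */
long sum_unrolled4(const long *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (size_t count = n / 4; count != 0; count--) {
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        i += 4;
    }
    /* Remainder iterations that do not fill a whole unrolled body. */
    for (size_t rest = n % 4; rest != 0; rest--) {
        s += a[i];
        i++;
    }
    return s;
}
```

Both functions return the same sum for any n; only the number of counter
updates differs.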

> >> 2. We have worked (almost finished, but not part of the above-mentioned
> >> patch) on two other things that can be easily done during the
> >> unrolling. BTW, the first one is done by the old unroller.
> >>
> >> The first one regards basic induction variables. After the unrolling we
> >> will have:
> >>
> >> i = i + 1 (copy 1)
> >> ...
> >> i = i + 1 (copy 2)
> >> ...
> >> i = i + 1 (copy 3)
> >>
> >> This can be rewritten as:
> >>
> >> j = i + 1
> >> ...
> >> k = i + 2
> >> ...
> >> l = i + 3
> >>
> >> This will give opportunities for scheduling, as now there are no data
> >> dependencies between these instructions.
> 
> > -fweb achieves this (that's why I did not worry about it much); but of
> > course doing it also in the unroller does not spoil anything.
> 
> No. -fweb will allocate different names for different live ranges:
> 
>    j = i + 1
>    ...
>    k = j + 1
>    ...
>    l = k + 1
> 
> This will help the register allocation but not the scheduler, as the
> dependencies between instructions are preserved, in contrast to the above
> rewriting.

But CSE then rewrites this to the same code.
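
For reference, the three shapes discussed above can be written out as a
small C sketch. The function names and the array sum are hypothetical and
only there to give the induction variable a use; the sketch says nothing
about what any particular pass actually produces. The first form is the
plain unrolled body, the second is the renamed chain attributed to -fweb,
and the third is the form with no dependencies between the additions.

```c
#include <stddef.h>

/* Plain unrolled copies: every increment depends on the previous one. */
long sum_chained(const long *a, size_t i)
{
    long s = 0;
    i = i + 1; s += a[i];   /* copy 1 */
    i = i + 1; s += a[i];   /* copy 2 */
    i = i + 1; s += a[i];   /* copy 3 */
    return s;
}

/* Renamed live ranges: new names, but still a dependency chain. */
long sum_renamed(const long *a, size_t i)
{
    size_t j = i + 1;
    size_t k = j + 1;
    size_t l = k + 1;
    return a[j] + a[k] + a[l];
}

/* Split form: each value is computed directly from i, so the three
   additions are independent of each other. */
long sum_split(const long *a, size_t i)
{
    size_t j = i + 1;
    size_t k = i + 2;
    size_t l = i + 3;
    return a[j] + a[k] + a[l];
}
```

All three functions compute the same result; they differ only in the
dependencies between the index computations.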

Zdenek

