This is the mail archive of the mailing list for the GCC project.
Re: New loop unroller broken?
- From: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- To: Mircea Namolaru <NAMOLARU at il dot ibm dot com>
- Cc: Dale Johannesen <dalej at apple dot com>,David Edelsohn <dje at watson dot ibm dot com>, gcc at gcc dot gnu dot org,Andrew Pinski <pinskia at physics dot uc dot edu>,Ulrich Weigand <weigand at i1 dot informatik dot uni-erlangen dot de>
- Date: Sun, 25 Jan 2004 17:59:01 +0100
- Subject: Re: New loop unroller broken?
- References: <20040123194606.GA29952@atrey.karlin.mff.cuni.cz> <OF32DCC5DA.690DD889-ON42256E26.005BE97F-42256E26.005C6677@il.ibm.com>
> >> 1. We have another patch for enabling the new unroller to handle loops
> >> previously optimized by the doloop optimization.
> >> I haven't yet tried your patch, but from the code it seems that it removes
> >> only the branch at the end of the unrolled copies, while preserving the
> >> decrement of the count register. For PowerPC the effect will generally be
> >> the undoing of the doloop optimization for the unrolled loop, because the
> >> count register is a special register.
> >The code produced is of course not optimal, but this is just a temporary
> >solution that should be safe enough for inclusion into the 3.4 branch.
> >IMHO the best way to solve the problem is not to add some overly
> >clever hack to the unroller, but simply to run the doloop optimization
> >after unrolling, which will do the right thing.
> The problem with your patch is that for PowerPC it undoes the doloop
> optimization: in the unrolled loop the register used by the branch-on-count
> will also appear in a decrement instruction, and this imposes conflicting
> restrictions on this register:
> reg = reg - 1 -> reg should be a GPR,
> branch-on-count reg -> reg should be a (the) Count Register.
> The result is that a GPR will be allocated to reg, and the branch-on-count
> is later expanded into a decrement and a compare instruction (as prior to
> the doloop optimization). The doloop optimization in general introduces a
> new induction variable; with your patch, this IV requires a GPR, so the
> only effect is to increase register pressure. It would probably be better
> to use -fno-branch-count-reg together with -funroll-loops than to apply the
> patch.
> I tend to agree with you that running the doloop optimization after
> unrolling is the best solution, but this is not for GCC 3.4. We would like
> to reduce the regressions for GCC 3.4 on PowerPC by applying both doloop
> and unrolling; this is what our patch does. Do you think it is suitable?
Yes, provided that it can be done in a suitably simple way (I haven't checked
your patch yet).
> >> 2. We have worked on two other things (almost finished, but not part of
> >> the above-mentioned patch) that can easily be done during unrolling.
> >> BTW, the first one is already done by the old unroller.
> >> The first one concerns basic induction variables. After unrolling we
> >> will have:
> >> i = i + 1 (copy 1)
> >> ....
> >> i = i + 1 (copy 2)
> >> ....
> >> i = i + 1 (copy 3)
> >> This can be rewritten as:
> >> j = i + 1
> >> ...
> >> k = i + 2
> >> ...
> >> l = i + 3
> >> This gives the scheduler more opportunities, as there are no longer data
> >> dependencies between these instructions.
> > -fweb achieves this (that's why I did not worry about it much); but of
> > course doing it also in the unroller does not spoil anything.
> No. -fweb will allocate different names for different life ranges:
> j = i + 1
> k = j + 1
> l = k + 1
> This will help register allocation but not the scheduler, as the
> dependencies between the instructions are preserved, in contrast to the
> rewriting above.
But CSE then rewrites this into the same code.