[PATCH v2 2/3] Add predict_doloop_p target hook

Jeff Law law@redhat.com
Mon May 20 19:31:00 GMT 2019


On 5/15/19 10:44 AM, Segher Boessenkool wrote:
> On Wed, May 15, 2019 at 10:53:43AM +0200, Richard Biener wrote:
>> I wonder if making the doloop patterns (tried to find them in rs6000.md,
>> but I only see define_expands with no predicates/alternatives...)
> 
> "doloop_end" --> "ctr<mode>" --> "<bd>_<mode>"
> (all consecutive in rs6000.md btw.)  Alternative 0 in "<bd>_<mode>"
> are the actual looping instructions; the other alternatives are for
> the uncommon case where we ended up not being able to use this insn
> after all.
> 
>> accept any counter register, just have a preference on that special
>> counter reg and have the define_insn deal with RA allocating another
>> one by emitting a regular update & branch-on-zero?
> 
> That is what those other alternatives do.  It is expensive, and cannot
> even *work* in all cases.
> 
> We have no generic "branch on (not) zero" in Power, btw.  Archs that do
> can use that as a doloop, if they choose IVs that end the loop at 0.
> 
>> That is, the penalty of doing that shouldn't be too big and thus
>> we can more optimistically cost & handle "doloops"?
> 
> We have done that for ages, in the RTL level doloop handling.  With
> newer hardware it becomes more and more expensive to guess wrong.
> 
>> I guess
>> the doloop.c checks are quite too strict because we have to
>> rely on RA being able to allocate that reg and as soon as we
>> need to spill it using a general reg with update & branch-on-zero
>> will be cheaper anyways?
> 
> (Update, compare, branch, for us).
> 
> We can predict quite well where the count register will be unavailable
> to our doloops.  The cost if we are allocated a GPR isn't so bad: it
> costs an insn or maybe two more than if we made optimal code (without
> doloop).
> 
> But we can be allocated a floating point register, or memory, instead.
> That is heavily discouraged (by making it more expensive), but it can
> still happen.  This is a jump_insn so it cannot get any reloads, either;
> but even if it could, that is an *expensive* thing to do.
Right.  And that's consistent with what other architectures have needed
to do.  I can't describe the pain of what happens on the PA when you
find out that the loop counter got allocated to the shift amount
register or a floating point register.  It's rare, but you had to handle
it.  Ugh.
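
For anyone following along without the doloop background, here is a minimal C sketch of the loop shape under discussion.  It is only an illustration under assumptions, not code from the patch; the function names sum_up and sum_doloop_shape are hypothetical.  The first function uses the usual up-counting index.  The second drives the loop with a dedicated counter that ends at 0, which is the form a decrement-and-branch instruction (bdnz on Power, with the counter in CTR) can implement directly.  If the counter ends up in a GPR instead, the same loop needs a separate update, compare, and branch.

```c
#include <stddef.h>

/* Up-counting form: the index is compared against n on every iteration. */
long sum_up(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Count-down form: `count` is used only for loop control and the loop
   exits when it reaches 0.  This is the shape a doloop pattern targets. */
long sum_doloop_shape(const long *a, size_t n)
{
    long s = 0;
    if (n == 0)            /* guard: the do-while body runs at least once */
        return s;
    size_t count = n;
    do {
        s += *a++;
    } while (--count != 0);
    return s;
}
```

Both functions compute the same sum; the difference is only in which value controls the loop exit, and therefore in which register the target would like that value to live.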

jeff


