[PATCH v2 2/3] Add predict_doloop_p target hook
Jeff Law
law@redhat.com
Mon May 20 19:31:00 GMT 2019
On 5/15/19 10:44 AM, Segher Boessenkool wrote:
> On Wed, May 15, 2019 at 10:53:43AM +0200, Richard Biener wrote:
>> I wonder if making the doloop patterns (tried to find them in rs6000.md,
>> but I only see define_expands with no predicates/alternatives...)
>
> "doloop_end" --> "ctr<mode>" --> "<bd>_<mode>"
> (all consecutive in rs6000.md btw.) Alternative 0 in "<bd>_<mode>"
> are the actual looping instructions; the other alternatives are for
> the uncommon case where we ended up not being able to use this insn
> after all.
>
>> accept any counter register, just have a preference on that special
>> counter reg and have the define_insn deal with RA allocating another
>> one by emitting a regular update & branch-on-zero?
>
> That is what those other alternatives do. It is expensive, and cannot
> even *work* in all cases.
>
> We have no generic "branch on (not) zero" in Power, btw. Archs that do
> can use that as a doloop, if they choose IVs that end the loop at 0.
>
>> That is, the penalty of doing that shouldn't be too big and thus
>> we can more optimistically cost & handle "doloops"?
>
> We have done that for ages, in the RTL level doloop handling. With
> newer hardware it becomes more and more expensive to guess wrong.
>
>> I guess
>> the doloop.c checks are quite too strict because we have to
>> rely on RA being able to allocate that reg and as soon as we
>> need to spill it using a general reg with update & branch-on-zero
>> will be cheaper anyways?
>
> (Update, compare, branch, for us).
>
> We can predict quite well where the count register will be unavailable
> to our doloops. The cost if we are allocated a GPR isn't so bad: it
> costs an insn or maybe two more than if we made optimal code (without
> doloop).
>
> But we can be allocated a floating point register, or memory, instead.
> That is heavily discouraged (by making it more expensive), but it can
> still happen. This is a jump_insn so it cannot get any reloads, either;
> but even if it could, that is an *expensive* thing to do.
Right. And that's consistent with what other architectures have needed
to do. I can't describe the pain of what happens on the PA when you
find out that the loop counter got allocated to the shift amount
register or a floating point register. It's rare, but you had to handle
it. Ugh.
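For anyone following along, the "IVs that end the loop at 0" shape Segher
mentions can be sketched in plain C. This is only an illustration, not code
from the patch: the function names sum_up and sum_doloop are made up, and
whether a compiler actually turns the second form into a counted-loop
instruction (bdnz on Power, for example) depends on the target and on the
checks discussed in this thread.

```c
#include <stddef.h>

/* Up-counting form: each iteration updates i, compares it against n,
   and branches (update, compare, branch). */
long sum_up(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Doloop-shaped form (illustrative): a separate counter runs from n
   down to 0 and is used only for loop control, so a target with a
   decrement-and-branch instruction can keep it in its count register. */
long sum_doloop(const int *a, size_t n)
{
    long s = 0;
    const int *p = a;
    for (size_t count = n; count != 0; count--)
        s += *p++;
    return s;
}
```

Both functions compute the same sum; the difference is only in which value
controls the exit test. The pain described above comes when that counter
ends up somewhere other than the register the looping instruction wants.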
jeff