[PATCH 3/4 v3] ivopts: Consider cost_step on different forms during unrolling

Wed Sep 2 03:55:34 GMT 2020

On Wed, Sep 2, 2020 at 11:50 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Bin,
>
> >> 2) This case makes me think we should exclude ainc candidates in function
> >> mark_reg_offset_candidates.  The justification is that: ainc candidate
> >> handles step update itself and when we calculate the cost for it against
> >> its ainc_use, the cost_step has been reduced. When unrolling happens,
> >> the ainc computations are replicated and it doesn't save step updates
> >> like normal reg_offset_p candidates.
> > Though auto-inc candidate embeds stepping operation into memory
> > access, we might want to avoid it in case of unroll if there are many
> > sequences of memory accesses, and if the unroll factor is big.  The
> > rationale is embedded stepping is a u-arch operation and does have its
> > cost.
> >
>
> Thanks for the comments!  Agree!  Excluding them from reg_offset_p
> candidates here is consistent with this expectation, it makes us
> consider the unroll factor effect when checking the corresponding
> step cost and the embedded stepping cost (in group/candidate cost,
> minus step cost and use the cost from the address_cost hook).
>
> >>
> >> I've updated the patch to punt ainc_use candidates as below:
> >>
> >>> +         /* Skip AINC candidate since it contains address update itself,
> >>> +            the replicated AINC computations when unrolling still have
> >>> +            updates, unlike reg_offset_p candidates can save step updates
> >>> +            when unrolling.  */
> >>> +         if (cand->ainc_use)
> >>> +           continue;
> >>> +
> >>
> >> For this new attached patch, it's bootstrapped and regress-tested without
> >> explicit unrolling, while the only one failure has been identified as
> >> rs6000 specific address_cost issue in bootstrapping and regression testing
> >> with explicit unrolling.
> >>
> >> By the way, with above simple hack of address_cost, I also did one
> >> bootstrapping and regression testing with explicit unrolling, the above
> >> sms-4.c did pass as I expected but had two failures instead:
> >>
> >>   PASS->FAIL: gcc.dg/sms-compare-debug-1.c (test for excess errors)
> >>   PASS->FAIL: gcc.dg/tree-ssa/ivopts-lt.c scan-tree-dump-times ivopts "PHI" 2
> >>
> >> By further investigation, the 2nd one is expected due to the adddress_cost hook
> >> hack, while the 1st one one exposed -fcompare-debug issue in sms.  The RTL
> >> sequence starts to off from sms, just some NOTE_INSN_DELETED positions change.
> >> I believe it's just exposed by this combination unluckily/luckily ;-) I will
> >> send a patch separately for it once I got time to look into it, but it should
> >> be unrelated to this patch series for sure.
> > This is the kind of situation I intended to avoid before.  IMHO, this
> > isn't a neat change (it can't be given we are predicting the future
> > transformation in compilation pipeline), accumulation of such changes
> > could make IVOPTs break in one way or another.  So as long as you make
> > sure it doesn't have functional impact in case of no-rtl_unroll, I am
> > fine.
>
> Yeah, I admit it's not neat, but the proposals in the previous discussions
> without predicting unroll factor can not work well for all scenarios with
> different unroll factors, they could over-blame some kind of candidates.
> For the case of no-rtl_unroll, unroll factor estimation should set
> loop->estimated_unroll to zero, all these changes won't take effect. The
> estimation function follows the same logics as that of RTL unroll factor
> calculation, I did test with explicit unrolling disablement before, it
> worked expectedly.
Thanks for working on this, also sorry for being nitpicking.

Thanks,
bin