[PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling
Sun Aug 16 03:59:20 GMT 2020
On Mon, Aug 10, 2020 at 10:41 PM Kewen.Lin <email@example.com> wrote:
> Hi Bin,
> on 2020/8/10 下午8:38, Bin.Cheng wrote:
> > On Mon, Aug 10, 2020 at 12:27 PM Kewen.Lin <firstname.lastname@example.org> wrote:
> >> Hi Bin,
> >> Thanks for the review!!
> >> on 2020/8/8 下午4:01, Bin.Cheng wrote:
> >>> Hi Kewen,
> >>> Sorry for the late reply.
> >>> The patch's most important change is below cost computation:
> >>>> @@ -5890,6 +5973,10 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
> >>>> cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));
> >>>> cost = cost_step + adjust_setup_cost (data, cost_base.cost);
> >>>> + /* Consider additional step updates during unrolling. */
> >>>> + if (data->consider_reg_offset_for_unroll_p && !cand->reg_offset_p)
> >>>> + cost += (data->current_loop->estimated_unroll - 1) * cost_step;
> >>> This is a bit strange, to me the add instructions are additional
> >>> computation caused by unrolling+addressing_mode, rather than a native
> >>> part in candidate itself. Specifically, an additional cost is needed
> >>> if candidates (without reg_offset_p) are chosen for the address type
> >>> group/uses.
> >> Good point, ideally it should be one additional cost for each cand set,
> >> when we select one cand for one group, we need to check this pair need
> >> more (estimated_unroll - 1) step costs, we probably need to care about
> >> this during remove/replace etc. IIUC the current IVOPTs cost framework
> >> doesn't support this and it could increase the selection complexity and
> >> time. I hesitated to do it and put it to cand step cost initially instead.
> >> I was thinking those candidates with reg_offset_p should be only used for
> >> those reg_offset_p groups in most cases (very limited) meanwhile the others
> >> are simply scaled up like before. But indeed this can cover some similar
> >> cases like one cand is only used for the compare type group which is for
> >> loop closing, then it doesn't need more step costs for unrolling.
> >> Do you prefer me to improve the current cost framework?
> > No, I don't think it's relevant to the candidate selecting algorithm.
> > I was thinking about adjusting cost somehow in
> > determine_group_iv_cost_address. Given we don't expose selected
> > addressing mode in this function, you may need to do it in
> > get_address_cost, either way.
> Thanks for your suggestion!
> Sorry, I may miss something, but I still think the additional cost is
> per candidate. The justification is that we miss to model the iv
> candidate step well in the context of unrolling, the step cost is part
> of candidate cost, which is per candidate.
> To initialize it in determine_iv_cost isn't perfect as you pointed out,
> ideally we should check any uses of the candidate requires iv update
> after each replicated iteration, and take extra step costs into account
> if at least one needs, meanwhile scaling up all the computation cost to
> reflect unrolling cost nature.
I see, it's similar to the auto-increment case where cost should be
recorded only once. So this is okay given 1) fine predicting
rtl-unroll is likely impossible here; 2) the patch has very limited
> Besides, the reg_offset desirable pair already takes zero cost for
> cand/group cost, IIRC negative cost isn't preferred in IVOPTs, are you
> suggesting increasing the cost for non reg_offset pairs? If so and per
> pair, the extra cost looks possible to be computed several times
> >>>> +
> >>>> /* Prefer the original ivs unless we may gain something by replacing it.
> >>>> The reason is to make debugging simpler; so this is not relevant for
> >>>> artificial ivs created by other optimization passes. */
> >>>> @@ -3654,6 +3729,14 @@ set_group_iv_cost (struct ivopts_data *data,
> >>>> return;
> >>>> }
> >>>> + /* Since we priced more on non reg_offset IV cand step cost, we should scale
> >>>> + up the appropriate IV group costs. Simply consider USE_COMPARE at the
> >>>> + loop exit, FIXME if multiple exits supported or no loop exit comparisons
> >>>> + matter. */
> >>>> + if (data->consider_reg_offset_for_unroll_p
> >>>> + && group->vuses->type != USE_COMPARE)
> >>>> + cost *= (HOST_WIDE_INT) data->current_loop->estimated_unroll;
> >>> Not quite follow here, giving "pricing more on on-reg_offset IV cand"
> >>> doesn't make much sense to me. Also why generic type uses are not
> >>> skipped? We want to model the cost required for address computation,
> >>> however, for generic type uses there is no way to save the computation
> >>> in "address expression". Once unrolled, the computation is always
> >>> there?
> >> The main intention is to scale up the group/cand cost for unrolling since
> >> we have scaled up the step costs. The assumption is that the original
> > If we adjust cost appropriately in function *group_iv_cost_address,
> > this would become unnecessary, right? And naturally.
> >> costing (without this patch) can be viewed as either for all unrolled
> >> iterations or just one single iteration. Since IVOPTs doesn't support
> >> fractional costing, I interpreted it as single iterations, to emulate
> >> unrolling scenario based on the previous step cost scaling, we need to
> >> scale up the cost for all computation.
> >> In most cases, the compare type use is for loop closing, there is still
> >> only one computation even unrolling happens, so I excluded it here.
> >> As "FIXME", if we find some cases are off, we can further restrict it to
> >> those USE_COMPARE uses which is exactly for loop closing.
> >>> And what's the impact on targets supporting [base + index + offset]
> >>> addressing mode?
> >> Good question, I didn't notice it since power doesn't support it.
> >> I noticed the comments of function addr_offset_valid_p only mentioning
> >> [base + offset], I guess it excludes [base + index + offset]?
> >> But I guess the address-based IV can work for this mode?
> > No, addr_offset_valid_p is only used to split address use groups. See
> > get_address_cost and struct mem_address.
> > I forgot to ask, what about target only supports [base + offset]
> > addressing mode like RISC-V? I would expect it's not affected at all.
> addr_offset_valid_p is also used in this patch as Richard S. and Segher
> suggested to check the offset after unrolling (like: offset+(uf-1)*step)
> is still valid for the target. If address-based IV can not work for
> [base + index + offset], it won't affect [base + index + offset].
> It can help all targets which supports [base + offset], so I'm afraid
> it can affect RISC-V too, but I would expect it's positive. Or do you
> happen to see some potential issues? or have some concerns?
> And as Richard S. suggested before, it has one parameter to control,
> target can simply disable this if it dislikes.
> >>> Given the patch is not likely to harm because rtl loop unrolling is
> >>> (or was?) by default disabled, so I am OK once the above comments are
> >>> addressed.
> >> Yes, it needs explicit unrolling options, excepting for some targets
> >> wants to enable it for some cases with specific loop_unroll_adjust checks.
> >>> I wonder if it's possible to get 10% of (all which should be unrolled)
> >>> loops unrolled (conservatively) on gimple and enable it by default at
> >>> O3, rather than teaching ivopts to model a future pass which not
> >>> likely be used outside of benchmarks?
> >> Yeah, it would be nice if the unrolling happen before IVOPTs and won't
> >> have any future unrollings to get it off. PR seems to have some
> >> discussion on gimple unrolling.
> > Thanks for directing me to the discussion. I am on Wilco's side on
> > this problem, IMHO, It might be useful getting small loops unrolled
> > (at O3 by default) by simply saving induction variable stepping and
> > exit condition check, which can be modeled on gimple. Especially for
> > RISC-V, it doesn't support index addressing mode, which means there
> > will be as many induction variables as distinct arrays. Also
> > interleaving after unrolling is not that important, it's at high
> > chance that small loops eligible for interleaving are handled by
> > vectorizer already.
> Good idea, CC Jeff since I think Jeff (Jiufu) has been working to see
> whether we can bring in one gimple unrolling pass. Regardless we
> have/don't have it, if the RTL unrolling is there, I guess this patch
> set is still beneficial? If so, I would take it separately.
More information about the Gcc-patches