target cost model tuning for x86
Dorit Nuzman
DORIT@il.ibm.com
Sun Sep 9 20:14:00 GMT 2007
> >> - The cost model equation is changed to
> >> min_profitable_iters = (voc * vf
> >> - vic * prologue_iters
> >> - vic * epilogue_iters)
> >> /(sic * vf - vic)
> >>
> >> instead of
> >> min_iters = (voc * vf)/(sic * vf - vic)
> >>
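The two thresholds quoted above can be compared side by side in a small C sketch. The helper names are hypothetical, and the reading of voc/vic/sic is an assumption: voc is the vector cost outside the loop, vic the vector cost of one vector iteration, and sic the scalar cost of one scalar iteration. Integer division truncates, as the plain-text formulas suggest.

```c
#include <assert.h>

/* Hypothetical helpers; names are illustrative, not GCC's.
   voc: vector cost outside the loop (assumed reading)
   vic: vector cost inside the loop, per vector iteration (assumed)
   sic: scalar cost inside the loop, per scalar iteration (assumed)
   vf:  vectorization factor  */

/* Threshold before the patch: (voc * vf) / (sic * vf - vic).  */
static int
min_iters_before (int voc, int vic, int sic, int vf)
{
  /* One vector iteration must be cheaper than vf scalar iterations,
     otherwise no iteration count makes vectorization pay off.  */
  assert (sic * vf > vic);
  return (voc * vf) / (sic * vf - vic);
}

/* Threshold after the patch: the vector-inside cost attributed to the
   prologue and epilogue iterations is subtracted from the numerator.  */
static int
min_profitable_iters_after (int voc, int vic, int sic, int vf,
                            int prologue_iters, int epilogue_iters)
{
  assert (sic * vf > vic);
  return (voc * vf - vic * prologue_iters - vic * epilogue_iters)
         / (sic * vf - vic);
}
```

For example, with voc=20, vic=2, sic=1, vf=4, one prologue and three epilogue iterations, the threshold drops from 40 to 36; with no peeling the two formulas agree.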
> >
> >can you please explain why you are ignoring the costs of the prolog and
> >epilog loops? actually, I think I may have found the answer in the
> >following paragraph?:
>
> I think there are 2 ways to look at the equation. Vectorization will be
> profitable when:
>
> -------------
> SC * scalar_loop_iters >=
> (VIC * (scalar_loop_iters - epilogue_iters - prologue_iters)/VF
> + SC * prologue_iters + prologue_guard_costs
> + SC * epilogue_iters + epilogue_guard_costs
> + versioning_costs)
>
> The scalar loop iterates scalar_loop_iters times and the vector loop
> iterates (scalar_loop_iters - epilogue_iters - prologue_iters)/VF times.
> In this case th (the threshold) is expressed in terms of scalar_loop_iters.
>
> This is the equation I adjusted and used in the patch.
>
yes, this is right. the fix in your patch is indeed required.
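The step from this inequality to the patched min_profitable_iters formula can be checked algebraically. The sketch below writes N for scalar_loop_iters, p for prologue_iters and e for epilogue_iters, and assumes (this is not spelled out in the thread) that VOC collects all the outside-loop terms: VOC = SC*p + SC*e + prologue_guard_costs + epilogue_guard_costs + versioning_costs.

```latex
\begin{aligned}
SC \cdot N &\ge \frac{VIC\,(N - e - p)}{VF} + VOC \\
SC \cdot VF \cdot N &\ge VIC\,N - VIC\,e - VIC\,p + VOC \cdot VF
  && \text{(multiply by } VF > 0\text{)} \\
N\,(SC \cdot VF - VIC) &\ge VOC \cdot VF - VIC\,p - VIC\,e \\
N &\ge \frac{VOC \cdot VF - VIC\,p - VIC\,e}{SC \cdot VF - VIC}
  && \text{(if } SC \cdot VF > VIC\text{)}
\end{aligned}
```

Under that assumption the last line is exactly min_profitable_iters with voc = VOC, vic = VIC, sic = SC and vf = VF.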
>
> >
> >> - The cost model checks at compile time if the loop iterations are
> >> less than or equal to what's estimated by the cost model. It also
> >> checks at run time if the iterations left after peeling for prologue
> >> and epilogue are less than or equal to what's estimated by the cost
> >> model. If a loop is determined as profitable at compile time, it may
> >> be evaluated as unprofitable at run time.
> >> For example, say num_iters=7, vf=4, the cost model estimates
> >> profitable_iters=6, and the data is aligned. At compile time, since
> >> (num_iters > min_profitable_iters), this loop is profitable to
> >> vectorize. However, the run-time check compares the iterations left
> >> after peeling for vf, i.e., 4, against min_profitable_iters and finds
> >> the loop unprofitable.
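The mismatch in this example can be reproduced with a small C sketch of the two checks. The function names are hypothetical; both checks reject vectorization when the count they look at is less than or equal to th, as described above, and "iterations left after peeling" is modeled as the part of the loop that runs in whole vector iterations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical compile-time check: compares th with the total count.  */
static bool
profitable_at_compile_time (int niters, int th)
{
  return niters > th;
}

/* Hypothetical run-time check: compares th with the iterations left
   after peeling the prologue and the remainder (epilogue).  */
static bool
profitable_at_run_time (int niters, int prologue_iters, int vf, int th)
{
  assert (vf > 0 && prologue_iters >= 0 && prologue_iters <= niters);
  int remaining = niters - prologue_iters;
  /* Iterations executed by the vector loop itself.  */
  int vector_part = remaining - remaining % vf;
  return vector_part > th;
}
```

With niters=7, no prologue (aligned data), vf=4 and th=6, the compile-time check accepts (7 > 6) while the run-time check sees only 4 iterations and rejects.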
> >
> >This is correct, but
> >
> >> This patch sets the run-time threshold to 0 for cases which can be
> >> evaluated for profitability at compile time.
> >>
> >
> >I don't like this solution so much. Actually it seems to me that by
> >ignoring the costs of the prolog/epilog loops, you already fixed the
> >inaccuracy in the run-time check, so you don't need this extra bit. The
> >problem is that at the same time you made the compile-time test less
> >accurate than it was before. Please correct me if I'm wrong: AFAIU, there
> >are 3 components to the overall iteration count: prolog_iters,
> >main_loop_iters and epilog_iters. The threshold that the cost model
> >computed before your patch consisted of all three:
> > TH_before = prolog_iters + main_loop_iters + epilog_iters
> >This was ok for the compile-time test (which compares the above to
> >LOOP_VINFO_INT_NITERS, which also represents the overall iteration
> >count), but was not ok for the run-time test (which compared it only to
> >main_loop_iters, and not the overall count).
> >With your fix to ignore the prolog and epilog loops we now have:
> > TH_after = main_loop_iters
> >This is ok for the run-time test (so I don't see why you need the extra
> >fix you mention above); however, it is not ok for the compile-time test.
> >So I think what you need to do now is fix the compile-time test instead
> >of the run-time test, i.e., instead of comparing th with
> >LOOP_VINFO_INT_NITERS, you need to compare th with:
> > LOOP_VINFO_INT_NITERS - prolog_iters - LOOP_VINFO_INT_NITERS%VF
> >where prolog_iters =
> > LOOP_PEELING_FOR_ALIGNMENT >= 0 ? LOOP_PEELING_FOR_ALIGNMENT : VF/2;
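The compile-time test proposed here can be sketched in C. The function names are hypothetical; the peeling_for_alignment argument stands in for LOOP_PEELING_FOR_ALIGNMENT, and, as the expression above implies, a negative value is taken to mean the prologue count is unknown, in which case VF/2 is assumed. The arithmetic follows the quoted expression literally, including the use of niters % VF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: estimated main-loop iterations, following
   LOOP_VINFO_INT_NITERS - prolog_iters - LOOP_VINFO_INT_NITERS%VF.  */
static int
estimated_main_loop_iters (int niters, int peeling_for_alignment, int vf)
{
  assert (vf > 0 && niters >= 0);
  /* Known peeling amount, or VF/2 when it is unknown (negative).  */
  int prolog_iters = peeling_for_alignment >= 0 ? peeling_for_alignment
                                                : vf / 2;
  return niters - prolog_iters - niters % vf;
}

/* Hypothetical compile-time test: vectorize only if the estimated
   main-loop count exceeds the threshold th.  */
static bool
compile_time_profitable (int niters, int peeling_for_alignment, int vf,
                         int th)
{
  return estimated_main_loop_iters (niters, peeling_for_alignment, vf) > th;
}
```

For the earlier example (niters=7, aligned, vf=4, th=6) this gives 4 estimated main-loop iterations and rejects the loop, matching the run-time outcome instead of the old compile-time one.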
>
> Sorry, I should have mentioned that the fix I had was temporary. I was
> planning to work on a follow on patch, which would fix the run time
> check to compare the scalar_iterations with the threshold instead of the
> vector iterations. But I had not realized that PARAM_MIN_VECT_LOOP_BOUND
> was the minimum vector iterations.
>
> (Assuming you think the equations are right,) if we decide to go with
> the vector_loop_iters equation then I can do what you suggest and fix
> the compile time test. Or if we decide to go with the
> scalar_loop_iterations equation, then I can go ahead with a run-time
> test change
yes I agree, and I don't have a strong preference either way
> and if so PARAM_MIN_VECT_LOOP_BOUND will have to be
> multiplied by the vectorization factor.
(I think it is already multiplied by VF wherever it is currently used in
the code)
> >I think what caused the confusion here is that PARAM_MIN_VECT_LOOP_BOUND
> >is defined as the minimum number of iterations of the vectorized loop.
> >So if we don't allow vectorization when niters is LESS_THAN_OR_EQUAL
> >min_vect_loop_bound, we are more conservative than what the user asked
> >for. So let's indeed change the test to LESS_THAN_OR_EQUAL, but then
> >let's also add 1 to PARAM_MIN_VECT_LOOP_BOUND (both in
> >tree-vect-analyze.c:vect_analyze_operations, and in
> >tree-vect-transform.c:vect_do_peeling_for_loop_bound).
>
> If we change the test to LESS_THAN_OR_EQUAL, shouldn't we subtract 1
> from PARAM_MIN_VECT_LOOP_BOUND instead of adding it? I.e., to avoid
> being more conservative, check if niters <= (PARAM_MIN_VECT_LOOP_BOUND - 1)?
> I can make the change in both the places you suggest.
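The difference between the candidate comparisons can be checked with a small C sketch. All names are hypothetical, and bound stands for PARAM_MIN_VECT_LOOP_BOUND, assumed to be already scaled to the units niters is counted in (e.g. multiplied by VF where needed). Each predicate returns true when vectorization is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicates: true means "do not vectorize".  */
static bool reject_if_lt (int niters, int bound) { return niters < bound; }
static bool reject_if_le (int niters, int bound) { return niters <= bound; }
static bool reject_if_le_minus_1 (int niters, int bound)
{
  return niters <= bound - 1;
}
static bool reject_if_le_plus_1 (int niters, int bound)
{
  return niters <= bound + 1;
}

/* For integers, "niters <= bound - 1" rejects exactly the same counts
   as "niters < bound"; verify that over 0..max_niters.  */
static void
check_le_minus_1_matches_lt (int bound, int max_niters)
{
  for (int n = 0; n <= max_niters; n++)
    assert (reject_if_lt (n, bound) == reject_if_le_minus_1 (n, bound));
}
```

A loop with exactly bound iterations is one the user asked to allow: the `< bound` and `<= bound - 1` forms accept it, `<= bound` rejects it, and `<= bound + 1` additionally rejects bound + 1.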
>
yes, sure
thanks,
dorit
> Thanks,
> Harsha
>
>
More information about the Gcc-patches mailing list