[PATCH,vect] ppc cost model options
Jagasia, Harsha
harsha.jagasia@amd.com
Mon Nov 26 02:24:00 GMT 2007
>Sorry for not being able to respond earlier to this thread.
The attached patch bootstrapped and passed test on amd64-linux.
Ok for mainline?
Thanks,
>>> >vectorized and such) because the vectorizer cost-model is pending
some
>>> >fixes from Harsha
>>> (http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00916.html
>>> >), to be followed by target-specific tuning of costs.)
>>>
>>> Here is the patch implementing the early cost model check.
>>>
>>
>>great!
>>
>>> The check is added as follows:
>>> - If we do versioning, the threshold comparison is and'ed to the
guard
>>> that controls the versioning,
>>> - Otherwise, if we do peeling for alignment, we can determine the
>>> loop-count of the prolog loop according to the threshold test
>>> - Otherwise, the threshold comparison is or'ed with the guard that
>>> controls whether to bypass the vector code and go straight to the
scalar
>>> epilogue.
>>>
>>
>>The only technical issue I have is with the fact that I think we're
not
>>taking into account that because of the presence of the runtime-test,
>there
>>are costs that should be added also to the scalar side of the equasion
>>(branch costs), and these costs are different in each of the above 3
>cases.
>>(This is something we don't need to worry about when we make the
decision
>>at compile time). For example, in case 2, even if we choose to
execute
>the
>>scalar loop we pay the cost of 2 branches that did not exist in the
>>original code; in case 3 we pay the cost of 1-3 branches (depending on
>>whether we also peel for alignment and if so, if we also enter the
prolog
>>loop). (We also pay for these branches on the vector side of the
equasion,
>>but this is already taken into account). So we actually have 4
slighlty
>>different equasions:
>>
>>equasion for compile-time test:
>> scalar_costs >= vector_costs
>>
>>equasion for run-time test, case 1:
>> scalar_costs
>> + versioning_guard_cost >= vector_costs
>>
>>equasion for run-time test, case 2:
>> scalar_costs
>> + cost_of_guard_for_prolog_peel >= vector_costs
>>
>>equasion for run-time test, case 3:
>> scalar_costs
>> + cost_of_guard_for_prolog_peel
>> + cost_of_guard_for_epilog_peel >= vector_costs
>
>Just to clarify
>
>In case 1, the check is:
> if (cost > th & ...)
> jmp to vector code
>Hence the run-time scalar cost should be incremented by a NOT-TAKEN
branch.
>
>In case 2, the check is:
> if (cost <= th)
> prologue=scalar_iters
> if (prologue == 0)
> jmp to vector code
> else
> execute prologue
> if (prologue == num_iters)
> go to exit
>
>Hence the run-time scalar cost should be incremented by a taken branch,
>which on architectures like x86-64 should be a conditional move, plus a
>NOT-TAKEN branch, plus a TAKEN BRANCH.
>
>In case 3, the check is:
> if (prologue == 0)
> jmp to vector code
> else
> execute prologue
> if prologue=num_iters
> go to exit
> vector code:
> if (cost <=th | (scalar_iters-prologue-epilogue == 0))
> jmp to epilogue
>
>Hence the run-time scalar cost should be incremented by a (taken branch
or
>2 not-taken branches for prologue) plus taken branch.
>
>>Do you think we can add these 3 different fixups somehow, depending on
>>where the run-time test is actually placed? (this can be important for
>>platforms like the SPU, where branches can be very costly, such that
we
>may
>>choose to vectorize the less efficient vector version and not pay the
>>branch cost involved with branching to skip the vector code).
>
>Indeed, I will add the costs.
>
>>
>>The rest are just various style issues:
>>
>>> * tree-vect-transform.c (vect_do_peeling_for_loop_bound): Computes
>>> the treshold if the cost model check has not been done already.
>>
>>treshold -> threshold
>>
>>> Index: tree-vectorizer.c
>>> ===================================================================
>>> --- tree-vectorizer.c (revision 129356)
>>> +++ tree-vectorizer.c (working copy)
>>...
>>> slpeel_tree_peel_loop_to_edge (struct loop *loop,
>>> edge e, tree first_niters,
>>> tree niters, bool update_first_loop_count,
>>> - unsigned int th)
>>> + unsigned int th, bool check)
>>
>>Please document the new function argument "check" (I see that a
>>documentation for "th" is also missing; if you feel like it, please
add
>>that too while you're at it...)
>>
>>Actually, maybe we can come up with something more descriptive than
>>"check"? e.g. "check_profitability", "add_cost_model_check"...
>
>Ok.
>
>>
>>> - /* 2. Add the guard that controls whether the first loop is
executed.
>>> - Resulting CFG would be:
>>> + /* 2.a Add the guard that controls whether the first loop is
>executed.
>>> + Resulting CFG would be:
>>>
>>> - bb_before_first_loop:
>>> - if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>>> - GOTO first-loop
>>> + bb_before_first_loop:
>>> + if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>>> + GOTO first-loop
>>>
>>> - first_loop:
>>> - do {
>>> - } while ...
>>> + first_loop:
>>> + do {
>>> + } while ...
>>>
>>> - bb_before_second_loop:
>>> + bb_before_second_loop:
>>>
>>> - second_loop:
>>> - do {
>>> - } while ...
>>> + second_loop:
>>> + do {
>>> + } while ...
>>>
>>> - orig_exit_bb:
>>> - */
>>> + orig_exit_bb:
>>> + */
>>> +
>>> +
>>> + /* 2.b Add the cost model check that allows the prologue
>>> + to iterate for the entire unchanged scalar
>>> + iterations of the loop in the event that the cost
>>> + model indicates that the scalar loop is more
>>> + profitable than the vector one.
>>> +
>>> + Resulting CFG after prologue peeling would be:
>>> +
>>> + if (scalar_loop_iterations <= th)
>>> + FIRST_NITERS = scalar_loop_iterations
>>> +
>>> + bb_before_first_loop:
>>> + if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>>> + GOTO first-loop
>>> +
>>> + first_loop:
>>> + do {
>>> + } while ...
>>> +
>>> + bb_before_second_loop:
>>> +
>>> + second_loop:
>>> + do {
>>> + } while ...
>>> +
>>> + orig_exit_bb:
>>> + */
>>> +
>>> + /* 2.c Add the cost model check that allows the epilogue
>>> + to iterate for the entire unchanged scalar
>>> + iterations of the loop in the event that the cost
>>> + model indicates that the scalar loop is more
>>> + profitable than the vector one.
>>> +
>>> + Resulting CFG after prologue peeling would be:
>>> +
>>> + bb_before_first_loop:
>>> + if ((scalar_loop_iterations <= th)
>>> + ||
>>> + FIRST_NITERS == 0) GOTO bb_before_second_loop
>>> + GOTO first-loop
>>> +
>>> + first_loop:
>>> + do {
>>> + } while ...
>>> +
>>> + bb_before_second_loop:
>>> +
>>> + second_loop:
>>> + do {
>>> + } while ...
>>> +
>>> + orig_exit_bb:
>>> + */
>>>
>>
>>Just so it's clear that this function does only one of the above per
>>invocation, maybe write: "2. Add the control/guard code in one of the
>>following ways: 2.a. ..."
>
>Ok.
>
>>Also, I was wondering if it would be clearer if the description was
more
>>compact - i.e. instead of duplicating 3 times the pseudo-code that
remains
>>the same in all 3 alternatives, do something along these lines:
>>"
>>2. Add the control/guard code. Resulting CFG would be:
>>
>> bb_before_first_loop:
>> if (COND) GOTO bb_before_second_loop
>> GOTO first-loop
>>
>> first_loop:
>> do {
>> } while ...
>>
>> bb_before_second_loop:
>>
>> second_loop:
>> do {
>> } while ...
>>
>> orig_exit_bb:
>>
>>The following 3 variants are supported:
>>2a. The guard controls whether the first loop is executed.
>> COND: FIRST_NITERS == 0
>>2b. Add the cost model check that allows the prologue to iterate....
>> if (scalar_loop_iterations <= th)
>> FIRST_NITERS = scalar_loop_iterations
>> COND: FIRST_NITERS == 0
>>2c. Add the cost model check that allows the epilogue to iterate...
>> COND: (scalar_loop_iterations <= th) || FIRST_NITERS == 0
>>"
>>
>>... but in the end I'm not sure which way is clearer, so I'll leave
the
>>decision to you.
>>
>
>Ok, I will try to make it clearer. But its hard to say which is clearer
>because the same function is used to do different things for prologue
and
>epilogue.
>
>>A general comment about the changes to slpeel_tree_peel_loop_to_edge:
I
>>wonder if this function would be taken out of the vectorizer one day
and
>>merged with the generic peeling utils. In any case, it would be
desirable
>>if instead of manipulating the condition inside the function, the
function
>>would get the condition as an input? (even if it ends up remaining a
>>vectorizer specific function - I think this will be a good thing to
do).
>If
>>this looks like too much work, then please consider factoring out the
new
>>code that builds the condition in 3 different ways, or at least just
the
>>part that creates the if-stmt for the "Peologue peeling case:
>
>Ok.
>
>>
>>> + /* Epilogue peeling. */
>>> + if (!update_first_loop_count)
>>> + {
>>...
>>> + }
>>> + /* Prologue peeling. */
>>> + else
>>> + {
>>> + if (check)
>>> + first_niters = conditionally_reset_first_niters (...)
>>> +
>>...
>>> + }
>>
>>
>>> Index: tree-vect-transform.c
>>> ===================================================================
>>> --- tree-vect-transform.c (revision 129356)
>>> +++ tree-vect-transform.c (working copy)
>>...
>>> @@ -6482,28 +6483,35 @@ vect_do_peeling_for_loop_bound (loop_vec
>>>
>>> loop_num = loop->num;
>>>
>>> - /* Analyze cost to set threshhold for vectorized loop. */
>>
>>threshhold --> threshold
>>
>>...
>
>
>Ok.
>
>>> + /* Analyze cost to set threshhold for vectorized loop. */
>>> + min_profitable_iters = LOOP_VINFO_COST_MODEL_MIN_ITERS
>>(loop_vinfo);
>>>
>>
>>about the comment - we don't "Analyze cost" here, just use the already
>>precalculcated threshold. (it also repeats in other places).
>>
>>> - th = (unsigned) min_scalar_loop_bound;
>>> - if (min_profitable_iters
>>> - && (!min_scalar_loop_bound
>>> - || min_profitable_iters > min_scalar_loop_bound))
>>> - th = (unsigned) min_profitable_iters;
>>> + min_scalar_loop_bound = ((PARAM_VALUE
(PARAM_MIN_VECT_LOOP_BOUND)
>>> + * LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 1);
>>>
>>> - if (((LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
>>> - || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>> - && vect_print_dump_info (REPORT_DETAILS))
>>> - fprintf (vect_dump, "vectorization may not be profitable.");
>>> + /* Use the cost model only if it is more conservative than
>>> user specified
>>> + threshold. */
>>> + th = (unsigned) min_scalar_loop_bound;
>>> + if (min_profitable_iters
>>> + && (!min_scalar_loop_bound
>>> + || min_profitable_iters > min_scalar_loop_bound))
>>> + th = (unsigned) min_profitable_iters;
>>> +
>>> + if (th && vect_print_dump_info (REPORT_DETAILS))
>>> + fprintf (vect_dump, "loop bound peeling: vectorization may not be
>>> profitable.");
>>> + }
>>
>>The above code repeats almost identically 3 times: here, in
>>vect_do_peeling_for_alignment and in vect_loop_versioning. Please it
>factor
>>out...
>>
>Ok.
>
>>> @@ -7031,7 +7078,11 @@ vect_create_cond_for_alias_checks (loop_
>>> data references that may or may not be aligned. An additional
>>> sequence of runtime tests is generated for each pairs of DDRs
whose
>>> independence was not proven. The vectorized version of loop is
>>> - executed only if both alias and alignment tests are passed. */
>>> + executed only if both alias and alignment tests are passed.
>>> +
>>> + The test generated to check which version of loop is executed
>>> + is modified to check for profitability as indicated by the
>>
>>--> "is modified to *also* check for profitability", right?
>>
>
>Yes.
>
>Thanks,
>Harsha
>
>>thanks,
>>dorit
>>
>>
>>> The patch bootstrapped and passed test on amd64-linux.
>>>
>>> Ok for trunk?
>>>
>>> >
>>> >> Thanks, David
>>> >>
>>> >> * gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Use
>>> >> -mcpu=970 instead of 7400.
>>> >>
>>> >> Index: ppc-costmodel-vect.exp
>>> >>
===================================================================
>>> >> --- ppc-costmodel-vect.exp (revision 129469)
>>> >> +++ ppc-costmodel-vect.exp (working copy)
>>> >> @@ -49,7 +49,7 @@
>>> >> } else {
>>> >> if [is-effective-target ilp32] {
>>> >> # Specify a cpu that supports VMX for compile-only
tests.
>>> >> - lappend DEFAULT_VECTCFLAGS "-mcpu=7400"
>>> >> + lappend DEFAULT_VECTCFLAGS "-mcpu=970"
>>> >> }
>>> >> set dg-do-what-default compile
>>> >> }
>>> >
>>> >
>>>
>>> [attachment "0008-costmodel-early-scalar.diff" deleted by Dorit
>>> Nuzman/Haifa/IBM]
>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0009-costmodel-early-scalar.diff
Type: application/octet-stream
Size: 30092 bytes
Desc: 0009-costmodel-early-scalar.diff
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20071126/94101697/attachment.obj>
More information about the Gcc-patches
mailing list