This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

FW: [PATCH,vect] ppc cost model options


Sorry for not being able to respond earlier to this thread.

>> >vectorized and such) because the vectorizer cost-model is pending
some
>> >fixes from Harsha
>> (http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00916.html
>> >), to be followed by target-specific tuning of costs.)
>>
>> Here is the patch implementing the early cost model check.
>>
>
>great!
>
>> The check is added as follows:
>> - If we do versioning, the threshold comparison is and'ed to the
guard
>> that controls the versioning,
>> - Otherwise, if we do peeling for alignment, we can determine the
>> loop-count of the prolog loop according to the threshold test
>> - Otherwise, the threshold comparison is or'ed with the guard that
>> controls whether to bypass the vector code and go straight to the
scalar
>> epilogue.
>>
>
>The only technical issue I have is with the fact that I think we're not
>taking into account that because of the presence of the runtime-test,
there
>are costs that should be added also to the scalar side of the equasion
>(branch costs), and these costs are different in each of the above 3
cases.
>(This is something we don't need to worry about when we make the
decision
>at compile time).  For example, in case 2, even if we choose to execute
the
>scalar loop we pay the cost of 2 branches that did not exist in the
>original code; in case 3 we pay the cost of 1-3 branches (depending on
>whether we also peel for alignment and if so, if we also enter the
prolog
>loop). (We also pay for these branches on the vector side of the
equasion,
>but this is already taken into account). So we actually have 4 slighlty
>different equasions:
>
>equasion for compile-time test:
>      scalar_costs >= vector_costs
>
>equasion for run-time test, case 1:
>      scalar_costs
>      + versioning_guard_cost >= vector_costs
>
>equasion for run-time test, case 2:
>      scalar_costs
>      + cost_of_guard_for_prolog_peel >= vector_costs
>
>equasion for run-time test, case 3:
>      scalar_costs
>      + cost_of_guard_for_prolog_peel
>      + cost_of_guard_for_epilog_peel >= vector_costs

Just to clarify

In case 1, the check is:
	if (cost > th & ...)
	  jmp to vector code
Hence the run-time scalar cost should be incremented by a NOT-TAKEN
branch.

In case 2, the check is: 
  if (cost <= th)
    prologue=scalar_iters 
  if (prologue == 0)
    jmp to vector code
  else
    execute prologue
    if (prologue == num_iters)
	go to exit

Hence the run-time scalar cost should be incremented by a taken branch,
which on architectures like x86-64 should be a conditional move, plus a
NOT-TAKEN branch, plus a TAKEN BRANCH.

In case 3, the check is:
  if (prologue == 0)
    jmp to vector code
  else
    execute prologue
    if prologue=num_iters
	go to exit
  vector code:
  if (cost <=th | (scalar_iters-prologue-epilogue == 0))
    jmp to epilogue

Hence the run-time scalar cost should be incremented by a (taken branch
or 2 not-taken branches for prologue) plus taken branch.

>Do you think we can add these 3 different fixups somehow, depending on
>where the run-time test is actually placed? (this can be important for
>platforms like the SPU, where branches can be very costly, such that we
may
>choose to vectorize the less efficient vector version and not pay the
>branch cost involved with branching to skip the vector code).

Indeed, I will add the costs.

>
>The rest are just various style issues:
>
>>  * tree-vect-transform.c (vect_do_peeling_for_loop_bound): Computes
>>  the treshold if the cost model check has not been done already.
>
>treshold -> threshold
>
>> Index: tree-vectorizer.c
>> ===================================================================
>> --- tree-vectorizer.c (revision 129356)
>> +++ tree-vectorizer.c (working copy)
>...
>>  slpeel_tree_peel_loop_to_edge (struct loop *loop,
>>            edge e, tree first_niters,
>>            tree niters, bool update_first_loop_count,
>> -          unsigned int th)
>> +          unsigned int th, bool check)
>
>Please document the new function argument "check" (I see that a
>documentation for "th" is also missing; if you feel like it, please add
>that too while you're at it...)
>
>Actually, maybe we can come up with something more descriptive than
>"check"?  e.g. "check_profitability", "add_cost_model_check"...

Ok.

>
>> -  /* 2. Add the guard that controls whether the first loop is
executed.
>> -        Resulting CFG would be:
>> +  /* 2.a Add the guard that controls whether the first loop is
executed.
>> +         Resulting CFG would be:
>>
>> -        bb_before_first_loop:
>> -        if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>> -                               GOTO first-loop
>> +         bb_before_first_loop:
>> +         if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>> +                                GOTO first-loop
>>
>> -        first_loop:
>> -        do {
>> -        } while ...
>> +         first_loop:
>> +         do {
>> +         } while ...
>>
>> -        bb_before_second_loop:
>> +         bb_before_second_loop:
>>
>> -        second_loop:
>> -        do {
>> -        } while ...
>> +         second_loop:
>> +         do {
>> +         } while ...
>>
>> -        orig_exit_bb:
>> -   */
>> +         orig_exit_bb:
>> +  */
>> +
>> +
>> +  /* 2.b Add the cost model check that allows the prologue
>> +         to iterate for the entire unchanged scalar
>> +         iterations of the loop in the event that the cost
>> +         model indicates that the scalar loop is more
>> +         profitable than the vector one.
>> +
>> +         Resulting CFG after prologue peeling would be:
>> +
>> +         if (scalar_loop_iterations <= th)
>> +           FIRST_NITERS = scalar_loop_iterations
>> +
>> +         bb_before_first_loop:
>> +         if (FIRST_NITERS == 0) GOTO bb_before_second_loop
>> +                                GOTO first-loop
>> +
>> +         first_loop:
>> +         do {
>> +         } while ...
>> +
>> +         bb_before_second_loop:
>> +
>> +         second_loop:
>> +         do {
>> +         } while ...
>> +
>> +         orig_exit_bb:
>> +  */
>> +
>> +  /* 2.c Add the cost model check that allows the epilogue
>> +         to iterate for the entire unchanged scalar
>> +         iterations of the loop in the event that the cost
>> +         model indicates that the scalar loop is more
>> +         profitable than the vector one.
>> +
>> +         Resulting CFG after prologue peeling would be:
>> +
>> +         bb_before_first_loop:
>> +         if ((scalar_loop_iterations <= th)
>> +             ||
>> +             FIRST_NITERS == 0) GOTO bb_before_second_loop
>> +                                GOTO first-loop
>> +
>> +         first_loop:
>> +         do {
>> +         } while ...
>> +
>> +         bb_before_second_loop:
>> +
>> +         second_loop:
>> +         do {
>> +         } while ...
>> +
>> +         orig_exit_bb:
>> +  */
>>
>
>Just so it's clear that this function does only one of the above per
>invocation, maybe write: "2. Add the control/guard code in one of the
>following ways: 2.a. ..."

Ok.

>Also, I was wondering if it would be clearer if the description was
more
>compact - i.e. instead of duplicating 3 times the pseudo-code that
remains
>the same in all 3 alternatives, do something along these lines:
>"
>2. Add the control/guard code. Resulting CFG would be:
>
>         bb_before_first_loop:
>         if (COND) GOTO bb_before_second_loop
>                   GOTO first-loop
>
>         first_loop:
>         do {
>         } while ...
>
>         bb_before_second_loop:
>
>         second_loop:
>         do {
>         } while ...
>
>         orig_exit_bb:
>
>The following 3 variants are supported:
>2a. The guard controls whether the first loop is executed.
>       COND: FIRST_NITERS == 0
>2b. Add the cost model check that allows the prologue to iterate....
>      if (scalar_loop_iterations <= th)
>        FIRST_NITERS = scalar_loop_iterations
>      COND: FIRST_NITERS == 0
>2c. Add the cost model check that allows the epilogue to iterate...
>      COND: (scalar_loop_iterations <= th) || FIRST_NITERS == 0
>"
>
>... but in the end I'm not sure which way is clearer, so I'll leave the
>decision to you.
>

Ok, I will try to make it clearer. But its hard to say which is clearer
because the same function is used to do different things for prologue
and epilogue.

>A general comment about the changes to slpeel_tree_peel_loop_to_edge: I
>wonder if this function would be taken out of the vectorizer one day
and
>merged with the generic peeling utils. In any case, it would be
desirable
>if instead of manipulating the condition inside the function, the
function
>would get the condition as an input? (even if it ends up remaining a
>vectorizer specific function - I think this will be a good thing to
do). If
>this looks like too much work, then please consider factoring out the
new
>code that builds the condition in 3 different ways, or at least just
the
>part that creates the if-stmt for the "Peologue peeling case:

Ok.

>
>> +  /* Epilogue peeling.  */
>> +  if (!update_first_loop_count)
>> +    {
>...
>> +    }
>> +  /* Prologue peeling.  */
>> +  else
>> +    {
>> +      if (check)
>> +         first_niters = conditionally_reset_first_niters (...)
>> +
>...
>> +    }
>
>
>> Index: tree-vect-transform.c
>> ===================================================================
>> --- tree-vect-transform.c (revision 129356)
>> +++ tree-vect-transform.c (working copy)
>...
>> @@ -6482,28 +6483,35 @@ vect_do_peeling_for_loop_bound (loop_vec
>>
>>    loop_num  = loop->num;
>>
>> -  /* Analyze cost to set threshhold for vectorized loop.  */
>
>threshhold --> threshold
>
>...


Ok.

>> +      /* Analyze cost to set threshhold for vectorized loop.  */
>> +      min_profitable_iters = LOOP_VINFO_COST_MODEL_MIN_ITERS
>(loop_vinfo);
>>
>
>about the comment - we don't "Analyze cost" here, just use the already
>precalculcated threshold. (it also repeats in other places).
>
>> -  th = (unsigned) min_scalar_loop_bound;
>> -  if (min_profitable_iters
>> -      && (!min_scalar_loop_bound
>> -          || min_profitable_iters > min_scalar_loop_bound))
>> -    th = (unsigned) min_profitable_iters;
>> +      min_scalar_loop_bound = ((PARAM_VALUE
(PARAM_MIN_VECT_LOOP_BOUND)
>> +    * LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 1);
>>
>> -  if (((LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
>> -      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> -      && vect_print_dump_info (REPORT_DETAILS))
>> -    fprintf (vect_dump, "vectorization may not be profitable.");
>> +      /* Use the cost model only if it is more conservative than
>> user specified
>> +  threshold.  */
>> +      th = (unsigned) min_scalar_loop_bound;
>> +      if (min_profitable_iters
>> +   && (!min_scalar_loop_bound
>> +       || min_profitable_iters > min_scalar_loop_bound))
>> + th = (unsigned) min_profitable_iters;
>> +
>> +      if (th && vect_print_dump_info (REPORT_DETAILS))
>> + fprintf (vect_dump, "loop bound peeling: vectorization may not be
>> profitable.");
>> +    }
>
>The above code repeats almost identically 3 times: here, in
>vect_do_peeling_for_alignment and in vect_loop_versioning. Please it
factor
>out...
>
Ok.

>> @@ -7031,7 +7078,11 @@ vect_create_cond_for_alias_checks (loop_
>>     data references that may or may not be aligned.  An additional
>>     sequence of runtime tests is generated for each pairs of DDRs
whose
>>     independence was not proven.  The vectorized version of loop is
>> -   executed only if both alias and alignment tests are passed.  */
>> +   executed only if both alias and alignment tests are passed.
>> +
>> +   The test generated to check which version of loop is executed
>> +   is modified to check for profitability as indicated by the
>
>--> "is modified to *also* check for profitability", right?
>

Yes.

Thanks,
Harsha

>thanks,
>dorit
>
>
>> The patch bootstrapped and passed test on amd64-linux.
>>
>> Ok for trunk?
>>
>> >
>> >> Thanks, David
>> >>
>> >>      * gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Use
>> >>    -mcpu=970 instead of 7400.
>> >>
>> >> Index: ppc-costmodel-vect.exp
>> >>
===================================================================
>> >> --- ppc-costmodel-vect.exp   (revision 129469)
>> >> +++ ppc-costmodel-vect.exp   (working copy)
>> >> @@ -49,7 +49,7 @@
>> >>  } else {
>> >>      if [is-effective-target ilp32] {
>> >>          # Specify a cpu that supports VMX for compile-only tests.
>> >> -        lappend DEFAULT_VECTCFLAGS "-mcpu=7400"
>> >> +        lappend DEFAULT_VECTCFLAGS "-mcpu=970"
>> >>      }
>> >>      set dg-do-what-default compile
>> >>  }
>> >
>> >
>>
>> [attachment "0008-costmodel-early-scalar.diff" deleted by Dorit
>> Nuzman/Haifa/IBM]
>
>




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]