[PATCH][2/n] 2nd try: Re-organize -fvect-cost-model, enable basic vectorization at -O2

Wed Aug 28 18:15:00 GMT 2013

On Wed, Aug 28, 2013 at 12:59 AM, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 27 Aug 2013, Xinliang David Li wrote:
>
>> Richard, I have some comments about the patch.
>>
>> >   -ftree-vectorizer-verbose=<number>    This switch is deprecated. Use -fopt-info instead.
>> >
>> >   ftree-slp-vectorize
>> > ! Common Report Var(flag_tree_slp_vectorize) Optimization
>> >   Enable basic block vectorization (SLP) on trees
>>
>> The code dealing with the interactions between -ftree-vectorize, O3,
>> etc. is complicated and hard to understand. Is it better to change the
>> meaning of -ftree-vectorize to mean -floop-vectorize only, and make it
>> independent of -fslp-vectorize?
>
> Yeah, but that would be an independent change.  Also people expect
> to be able to enable all of the vectorizer with -ftree-vectorize.
> So rather we introduce -floop-vectorize?

I think that will be good and will simplify the logic too --
-ftree-vectorize turns on both loop and SLP vectorization unless they
are explicitly specified.

>
>> > + fvect-cost-model=
>> > + Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
>> > + Specifies the cost model for vectorization
>> > +
>> > + Enum
>> > + Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
>> > +
>> > + EnumValue
>> > + Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
>> > +
>> > + EnumValue
>> > + Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
>> > +
>> > + EnumValue
>> > + Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)
>>
>> Introducing cheap model is a great change.
>>
>> > +
>>
>> > *** 173,179 ****
>> >   {
>> >     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >
>> > !   if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
>> >       return false;
>> >
>> >     if (dump_enabled_p ())
>> > --- 173,180 ----
>> >   {
>> >     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >
>> > !   if (loop_vinfo->cost_model == VECT_COST_MODEL_CHEAP
>> > !       || (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
>> >       return false;
>> >
>>
>> When the cost_model == cheap, the alignment peeling should also be
>> disabled -- there will still be loops that are beneficial to be
>> vectorized without peeling -- at perhaps reduced net runtime gain.
>
> IIRC there are targets that cannot vectorize unaligned accesses, so
> in the end the cost model needs to be more target-controlled.
>
> The above was just a start for experimenting, of course.
>
>> >   struct gimple_opt_pass pass_slp_vectorize =
>> > --- 206,220 ----
>> >   static bool
>> >   gate_vect_slp (void)
>> >   {
>> > !   /* Apply SLP either according to whether the user specified whether to
>> > !      run SLP or not, or according to whether the user specified whether
>> > !      to do vectorization or not.  */
>> > !   if (global_options_set.x_flag_tree_slp_vectorize)
>> > !     return flag_tree_slp_vectorize != 0;
>> > !   if (global_options_set.x_flag_tree_vectorize)
>> > !     return flag_tree_vectorize != 0;
>> > !   /* And if vectorization was enabled by default run SLP only at -O3.  */
>> > !   return flag_tree_vectorize != 0 && optimize == 3;
>> >   }
>>
>> The logic can be greatly simplified if slp vectorizer is controlled
>> independently -- easier for user to understand too.
>
> It should work with separating out -floop-vectorize, too I guess.  But
> yes, as I wanted to preserve behavior of adding -ftree-vectorize to
> -O2 the above necessarily became quite complicated ;)

With -floop-vectorize, -ftree-vectorize becomes a simple shorthand/alias
for '-floop-vectorize + -fslp-vectorize', and -O3 and -O2 do not need to
look at -ftree-vectorize (which does not even need a flag variable).
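
To make the alias idea concrete, here is a rough standalone sketch. It
is not GCC code: the struct, the tri-state enum and all function names
are made up for illustration, and it assumes the alias is applied after
the explicit sub-options have been recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state for a flag: not given, -fno-X, or -fX.  */
enum flag_state { FLAG_UNSET = -1, FLAG_OFF = 0, FLAG_ON = 1 };

/* Hypothetical option record; only the two sub-options have state.  */
struct vect_opts
{
  enum flag_state loop_vectorize;  /* the proposed -floop-vectorize */
  enum flag_state slp_vectorize;   /* the SLP vectorizer flag */
};

/* -ftree-vectorize (on) or -fno-tree-vectorize (!on) as a pure alias:
   it fills in only the sub-options the user did not set explicitly,
   so it needs no flag variable of its own.  */
static void
apply_tree_vectorize (struct vect_opts *o, bool on)
{
  assert (o != NULL);
  if (o->loop_vectorize == FLAG_UNSET)
    o->loop_vectorize = on ? FLAG_ON : FLAG_OFF;
  if (o->slp_vectorize == FLAG_UNSET)
    o->slp_vectorize = on ? FLAG_ON : FLAG_OFF;
}

/* Each gate looks at its own flag only.  LEVEL_DEFAULT is whatever the
   -O level chooses when the user said nothing at all.  */
static bool
gate_vect_loop (const struct vect_opts *o, bool level_default)
{
  if (o->loop_vectorize == FLAG_UNSET)
    return level_default;
  return o->loop_vectorize == FLAG_ON;
}

static bool
gate_vect_slp (const struct vect_opts *o, bool level_default)
{
  if (o->slp_vectorize == FLAG_UNSET)
    return level_default;
  return o->slp_vectorize == FLAG_ON;
}
```

With something like this, the gates never test the optimization level
together with -ftree-vectorize; the -O level only supplies the default
for a flag that is still unset.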

>
>> > ! @item -fvect-cost-model=@var{model}
>> >   @opindex fvect-cost-model
>> > ! Alter the cost model used for vectorization.  The @var{model} argument
>> > ! should be one of @code{unlimited}, @code{dynamic} or @code{cheap}.
>> > ! With the @code{unlimited} model the vectorized code-path is assumed
>> > ! to be profitable while with the @code{dynamic} model a runtime check
>> > ! will guard the vectorized code-path to enable it only for iteration
>> > ! counts that will likely execute faster than when executing the original
>> > ! scalar loop.  The @code{cheap} model will disable vectorization of
>> > ! loops where doing so would be cost prohibitive for example due to
>> > ! required runtime checks for data dependence or alignment but otherwise
>> > ! is equal to the @code{dynamic} model.
>> > ! The default cost model depends on other optimization flags and is
>> > ! either @code{dynamic} or @code{cheap}.
>> >
>>
>> Vectorizer in theory will only vectorize a loop with net runtime gain,
>> so the 'cost' here should only mean code size and compile time cost.
>
> Not exactly - for 'unlimited' we may enter the vectorized path
> even if the overhead of the guards, prologue and epilogue exceeds
> the benefit of the (eventually never entered) vectorized loop.
> That is, the 'dynamic' model does
>
>   if (n > profitable-iters)
>     {
>       if (alias checks, align checks)
>         {
>           prologue loop
>           vectorized loop
>           epilogue loop
>         }
>       else goto scalar loop
>     }
>   else
>     scalar loop

That is why I said 'in theory' -- the compiler bets the vectorized
path will be taken at runtime and benefit the performance. If there is
reason for the compiler to believe that is not the case (e.g., with
profile data), it won't even try it.
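
For reference, the versioned loop shape we are talking about looks
roughly like the hand-written C below. This is only a sketch of the
idea, not what the vectorizer emits: the threshold, the vectorization
factor and all names are invented, the alignment prologue is left out,
and the "vector" loop is just a scalar loop processing VF elements per
trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4                 /* assumed vectorization factor */
#define PROFITABLE_ITERS 16  /* made-up profitability threshold */

enum path_taken { PATH_SCALAR, PATH_VECTOR };

/* Runtime alias check: do the two n-element ranges overlap?  */
static int
ranges_overlap (const float *a, const float *b, size_t n)
{
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  uintptr_t bytes = n * sizeof (float);
  return pa < pb + bytes && pb < pa + bytes;
}

/* dst[i] += src[i], versioned on iteration count and aliasing.
   Returns which path was taken so the guard can be observed.  */
static enum path_taken
add_arrays (float *dst, const float *src, size_t n)
{
  size_t i = 0;

  assert (dst != NULL && src != NULL);
  if (n > PROFITABLE_ITERS && !ranges_overlap (dst, src, n))
    {
      /* Stand-in for the vectorized loop: VF elements per trip.  */
      for (; i + VF <= n; i += VF)
        for (size_t k = 0; k < VF; k++)
          dst[i + k] += src[i + k];
      /* Epilogue loop for the remaining n % VF elements.  */
      for (; i < n; i++)
        dst[i] += src[i];
      return PATH_VECTOR;
    }

  /* Scalar fallback: too few iterations or a possible dependence.  */
  for (; i < n; i++)
    dst[i] += src[i];
  return PATH_SCALAR;
}
```

Every call pays for the guards even when the scalar loop runs in the
end, and the cheap model avoids that cost by not versioning for alias
checks in the first place.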

thanks,

David

>
> because clearly the more complicated flow is not always profitable
> to enter.
>
>> Cheap Model: with this model, the compiler will vectorize loops that
>> are considered beneficial for runtime performance with minimal code
>> size increase and compile time cost
>> Unlimited Model: compiler will vectorize loops to maximize runtime
>> gain without considering compile time cost and impact to code size;
> ... and runtime speed
>
> But you are right - changing the wording to tell what it will vectorize
> as opposed to what not would be an improvement.
>
> Richard.