This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: New GCC options for loop vectorization

From: Richard Biener <richard dot guenther at gmail dot com>
To: Xinliang David Li <davidxl at google dot com>
Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Mon, 23 Sep 2013 13:31:40 +0200
Subject: Re: New GCC options for loop vectorization
Authentication-results: sourceware.org; auth=none
References: <CAAkRFZJzPkDJe=y2RqDwXsegN1So-u8mvUM2Cc2=4yZm29ip5g at mail dot gmail dot com> <CAFiYyc204YzVZVSOKnzomiMZs0Spe7=8AMUz+ho-xzr6beahXg at mail dot gmail dot com> <CAAkRFZJ7ZFBK86f8ZUpq_YgnCj0d3GK=3r1QUGqp5_QzgEzx3w at mail dot gmail dot com> <CAFiYyc0g3V_CrHkhquZS=Fie4cD4rZNFArpj+72+QkEkCw1Uyg at mail dot gmail dot com> <CAAkRFZK0ueehNqELXqbupG6VrYOT_g=HbjYtQGuzsDR=e04QnQ at mail dot gmail dot com> <CAFiYyc3MOHYHFUqVVCorZvDXO0wp00N=F4qP8rVG2nxF0Oji_g at mail dot gmail dot com> <CAAkRFZ+_ky0w9jmw64+-+iAMiZNSzJkcXKjw8rkWwmrv0vGTFA at mail dot gmail dot com>

On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>>>>>> but there is no simple way to turn on loop vectorization alone. The
>>>>>>> logic for default O3 setting is also complicated.
>>>>>>>
>>>>>>> In this patch, two new options are introduced:
>>>>>>>
>>>>>>> 1) -ftree-loop-vectorize
>>>>>>>
>>>>>>> This option is used to turn on loop vectorization only. option
>>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>>>>>> business of Init(2) is needed.  With this change, -ftree-vectorize
>>>>>>> becomes a simple alias to -ftree-loop-vectorize +
>>>>>>> -ftree-slp-vectorize.
>>>>>>>
>>>>>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>>>>>
>>>>>>>      -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>>>>>
>>>>>>> With the new change it becomes:
>>>>>>>
>>>>>>>     -O3 -fno-loop-vectorize
>>>>>>>
>>>>>>>
>>>>>>> To turn on only loop vectorize at O2, the old way is
>>>>>>>
>>>>>>>     -O2 -ftree-vectorize -fno-slp-vectorize
>>>>>>>
>>>>>>> The new way is
>>>>>>>
>>>>>>>     -O2 -ftree-loop-vectorize
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2) -ftree-vect-loop-peeling
>>>>>>>
>>>>>>> This option is used to turn on/off loop peeling for alignment.  In the
>>>>>>> long run, this should be folded into the cheap cost model proposed by
>>>>>>> Richard.  This option is also useful in scenarios where peeling can
>>>>>>> introduce runtime problems:
>>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
>>>>>>> common in practice.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Patch attached. Compiler boostrapped. Ok after testing?
>>>>>>
>>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).
>>>>>
>>>>> Ok. Can you also comment on 2) ?
>>>>
>>>> I think we want to decide how granular we want to control the vectorizer
>>>> and using which mechanism.  My cost-model re-org makes
>>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
>>>> a step backwards in this context.
>>>
>>> Using cost model to do a coarse grain control/configuration is
>>> certainly something we want, but having a fine grain control is still
>>> useful.
>>>
>>>>
>>>> So, can you summarize what pieces (including versioning) of the vectorizer
>>>> you'd want to be able to disable separately?
>>>
>>> Loop peeling seems to be the main one. There is also a correctness
>>> issue related. For instance, the following code is common in practice,
>>> but loop peeling wrongly assumes initial base-alignment and generates
>>> aligned mov instruction after peeling, leading to SEGV.  Peeling is
>>> not something we can blindly turned on -- even when it is on, there
>>> should be a way to turn it off explicitly:
>>>
>>> char a[10000];
>>>
>>> void foo(int n)
>>> {
>>>   int* b = (int*)(a+n);
>>>   int i = 0;
>>>   for (; i < 1000; ++i)
>>>     b[i] = 1;
>>> }
>>>
>>> int main(int argn, char** argv)
>>> {
>>>   foo(argn);
>>> }
>>
>> But that's just a bug that should be fixed (looking into it).
>>
>>>>  Just disabling peeling for
>>>> alignment may get you into the versioning for alignment path (and thus
>>>> an unvectorized loop at runtime).
>>>
>>> This is not true for target supporting mis-aligned access. I have not
>>> seen a case where alignment driver loop version happens on x86.
>>>
>>>>Also it's know that the alignment peeling
>>>> code needs some serious TLC (it's outcome depends on the order of DRs,
>>>> the cost model it uses leaves to be desired as we cannot distinguish
>>>> between unaligned load and store costs).
>>>
>>> Yet another reason to turn it off as it is not effective anyways?
>>
>> As said I'll disable all remains of -ftree-vect-loop-version with the cost model
>> patch because it wasn't guarding versioning for aliasing but only versioning
>> for alignment.
>>
>> We have to be consistent here - if we add a way to disable peeling for
>> alignment then we certainly don't want to remove the ability to disable
>> versioning for alignment, no?
>
> We already have the ability to turn off versioning -- via --param. It
> is a more natural way to fine tune a pass instead of introducing a -f
> option. For this reason, your planned deprecation of the option is a
> good thing to do.
>
> For consistency, I think we should introduce a new parameter to turn
> on/off peeling. This can also be tied closely with the cost model
> change.
>
> The proposed patch attached. Does this one look ok?

I'd principally support adding a param but rather would for example make
it limit the number of peeled iterations (zero turning off the feature).  As
you implemented it it will give false messages for

      supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
      do_peeling = vector_alignment_reachable_p (dr);
      if (do_peeling)
        {
...
        }
      else
        {
          if (!aligned_access_p (dr))
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "vector alignment may not be reachable\n");
              break;
            }
        }

the else path.  I'd restructure this as

     if (!vector_alignment_reachable_p (dr))
       {
          if (!aligned_access_p (dr))
         ....
          continue;
       }

     if (known_alignment_for_access_p (dr))
      {
... check param against the number of iterations to peel for this DR ....
      }
     else
      {
... check param against max iterations ...
      }

an alternative is a tri-state param that would either allow no peeling at all,
just peeling with known number of iterations or all kind of peelings.

Thanks,
Richard.


> thanks,
>
> David
>
>
>
>>
>> Richard.
>>
>>>
>>> thanks,
>>>
>>> David
>>>
>>>>
>>>> Richard.
>>>>
>>>>>>
>>>>>> I've stopped a quick try doing 1) myself because
>>>>>>
>>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>>>>>          opts->x_flag_ipa_reference = false;
>>>>>>        break;
>>>>>>
>>>>>> +    case OPT_ftree_vectorize:
>>>>>> +      if (!opts_set->x_flag_tree_loop_vectorize)
>>>>>> + opts->x_flag_tree_loop_vectorize = value;
>>>>>> +      if (!opts_set->x_flag_tree_slp_vectorize)
>>>>>> + opts->x_flag_tree_slp_vectorize = value;
>>>>>> +      break;
>>>>>>
>>>>>> doesn't look obviously correct.  Does that handle
>>>>>>
>>>>>>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>>>>>
>>>>>> or
>>>>>>
>>>>>>   -ftree-loop-vectorize -fno-tree-vectorize
>>>>>>
>>>>>> properly?  Currently at least
>>>>>>
>>>>>>   -ftree-slp-vectorize -fno-tree-vectorize
>>>>>>
>>>>>> doesn't "work".
>>>>>
>>>>>
>>>>> Right -- same is true for -fprofile-use option. FDO enables some
>>>>> passes, but can not re-enable them if they are flipped off before.
>>>>>
>>>>>>
>>>>>> That said, the option machinery doesn't handle an option being an alias
>>>>>> for two other options, so it's mechanism to contract positives/negatives
>>>>>> doesn't work here and the override hooks do not work reliably for
>>>>>> repeated options.
>>>>>>
>>>>>> Or am I wrong here?  Should we care at all?  Joseph?
>>>>>
>>>>> We should probably just document the behavior. Even better, we should
>>>>> deprecate the old option.
>>>>>
>>>>> thanks,
>>>>>
>>>>> David
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> David

Follow-Ups:
- Re: New GCC options for loop vectorization
  - From: Xinliang David Li

References:
- New GCC options for loop vectorization
  - From: Xinliang David Li
- Re: New GCC options for loop vectorization
  - From: Richard Biener
- Re: New GCC options for loop vectorization
  - From: Xinliang David Li
- Re: New GCC options for loop vectorization
  - From: Richard Biener
- Re: New GCC options for loop vectorization
  - From: Xinliang David Li
- Re: New GCC options for loop vectorization
  - From: Richard Biener
- Re: New GCC options for loop vectorization
  - From: Xinliang David Li

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]