This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: New GCC options for loop vectorization
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Xinliang David Li <davidxl at google dot com>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 24 Sep 2013 13:09:47 +0200
- Subject: Re: New GCC options for loop vectorization
- Authentication-results: sourceware.org; auth=none
- References: <CAAkRFZJzPkDJe=y2RqDwXsegN1So-u8mvUM2Cc2=4yZm29ip5g at mail dot gmail dot com> <CAFiYyc204YzVZVSOKnzomiMZs0Spe7=8AMUz+ho-xzr6beahXg at mail dot gmail dot com> <CAAkRFZJ7ZFBK86f8ZUpq_YgnCj0d3GK=3r1QUGqp5_QzgEzx3w at mail dot gmail dot com> <CAFiYyc0g3V_CrHkhquZS=Fie4cD4rZNFArpj+72+QkEkCw1Uyg at mail dot gmail dot com> <CAAkRFZK0ueehNqELXqbupG6VrYOT_g=HbjYtQGuzsDR=e04QnQ at mail dot gmail dot com> <CAFiYyc3MOHYHFUqVVCorZvDXO0wp00N=F4qP8rVG2nxF0Oji_g at mail dot gmail dot com> <CAAkRFZ+_ky0w9jmw64+-+iAMiZNSzJkcXKjw8rkWwmrv0vGTFA at mail dot gmail dot com> <CAFiYyc0ZHZK4zD3bhqqm1GZ66ZiW4_=Kj1jMkmiWFJDjTuMHow at mail dot gmail dot com> <CAAkRFZLa+ENGNhBhpmwTWKTFbxAkqhs2KnNBdzZ0qmn39Ocihw at mail dot gmail dot com>
On Mon, Sep 23, 2013 at 10:37 PM, Xinliang David Li <davidxl@google.com> wrote:
> Thanks. I modified the patch so that the max allowed peel iterations
> can be specified via a parameter. Testing on going. Ok for trunk ?
+DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
+ "vect-max-peeling-for-alignment",
+ "Max number of loop peels to enhancement alignment of data
references in a loop",
+ -1, 0, 0)
I believe this doesn't allow --param vect-max-peeling-for-alignment=-1 as you
specify 0 as the minimum.
So, ok with changing minimum and maxmum to -1. (double check that this works)
Thanks,
Richard.
> David
>
> On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>>>>>>>> but there is no simple way to turn on loop vectorization alone. The
>>>>>>>>> logic for default O3 setting is also complicated.
>>>>>>>>>
>>>>>>>>> In this patch, two new options are introduced:
>>>>>>>>>
>>>>>>>>> 1) -ftree-loop-vectorize
>>>>>>>>>
>>>>>>>>> This option is used to turn on loop vectorization only. option
>>>>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>>>>>>>> business of Init(2) is needed. With this change, -ftree-vectorize
>>>>>>>>> becomes a simple alias to -ftree-loop-vectorize +
>>>>>>>>> -ftree-slp-vectorize.
>>>>>>>>>
>>>>>>>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>>>>>>>
>>>>>>>>> -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>>>>>>>
>>>>>>>>> With the new change it becomes:
>>>>>>>>>
>>>>>>>>> -O3 -fno-loop-vectorize
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To turn on only loop vectorize at O2, the old way is
>>>>>>>>>
>>>>>>>>> -O2 -ftree-vectorize -fno-slp-vectorize
>>>>>>>>>
>>>>>>>>> The new way is
>>>>>>>>>
>>>>>>>>> -O2 -ftree-loop-vectorize
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2) -ftree-vect-loop-peeling
>>>>>>>>>
>>>>>>>>> This option is used to turn on/off loop peeling for alignment. In the
>>>>>>>>> long run, this should be folded into the cheap cost model proposed by
>>>>>>>>> Richard. This option is also useful in scenarios where peeling can
>>>>>>>>> introduce runtime problems:
>>>>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html which happens to be
>>>>>>>>> common in practice.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Patch attached. Compiler boostrapped. Ok after testing?
>>>>>>>>
>>>>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).
>>>>>>>
>>>>>>> Ok. Can you also comment on 2) ?
>>>>>>
>>>>>> I think we want to decide how granular we want to control the vectorizer
>>>>>> and using which mechanism. My cost-model re-org makes
>>>>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
>>>>>> a step backwards in this context.
>>>>>
>>>>> Using cost model to do a coarse grain control/configuration is
>>>>> certainly something we want, but having a fine grain control is still
>>>>> useful.
>>>>>
>>>>>>
>>>>>> So, can you summarize what pieces (including versioning) of the vectorizer
>>>>>> you'd want to be able to disable separately?
>>>>>
>>>>> Loop peeling seems to be the main one. There is also a correctness
>>>>> issue related. For instance, the following code is common in practice,
>>>>> but loop peeling wrongly assumes initial base-alignment and generates
>>>>> aligned mov instruction after peeling, leading to SEGV. Peeling is
>>>>> not something we can blindly turned on -- even when it is on, there
>>>>> should be a way to turn it off explicitly:
>>>>>
>>>>> char a[10000];
>>>>>
>>>>> void foo(int n)
>>>>> {
>>>>> int* b = (int*)(a+n);
>>>>> int i = 0;
>>>>> for (; i < 1000; ++i)
>>>>> b[i] = 1;
>>>>> }
>>>>>
>>>>> int main(int argn, char** argv)
>>>>> {
>>>>> foo(argn);
>>>>> }
>>>>
>>>> But that's just a bug that should be fixed (looking into it).
>>>>
>>>>>> Just disabling peeling for
>>>>>> alignment may get you into the versioning for alignment path (and thus
>>>>>> an unvectorized loop at runtime).
>>>>>
>>>>> This is not true for target supporting mis-aligned access. I have not
>>>>> seen a case where alignment driver loop version happens on x86.
>>>>>
>>>>>>Also it's know that the alignment peeling
>>>>>> code needs some serious TLC (it's outcome depends on the order of DRs,
>>>>>> the cost model it uses leaves to be desired as we cannot distinguish
>>>>>> between unaligned load and store costs).
>>>>>
>>>>> Yet another reason to turn it off as it is not effective anyways?
>>>>
>>>> As said I'll disable all remains of -ftree-vect-loop-version with the cost model
>>>> patch because it wasn't guarding versioning for aliasing but only versioning
>>>> for alignment.
>>>>
>>>> We have to be consistent here - if we add a way to disable peeling for
>>>> alignment then we certainly don't want to remove the ability to disable
>>>> versioning for alignment, no?
>>>
>>> We already have the ability to turn off versioning -- via --param. It
>>> is a more natural way to fine tune a pass instead of introducing a -f
>>> option. For this reason, your planned deprecation of the option is a
>>> good thing to do.
>>>
>>> For consistency, I think we should introduce a new parameter to turn
>>> on/off peeling. This can also be tied closely with the cost model
>>> change.
>>>
>>> The proposed patch attached. Does this one look ok?
>>
>> I'd principally support adding a param but rather would for example make
>> it limit the number of peeled iterations (zero turning off the feature). As
>> you implemented it it will give false messages for
>>
>> supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
>> do_peeling = vector_alignment_reachable_p (dr);
>> if (do_peeling)
>> {
>> ...
>> }
>> else
>> {
>> if (!aligned_access_p (dr))
>> {
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> "vector alignment may not be reachable\n");
>> break;
>> }
>> }
>>
>> the else path. I'd restructure this as
>>
>> if (!vector_alignment_reachable_p (dr))
>> {
>> if (!aligned_access_p (dr))
>> ....
>> continue;
>> }
>>
>> if (known_alignment_for_access_p (dr))
>> {
>> ... check param against the number of iterations to peel for this DR ....
>> }
>> else
>> {
>> ... check param against max iterations ...
>> }
>>
>> an alternative is a tri-state param that would either allow no peeling at all,
>> just peeling with known number of iterations or all kind of peelings.
>>
>> Thanks,
>> Richard.
>>
>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> thanks,
>>>>>
>>>>> David
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>>
>>>>>>>> I've stopped a quick try doing 1) myself because
>>>>>>>>
>>>>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>>>>>>> opts->x_flag_ipa_reference = false;
>>>>>>>> break;
>>>>>>>>
>>>>>>>> + case OPT_ftree_vectorize:
>>>>>>>> + if (!opts_set->x_flag_tree_loop_vectorize)
>>>>>>>> + opts->x_flag_tree_loop_vectorize = value;
>>>>>>>> + if (!opts_set->x_flag_tree_slp_vectorize)
>>>>>>>> + opts->x_flag_tree_slp_vectorize = value;
>>>>>>>> + break;
>>>>>>>>
>>>>>>>> doesn't look obviously correct. Does that handle
>>>>>>>>
>>>>>>>> -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>>>>>>>
>>>>>>>> or
>>>>>>>>
>>>>>>>> -ftree-loop-vectorize -fno-tree-vectorize
>>>>>>>>
>>>>>>>> properly? Currently at least
>>>>>>>>
>>>>>>>> -ftree-slp-vectorize -fno-tree-vectorize
>>>>>>>>
>>>>>>>> doesn't "work".
>>>>>>>
>>>>>>>
>>>>>>> Right -- same is true for -fprofile-use option. FDO enables some
>>>>>>> passes, but can not re-enable them if they are flipped off before.
>>>>>>>
>>>>>>>>
>>>>>>>> That said, the option machinery doesn't handle an option being an alias
>>>>>>>> for two other options, so it's mechanism to contract positives/negatives
>>>>>>>> doesn't work here and the override hooks do not work reliably for
>>>>>>>> repeated options.
>>>>>>>>
>>>>>>>> Or am I wrong here? Should we care at all? Joseph?
>>>>>>>
>>>>>>> We should probably just document the behavior. Even better, we should
>>>>>>> deprecate the old option.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> David