This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, vec-tails] Support loop epilogue vectorization


I am familiar with SVE extension and understand that implemented
approach might be not suitable for ARM. The main point is that only
load/store instructions are masked but all other calculations are not
(we did special conversion for reduction statements to implement
merging predication semantic). For SVE peeling for niters is not
required but it is not true for x86 -  we must determine what
vectorization scheme is more profitable: loop combining (the only
essential for SVE) or separate epilogue vectorization using masking or
less vectorization factor. So I'd like to have the full list of
required changes to our implementation to try to remove them.

Thanks.
Yuri.

2016-11-09 14:46 GMT+03:00 Bin.Cheng <amker.cheng@gmail.com>:
> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks Richard for your comments.
>> Your proposed to handle epilogue loop just like normal short-trip loop
>> but this is not exactly truth since e.g. epilogue must not be peeled
>> for alignment.
> Yes there must be some differences, my motivation is to minimize that
> so we don't need to specially check normal/epilogue loops at too many
> places.
> Of course it's just my feeling when going through the patch set, and
> could be wrong.
>
> Thanks,
> bin
>>
>> It is not clear for me what are my next steps? Should I re-design the
>> patch completely or simply decompose the whole patch to different
>> parts? But it means that we must start review process from beginning
>> but release is closed to its end.
>> Note also that i work for Intel till the end of year and have not idea
>> who will continue working on this project.
>>
>> Any help will be appreciated.
>>
>> Thanks.
>> Yuri.
>>
>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng <amker.cheng@gmail.com>:
>>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> I re-send all patches sent by Ilya earlier for review which support
>>>> vectorization of loop epilogues and loops with low trip count. We
>>>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>>>> approved by Jeff.
>>>>
>>>> I did re-base of all patches and performed bootstrapping and
>>>> regression testing that did not show any new failures. Also all
>>>> changes related to new vect_do_peeling algorithm have been changed
>>>> accordingly.
>>>>
>>>> Is it OK for trunk?
>>>
>>> Hi,
>>> I can't approve patches, but had some comments after going through the
>>> implementation.
>>>
>>> One confusing part is cost model change, as well as the way it's used
>>> to decide how epilogue loop should be vectorized.  Given vect-tail is
>>> disabled at the moment and the cost change needs further tuning, is it
>>> reasonable to split this part out and get vectorization part
>>> reviewed/submitted independently?  For example, let user specified
>>> parameters make the decision for now.  Cost and target dependent
>>> changes should go in at last, this could make the patch easier to
>>> read.
>>>
>>> The implementation computes/shares quite amount information from main
>>> loop to epilogue loop vectorization.  Furthermore, variables/fields
>>> for such information are somehow named in a misleading way.  For
>>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>>> flag controlling whether epilogue loop should be vectorized with
>>> masking.  However, it's actually controlled by exactly the same flag
>>> as whether epilogue loop should be combined into the main loop with
>>> masking:
>>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>>
>>>    slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>>
>>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>>> +    vect_combine_loop_epilogue (loop_vinfo);
>>> +
>>>    /* Reduce loop iterations by the vectorization factor.  */
>>>    scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>>                expected_iterations / vf);
>>>
>>> IMHO, we should decouple main loop vectorization and epilogue
>>> vectorization as much as possible by sharing as few information as we
>>> can.  The general idea is to handle epilogue loop just like normal
>>> short-trip loop.  For example, we can rename
>>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>>> else), and we don't differentiate its meaning between main and
>>> epilogue(short-trip) loop.  It only indicates the current loop should
>>> be vectorized with masking no matter it's a main loop or epilogue
>>> loop, and it works just like the current implementation.
>>>
>>> After this change, we can refine vectorization and make it more
>>> general for normal loop and epilogue(short trip) loop.  For example,
>>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
>>> loop and use it to control how it should be vectorized:
>>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>>> +    {
>>> +      LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>>> +      LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>>> +    }
>>> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>>> +       && min_profitable_combine_iters >= 0)
>>> +    {
>>>
>>> This works, but not that good for understanding or maintaining.
>>>
>>> Thanks,
>>> bin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]