This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Thanks Richard for your comments.
> Your proposed to handle epilogue loop just like normal short-trip loop
> but this is not exactly truth since e.g. epilogue must not be peeled
> for alignment.
Yes there must be some differences, my motivation is to minimize that
so we don't need to specially check normal/epilogue loops at too many
places.
Of course it's just my feeling when going through the patch set, and
could be wrong.
Thanks,
bin
>
> It is not clear for me what are my next steps? Should I re-design the
> patch completely or simply decompose the whole patch to different
> parts? But it means that we must start review process from beginning
> but release is closed to its end.
> Note also that i work for Intel till the end of year and have not idea
> who will continue working on this project.
>
> Any help will be appreciated.
>
> Thanks.
> Yuri.
>
> 2016-11-09 13:37 GMT+03:00 Bin.Cheng <amker.cheng@gmail.com>:
>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> I re-send all patches sent by Ilya earlier for review which support
>>> vectorization of loop epilogues and loops with low trip count. We
>>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>>> approved by Jeff.
>>>
>>> I did re-base of all patches and performed bootstrapping and
>>> regression testing that did not show any new failures. Also all
>>> changes related to new vect_do_peeling algorithm have been changed
>>> accordingly.
>>>
>>> Is it OK for trunk?
>>
>> Hi,
>> I can't approve patches, but had some comments after going through the
>> implementation.
>>
>> One confusing part is cost model change, as well as the way it's used
>> to decide how epilogue loop should be vectorized. Given vect-tail is
>> disabled at the moment and the cost change needs further tuning, is it
>> reasonable to split this part out and get vectorization part
>> reviewed/submitted independently? For example, let user specified
>> parameters make the decision for now. Cost and target dependent
>> changes should go in at last, this could make the patch easier to
>> read.
>>
>> The implementation computes/shares quite amount information from main
>> loop to epilogue loop vectorization. Furthermore, variables/fields
>> for such information are somehow named in a misleading way. For
>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>> flag controlling whether epilogue loop should be vectorized with
>> masking. However, it's actually controlled by exactly the same flag
>> as whether epilogue loop should be combined into the main loop with
>> masking:
>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>
>> slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>
>> + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> + vect_combine_loop_epilogue (loop_vinfo);
>> +
>> /* Reduce loop iterations by the vectorization factor. */
>> scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>> expected_iterations / vf);
>>
>> IMHO, we should decouple main loop vectorization and epilogue
>> vectorization as much as possible by sharing as few information as we
>> can. The general idea is to handle epilogue loop just like normal
>> short-trip loop. For example, we can rename
>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>> else), and we don't differentiate its meaning between main and
>> epilogue(short-trip) loop. It only indicates the current loop should
>> be vectorized with masking no matter it's a main loop or epilogue
>> loop, and it works just like the current implementation.
>>
>> After this change, we can refine vectorization and make it more
>> general for normal loop and epilogue(short trip) loop. For example,
>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue
>> loop and use it to control how it should be vectorized:
>> + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>> + {
>> + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>> + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>> + }
>> + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>> + && min_profitable_combine_iters >= 0)
>> + {
>>
>> This works, but not that good for understanding or maintaining.
>>
>> Thanks,
>> bin