[PATCH] Fix PR90332 by extending half size vector mode
Kewen.Lin
linkw@linux.ibm.com
Wed Mar 18 14:12:00 GMT 2020
on 2020/3/18 6:40 PM, Richard Biener wrote:
> On Wed, Mar 18, 2020 at 11:39 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>
>>> Hi,
>>>
>>> As PR90332 shows, the current scalar epilogue peeling for gaps
>>> elimination requires a vec_init optab that builds the vector from
>>> two half-size vector modes. Power doesn't support vector modes like
>>> V8QI, so it can't provide an optab like vec_initv16qiv8qi. But we
>>> want to leverage an existing scalar mode like DI to initialize the
>>> desired vector mode. This patch extends the existing support for
>>> Power; evaluated on Power9, it gives the expected 1.9% speedup
>>> on SPEC2017 525.x264_r.
>>>
>>> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
>>>
>>> Is it ok for trunk?
>>
>> There's already code exercising such a case in vectorizable_load
>> (VMAT_STRIDED_SLP) which you could have factored out.
>>
>> vectype, bool slp,
>> than the alignment boundary B. Every vector access will
>> be a multiple of B and so we are guaranteed to access a
>> non-gap element in the same B-sized block. */
>> + machine_mode half_mode;
>> if (overrun_p
>> && gap < (vect_known_alignment_in_bytes (first_dr_info)
>> / vect_get_scalar_dr_size (first_dr_info)))
>> - overrun_p = false;
>> -
>> + {
>> + overrun_p = false;
>> + if (known_eq (nunits, (group_size - gap) * 2)
>> + && known_eq (nunits, group_size)
>> + && get_half_mode_for_vector (vectype, &half_mode))
>> + DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
>> + }
>>
>> why do you need to amend this case?
>>
>> I don't like storing DR_GROUP_HALF_MODE very much, later
>> you need a vector type and it looks cheap enough to recompute
>> it where you need it? If at all, it doesn't belong to DR_GROUP
>> but to the stmt-info.
>>
>> I realize the original optimization was kind of a hack (and I was too
>> lazy to implement the integer mode construction path ...).
>>
>> So, can you factor out the existing code into a function returning
>> the vector type for construction for a vector type and a
>> pieces size? So for V16QI and a pieces-size of 4 we'd
>> get either V16QI back (then construction from V4QI pieces
>> should work) or V4SI (then construction from SImode pieces
>> should work)? Eventually as secondary output provide that
>> piece type (SI / V4QI).
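
A minimal sketch of the helper suggested above, written as a standalone C model rather than against GCC internals. Everything here is hypothetical and only for illustration: `vector_type_for_pieces`, `struct target_caps`, `struct construction_type` and the `PIECE_*` values are made-up names, and a target is reduced to "smallest supported vector size" plus "widest integer mode". It prefers sub-vector pieces when the target has such a mode and otherwise falls back to integer-mode pieces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT GCC's machine_mode or tree types.  */
enum piece_kind { PIECE_NONE, PIECE_VECTOR, PIECE_INTEGER };

/* The vector type to construct: NELTS elements of ELT_BYTES each.  */
struct construction_type
{
  enum piece_kind kind;
  unsigned elt_bytes;
  unsigned nelts;
};

/* Hypothetical, heavily simplified target description.  */
struct target_caps
{
  unsigned min_vector_bytes;  /* smallest supported vector mode size */
  unsigned max_int_bytes;     /* widest supported scalar integer mode */
};

static int
is_pow2 (unsigned x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* For a vector of VEC_BYTES bytes with ELT_BYTES-sized elements, pick the
   vector type to use when constructing it from pieces of PIECE_ELTS
   elements each.  Returns kind PIECE_NONE if neither path applies.  */
struct construction_type
vector_type_for_pieces (const struct target_caps *t, unsigned vec_bytes,
                        unsigned elt_bytes, unsigned piece_elts)
{
  struct construction_type r = { PIECE_NONE, 0, 0 };
  unsigned piece_bytes = elt_bytes * piece_elts;

  assert (t != NULL);
  if (piece_bytes == 0 || vec_bytes % piece_bytes != 0)
    return r;

  /* First choice: keep the original vector type and build it from
     sub-vector pieces (e.g. V16QI from V4QI), if such a mode exists.  */
  if (piece_bytes >= t->min_vector_bytes)
    {
      r = (struct construction_type) { PIECE_VECTOR, elt_bytes,
                                       vec_bytes / elt_bytes };
      return r;
    }

  /* Fallback: treat each piece as one integer (e.g. SImode) and build a
     vector of those integers instead (e.g. V4SI).  */
  if (is_pow2 (piece_bytes) && piece_bytes <= t->max_int_bytes)
    r = (struct construction_type) { PIECE_INTEGER, piece_bytes,
                                     vec_bytes / piece_bytes };
  return r;
}
```

With a Power-like description (`min_vector_bytes` = 16, `max_int_bytes` = 8), V16QI with a pieces-size of 4 comes back as four 4-byte integer elements (the V4SI / SImode case), and a pieces-size of 8 comes back as two 8-byte elements (the DI case from the PR). A target that also has a 4-byte vector mode gets V16QI itself back, to be built from V4QI pieces.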
>
> Btw, why not implement the necessary vector init patterns?
>
Power doesn't support 64-bit vector modes, so it looks a bit hacky and
confusing to introduce this kind of mode just to satisfy an optab requirement,
but I admit the optab hack would make it work immediately. :)
BR,
Kewen