[PATCH] Fix PR90332 by extending half size vector mode

Kewen.Lin linkw@linux.ibm.com
Wed Mar 18 14:12:00 GMT 2020


on 2020/3/18 6:40 PM, Richard Biener wrote:
> On Wed, Mar 18, 2020 at 11:39 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>
>>> Hi,
>>>
>>> As PR90332 shows, the current scalar epilogue peeling for gaps
>>> elimination requires a vec_init optab that builds the vector from
>>> two half-size vector modes.  On Power we don't support vector
>>> modes like V8QI, so we can't provide an optab like
>>> vec_initv16qiv8qi.  But we want to leverage an existing scalar
>>> mode like DI to initialize the desired vector mode.  This patch
>>> extends the existing support for Power; evaluated on Power9, we
>>> see a 1.9% speedup on SPEC2017 525.x264_r.
>>>
>>> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
>>>
>>> Is it ok for trunk?
>>
>> There's already code exercising such a case in vectorizable_load
>> (VMAT_STRIDED_SLP) which you could have factored out.
>>
>>  vectype, bool slp,
>>              than the alignment boundary B.  Every vector access will
>>              be a multiple of B and so we are guaranteed to access a
>>              non-gap element in the same B-sized block.  */
>> +         machine_mode half_mode;
>>           if (overrun_p
>>               && gap < (vect_known_alignment_in_bytes (first_dr_info)
>>                         / vect_get_scalar_dr_size (first_dr_info)))
>> -           overrun_p = false;
>> -
>> +           {
>> +             overrun_p = false;
>> +             if (known_eq (nunits, (group_size - gap) * 2)
>> +                 && known_eq (nunits, group_size)
>> +                 && get_half_mode_for_vector (vectype, &half_mode))
>> +               DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
>> +           }
>>
>> why do you need to amend this case?
>>
>> I don't like storing DR_GROUP_HALF_MODE very much, later
>> you need a vector type and it looks cheap enough to recompute
>> it where you need it?  Iff then it doesn't belong to DR_GROUP
>> but to the stmt-info.
>>
>> I realize the original optimization was kind of a hack (and I was too
>> lazy to implement the integer mode construction path ...).
>>
>> So, can you factor out the existing code into a function returning
>> the vector type for construction for a vector type and a
>> pieces size?  So for V16QI and a pieces-size of 4 we'd
>> get either V16QI back (then construction from V4QI pieces
>> should work) or V4SI (then construction from SImode pieces
>> should work)?  Eventually as secondary output provide that
>> piece type (SI / V4QI).
> 
> Btw, why not implement the necessary vector init patterns?
> 

Power doesn't support 64-bit vector modes, so it looks a bit hacky and
confusing to introduce such a mode just to satisfy an optab requirement,
though I admit the optab hack would make it work immediately.  :)

BR,
Kewen


