[PATCH 5/7] vect: Support vector load/store with length in vectorizer

Kewen.Lin linkw@linux.ibm.com
Thu May 28 01:21:36 GMT 2020


on 2020/5/27 下午6:02, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi Richard,
>>
>> Thanks for your comments!
>>
>> on 2020/5/26 锟斤拷锟斤拷8:49, Richard Sandiford wrote:
>>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>>> @@ -626,6 +645,12 @@ public:
>>>>    /* True if have decided to use a fully-masked loop.  */
>>>>    bool fully_masked_p;
>>>>  
>>>> +  /* Records whether we still have the option of using a length access loop.  */
>>>> +  bool can_with_length_p;
>>>> +
>>>> +  /* True if have decided to use length access for the loop fully.  */
>>>> +  bool fully_with_length_p;
>>>
>>> Rather than duplicate the flags like this, I think we should have
>>> three bits of information:
>>>
>>> (1) Can the loop operate on partial vectors?  Starts off optimistically
>>>     assuming "yes", gets set to "no" when we find a counter-example.
>>>
>>> (2) If we do decide to use partial vectors, will we need loop masks?
>>>
>>> (3) If we do decide to use partial vectors, will we need lengths?
>>>
>>> Vectorisation using partial vectors succeeds if (1) && ((2) != (3))
>>>
>>> LOOP_VINFO_CAN_FULLY_MASK_P currently tracks (1) and
>>> LOOP_VINFO_MASKS currently tracks (2).  In pathological cases it's
>>> already possible to have (1) && !(2), see r9-6240 for an example.
>>>
>>> With the new support, LOOP_VINFO_LENS tracks (3).
>>>
>>> So I don't think we need the can_with_length_p.  What is now
>>> LOOP_VINFO_CAN_FULLY_MASK_P can continue to track (1) for both
>>> approaches, with the final choice of approach only being made
>>> at the end.  Maybe it would be worth renaming it to something
>>> more generic though, now that we have two approaches to partial
>>> vectorisation.
>>
>> I like this idea!  I could be wrong, but I'm afraid that we
>> can not have one common flag to be shared for both approaches,
>> the check criterias could be different for both approaches, one
>> counter example for length could be acceptable for masking, such
>> as length can only allow CONTIGUOUS related modes, but masking
>> can support more.  When we see acceptable VMAT_LOAD_STORE_LANES,
>> we leave LOOP_VINFO_CAN_FULLY_MASK_P true, later should length
>> checking turn it to false?  I guess no, assuming still true, then 
>> LOOP_VINFO_CAN_FULLY_MASK_P will mean only partial vectorization
>> for masking, not for both.  We can probably clean LOOP_VINFO_LENS
>> when the length checking is false, but we just know the vec is empty,
>> not sure we are unable to do partial vectorization with length,
>> when we see LOOP_VINFO_CAN_FULLY_MASK_P true, we could still
>> record length into it if possible.
> 
> Let's call the flag in (1) CAN_USE_PARTIAL_VECTORS_P rather than
> CAN_FULLY_MASK_P to (try to) avoid any confusion from the current name.
> 
> What I meant is that each vectorizable_* routine has the responsibility
> of finding a way of coping with partial vectorisation, or setting
> CAN_USE_PARTIAL_VECTORS_P to false if it can't.
> 
> vectorizable_load chooses the VMAT first, and then decides based on that
> whether partial vectorisation is supported.  There's no influence in
> the other direction (partial vectorisation doesn't determine the VMAT).
> 
> So once it has chosen a VMAT, vectorizable_load needs to try to find a way
> of handling the operation with partial vectorisation.  Currently the only
> way of doing that for VMAT_LOAD_STORE_LANES is using masks.  So at the
> moment there are two possible outcomes:
> 
> - The target supports the necessary IFN_MASK_LOAD_LANES function.
>   If so, we can use partial vectorisation for the statement, so we
>   leave CAN_USE_PARTIAL_VECTORS_P true and record the necessary masks
>   in LOOP_VINFO_MASKS.
> 
> - The target doesn't support the necessary IFN_MASK_LOAD_LANES function.
>   If so, we can't use partial vectorisation, so we clear
>   CAN_USE_PARTIAL_VECTORS_P.
> 
> That's how things work at the moment.  It would work in the same way
> for lengths if we ever supported IFN_LEN_LOAD_LANES: we'd check whether
> IFN_LEN_LOAD_LANES is available and record the length in LOOP_VINFO_LENS
> if so.  If partial vectorisation isn't supported (via masks or lengths),
> we'd continue to clear CAN_USE_PARTIAL_VECTORS_P.
> 
> But equally, if we never add support for IFN_LEN_LOAD_LANES, the current
> code continues to work with length-based approaches.  We'll continue to
> clear CAN_USE_PARTIAL_VECTORS_P for VMAT_LOAD_STORE_LANES when the
> target provides no IFN_MASK_LOAD_LANES function.
> 

Thanks a lot for your detailed explanation!  This proposal looks good
based on the current implementation of both masking and length.  I may
think too much, but I had a bit concern as below when some targets have
both masking and length supports in future, such as ppc adds masking
support like SVE.

I assumed that you meant each vectorizable_* routine should record the
objs for any available partial vectorisation approaches.  If one target
supports both, we would have both recorded but decide not to do partial
vectorisation finally since both have records.  The target can disable
length like through optab to resolve it, but there is one possibility
that the masking support can be imperfect initially since ISA support
could be gradual, it further leads some vectorizable_* check or final
verification to fail for masking, and length approach may work here but
it gets disabled.  We can miss to use partial vectorisation here.

The other assumption is that each vectorizable_* routine record the 
first available partial vectorisation approach, let's assume masking
takes preference, then it's fine to record just one here even if one
target supports both approaches, but we still have the possiblity to
miss the partial vectorisation chance as some check/verify fail with
masking but fine with length.

Does this concern make sense?

BR,
Kewen

> As I say, this is all predicated on the assumption that we don't need
> to mix both masks and lengths in the same loop, and so can decide not
> to use partial vectorisation when both masks and lengths have been
> recorded.  This is a check that would happen at the end, after all
> statements have been analysed.
> 
> (There's no reason in principle why we *couldn't* support both
> approaches in the same loop, but it's not worth adding the code
> for that until there's a use case.)
> 
> Thanks,
> Richard
> 




More information about the Gcc-patches mailing list