This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix PR82255 (vectorizer cost model overcounts some vector load costs)


On Sep 20, 2017, at 3:49 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote:
> 
> Bill Schmidt <wschmidt@linux.vnet.ibm.com> writes:
>> On Sep 19, 2017, at 4:21 PM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
>>> On Sep 19, 2017, at 3:58 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Bill Schmidt <wschmidt@linux.vnet.ibm.com> writes:
>>>>> Index: gcc/tree-vect-stmts.c
>>>>> ===================================================================
>>>>> --- gcc/tree-vect-stmts.c	(revision 252760)
>>>>> +++ gcc/tree-vect-stmts.c	(working copy)
>>>>> @@ -1091,8 +1091,19 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>>>>> 			prologue_cost_vec, body_cost_vec, true);
>>>>> if (memory_access_type == VMAT_ELEMENTWISE
>>>>>     || memory_access_type == VMAT_STRIDED_SLP)
>>>>> -    inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
>>>>> -				     stmt_info, 0, vect_body);
>>>>> +    {
>>>>> +      int group_size = GROUP_SIZE (stmt_info);
>>>>> +      int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
>>>>> +      if (group_size < nunits)
>>>>> +	{
>>>>> +	  if (dump_enabled_p ())
>>>>> +	    dump_printf_loc (MSG_NOTE, vect_location,
>>>>> +			     "vect_model_load_cost: vec_construct required");
>>>>> +	  inside_cost += record_stmt_cost (body_cost_vec, ncopies,
>>>>> +					   vec_construct, stmt_info, 0,
>>>>> +					   vect_body);
>>>>> +	}
>>>>> +    }
>>>>> 
>>>>> if (dump_enabled_p ())
>>>>>   dump_printf_loc (MSG_NOTE, vect_location,
>>>> 
>>>> This feels like we've probably got the wrong memory_access_type.
>>>> If it's a just a contiguous load then it should be VMAT_CONTIGUOUS
>>>> instead.
>>> 
>>> I tend to agree -- that will take more surgery to the code that
>>> detects strided loads in
>>> both the cost model analysis and in vectorizable_load.
>>> 
>> It's just a little less clear how to clean this up in
>> vect_analyze_data_refs.  The too-simplistic test there now is:
>> 
>>      else if (is_a <loop_vec_info> (vinfo)
>>               && TREE_CODE (DR_STEP (dr)) != INTEGER_CST)
>>        {
>>          if (nested_in_vect_loop_p (loop, stmt))
>>            {
>>              if (dump_enabled_p ())
>> 		{
>>                  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                                   "not vectorized: not suitable for strided "
>>                                   "load ");
>>                  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
>>                }
>>              return false;
>>            }
>>          STMT_VINFO_STRIDED_P (stmt_info) = true;
>>        }
>> 
>> Furthermore, this is being set while processing each data reference, not after
>> all information has been gathered and analyzed.  So this is too early.
> 
> Yeah, agree we can't tell at this stage.
> 
>> Next place to fix this would be in vect_analyze_slp_cost_1, where
>> 
>>      vect_memory_access_type memory_access_type
>>        = (STMT_VINFO_STRIDED_P (stmt_info)
>>           ? VMAT_STRIDED_SLP
>>           : VMAT_CONTIGUOUS);
>> 
>> is too simplistic.  I guess we could do something here, but it would
>> require unpacking all the grouped accesses and duplicating all the
>> logic from scratch to determine whether it's a single load or not.
> 
> I think we can use:
> 
>    SLP_INSTANCE_UNROLLING_FACTOR (instance) * ncopies_for_cost
> 
> to calculate the number of times that the vector SLP statements repeat
> the original scalar group.  We should then only use STRIDED_SLP if
> that's greater than 1.
> 
> This affects stores as well as loads, so fixing it here will work
> for both.

Thanks, Richard, that makes sense.  Let me play around with it a bit.
> 
> But I think this shows up another problem.  In the vectorised loop,
> we have 1 copy of the load and 4 copies of the ABS (after unpacking).
> But vect_analyze_slp_cost_1 is being called with ncopies_for_cost == 4
> even for the loads.  So I think it's a double whammy: we've quadrupling
> the real cost via the excessive ncopies_for_cost, and we're using the
> wrong access type too.

Absolutely right, I failed to pay attention to this!  Good catch.
> 
> The patch below should fix the second problem but isn't much use without
> the first.  I think we need to adjust ncopies_for_cost in the recursive
> calls as well.

Just to be clear, I think you mean it fixes the original problem (wrong
access type).  I'll look into how to fix the excessive ncopies_for_cost.
> 
> Of course, fixing an overestimate here is likely to expose an
> underestimate elsewhere. :-)

Right...it was ever thus... 

Very much appreciate your assistance!

Bill
> 
> Interesting test case though.  It seems counterproductive to unroll
> a loop like that before vectorisation.
> 
> Thanks,
> Richard
> 
> 
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c	(revision 252986)
> +++ gcc/tree-vect-slp.c	(working copy)
> @@ -1687,8 +1687,10 @@
>   stmt_info = vinfo_for_stmt (stmt);
>   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>     {
> +      unsigned int group_ncopies = (SLP_INSTANCE_UNROLLING_FACTOR (instance)
> +				   * ncopies_for_cost);
>       vect_memory_access_type memory_access_type
> -	= (STMT_VINFO_STRIDED_P (stmt_info)
> +	= (STMT_VINFO_STRIDED_P (stmt_info) && group_ncopies > 1
> 	   ? VMAT_STRIDED_SLP
> 	   : VMAT_CONTIGUOUS);
>       if (DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)))


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]