[PATCH v2] rs6000: Modify the way for extra penalized cost

Kewen.Lin linkw@linux.ibm.com
Tue Nov 30 05:05:48 GMT 2021

Hi Segher,

on 2021/11/30 上午6:06, Segher Boessenkool wrote:
> Hi!
> On Tue, Sep 28, 2021 at 04:16:04PM +0800, Kewen.Lin wrote:
>> This patch follows the discussions here[1][2], where Segher
>> pointed out the existing way to guard the extra penalized
>> cost for strided/elementwise loads with a magic bound does
>> not scale.
>> The way with nunits * stmt_cost can get one much
>> exaggerated penalized cost, such as: for V16QI on P8, it's
>> 16 * 20 = 320, that's why we need one bound.  To make it
>> better and more readable, the penalized cost is simplified
>> as:
>>     unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
>>     unsigned extra_cost = nunits * adjusted_cost;
>> For V2DI/V2DF, it uses 2 penalized cost for each scalar load
>> while for the other modes, it uses 1.
> So for V2D[IF] we get 4, for V4S[IF] we get 4, for V8HI it's 8, and
> for V16QI it is 16?  Pretty terrible as well, heh (I would expect all
> vector ops to be similar cost).

But for different vector units it has different number of loads, it seems
reasonable to have more costs when it has more loads to be fed into those
limited number of load/store units.

>> It's mainly concluded
>> from the performance evaluations.  One thing might be
>> related is that: More units vector gets constructed, more
>> instructions are used.
> Yes, but how often does that happen, compared to actual vector ops?
> This also suggests we should cost vector construction separately, which
> would pretty obviously be a good thing anyway (it happens often, it has
> a quite different cost structure).

vectorizer does model vector construction separately, there is an enum
vect_cost_for_stmt *vec_construct*, normally it works well.  But for this
bwaves hotspot, it requires us to do some more penalization as evaluated,
so we put the penalized cost onto this special vector construction when
some heuristic thresholds are met.

>> It has more chances to schedule them
>> better (even run in parallelly when enough available units
>> at that time), so it seems reasonable not to penalize more
>> for them.
> Yes.
>> +	  /* Don't expect strided/elementwise loads for just 1 nunit.  */
> "We don't expect" etc.


> Okay for trunk.  Thanks!  This probably isn't the last word in this
> story, but it is an improvement in any case :-)

Thanks for the review, rebased/retested and committed as r12-5589.

More information about the Gcc-patches mailing list