[PATCH v2] rs6000: Modify the way for extra penalized cost
Segher Boessenkool
segher@kernel.crashing.org
Mon Nov 29 22:06:08 GMT 2021
Hi!
On Tue, Sep 28, 2021 at 04:16:04PM +0800, Kewen.Lin wrote:
> This patch follows the discussions here[1][2], where Segher
> pointed out the existing way to guard the extra penalized
> cost for strided/elementwise loads with a magic bound does
> not scale.
>
> The way with nunits * stmt_cost can get one much
> exaggerated penalized cost, such as: for V16QI on P8, it's
> 16 * 20 = 320, that's why we need one bound. To make it
> better and more readable, the penalized cost is simplified
> as:
>
>   unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
>   unsigned extra_cost = nunits * adjusted_cost;
>
> For V2DI/V2DF, it uses 2 penalized cost for each scalar load
> while for the other modes, it uses 1.
So for V2D[IF] we get 4, for V4S[IF] we get 4, for V8HI it's 8, and
for V16QI it is 16? Pretty terrible as well, heh (I would expect all
vector ops to be similar cost).
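
To make those numbers concrete, here is a minimal standalone sketch of
the two formulas quoted above.  The function names new_extra_cost and
old_extra_cost are made up for illustration and are not in the patch;
the old form is shown without its bound, since the bound's value is not
given in this mail.

```c
/* Hypothetical helper: the simplified penalty from the two quoted
   lines of the patch.  */
static unsigned
new_extra_cost (unsigned nunits)
{
  /* Two-element vectors (V2DI/V2DF) pay 2 per scalar load, all other
     modes pay 1.  */
  unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
  return nunits * adjusted_cost;
}

/* Hypothetical helper: the previous nunits * stmt_cost form, without
   the bound that was applied on top of it.  */
static unsigned
old_extra_cost (unsigned nunits, unsigned stmt_cost)
{
  return nunits * stmt_cost;
}
```

With nunits of 2, 4, 8 and 16, new_extra_cost gives 4, 4, 8 and 16,
which are the values listed above; old_extra_cost (16, 20) gives the
320 from the V16QI on P8 example.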
> It's mainly concluded
> from the performance evaluations. One thing might be
> related is that: More units vector gets constructed, more
> instructions are used.
Yes, but how often does that happen, compared to actual vector ops?
This also suggests we should cost vector construction separately, which
would pretty obviously be a good thing anyway (it happens often, it has
a quite different cost structure).
> It has more chances to schedule them
> better (even run in parallelly when enough available units
> at that time), so it seems reasonable not to penalize more
> for them.
Yes.
> + /* Don't expect strided/elementwise loads for just 1 nunit. */
"We don't expect" etc.
Okay for trunk. Thanks! This probably isn't the last word in this
story, but it is an improvement in any case :-)
Segher