This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Best way to compute cost of a sequence of gimple stmt

On Wed, Jun 11, 2014 at 7:45 AM, Thomas Preud'homme
<> wrote:
>> From: Richard Biener []
>> Sent: Tuesday, June 10, 2014 5:16 PM
>> In general this is impossible to do.  I don't have a good answer on
>> how to determine whether (unaligned) load + bswap is faster than
>> doing sth else - but there is a very good chance that the original
>> code is even worse.  For the unaligned load you can expect
>> an optimal code sequence to be generated - likewise for the bswap.
>> Now - if you want to do the best for the combination of both I'd
>> say you add support to the expr.c bitfield extraction code to do
>> the bswap on-the-fly and use TER to see that you are doing the
>> bswap on a memory source.
> Oh I see. Doing it there would mean instead of two independent
> operations you'd do the best combination possible, is that right?

Yes (but probably it's not worth the trouble).

>> There is only two choices - disable unaligned-load + bswap on
>> SLOW_UNALIGNED_ACCESS targets or not.  Doing sth more
>> fancy won't do the trick and isn't worth the trouble IMHO.
> There is some other reason to compute the cost that I didn't
> mention. For instance, you suggested to recognize partial
> load (+bswap). Quoting you:
>> unsigned foo (unsigned char *x)
>> {
>>   return x[0] << 24 | x[2] << 8 | x[3];
>> }
>> ?  We could do an unsigned int load from x and zero byte 3
>> with an AND.
> Even with aligned access, the above might be slower if x[0] was
> already loaded previously and sits in a register.

Well, I think the pattern detection makes sure that it only follows
single-use chains?  Or rather that all original loads and bit
operations are dead after the transform (not exactly following
single-use chains if you'd consider to eventually handle
x[0] << 24 | x[0] << 16 | x[2] << 8 | x[3] as a valid permutation)?

> I'm tempted to use a simple heuristic such as comparing the
> number of loads before and after, adding one if the load is
> unaligned. So in the above example, supposing that there is
> some computation done around x[0] before the return line,
> we'd have 2 loads before Vs 2 x is unaligned and we would
> cancel the optimization. If x is aligned the optimization would
> proceed.
> Do you thing this approach is also too much trouble or would
> not work?

I'm not sure.  For noop-loads I'd keep them unconditionally, even if
unaligned.  I'd disable unaligned-load + bswap for now.  People
interested and sitting on such a target should do the measurements
and decide if it's worth the trouble (is arm affected?).

But I see that the code currently does not limit itself to single-use
chains and thus may end up keeping the whole original code life
by unrelated uses.  So a good thing would be to impose proper
restrictions here.  For example, in find_bswap_or_nop_1 do

  if (TREE_CODE (rhs1) != SSA_NAME
     || !has_single_use (rhs1))


> Best regards,
> Thomas

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]