This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Best way to compute cost of a sequence of gimple stmt
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Thomas Preud'homme" <thomas dot preudhomme at arm dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Wed, 11 Jun 2014 10:09:11 +0200
- Subject: Re: Best way to compute cost of a sequence of gimple stmt
- Authentication-results: sourceware.org; auth=none
- References: <003901cf8486$8e71a3d0$ab54eb70$ at arm dot com> <CAFiYyc1Rvt9UtGUtqUzxq4ww7F96WV-354adZX5EknTArdbhCg at mail dot gmail dot com> <000301cf8538$63e07170$2ba15450$ at arm dot com>
On Wed, Jun 11, 2014 at 7:45 AM, Thomas Preud'homme wrote:
>> From: Richard Biener [mailto:firstname.lastname@example.org]
>> Sent: Tuesday, June 10, 2014 5:16 PM
>> In general this is impossible to do. I don't have a good answer on
>> how to determine whether (unaligned) load + bswap is faster than
>> doing something else - but there is a very good chance that the
>> original code is even worse. For the unaligned load you can expect
>> an optimal code sequence to be generated - likewise for the bswap.
>> Now - if you want to do the best for the combination of both I'd
>> say you add support to the expr.c bitfield extraction code to do
>> the bswap on-the-fly and use TER to see that you are doing the
>> bswap on a memory source.
> Oh I see. Doing it there would mean instead of two independent
> operations you'd do the best combination possible, is that right?
Yes (but probably it's not worth the trouble).
>> There are only two choices - disable unaligned-load + bswap on
>> SLOW_UNALIGNED_ACCESS targets or not. Doing something
>> fancier won't do the trick and isn't worth the trouble IMHO.
> There is some other reason to compute the cost that I didn't
> mention. For instance, you suggested to recognize partial
> load (+bswap). Quoting you:
>> unsigned foo (unsigned char *x)
>> {
>>   return x << 24 | x << 8 | x;
>> }
>> ? We could do an unsigned int load from x and zero byte 3
>> with an AND.
> Even with aligned access, the above might be slower if x was
> already loaded previously and sits in a register.
Well, I think the pattern detection makes sure that it only follows
single-use chains? Or rather that all original loads and bit
operations are dead after the transform (not exactly following
single-use chains if you'd consider to eventually handle
x << 24 | x << 16 | x << 8 | x as a valid permutation)?
> I'm tempted to use a simple heuristic such as comparing the
> number of loads before and after, adding one if the load is
> unaligned. So in the above example, supposing that there is
> some computation done around x before the return line,
> we'd have 2 loads before vs. 2 after if x is unaligned, and
> we would cancel the optimization. If x is aligned the
> optimization would proceed.
> Do you think this approach is also too much trouble or would
> not work?
I'm not sure. For noop-loads I'd keep them unconditionally, even if
unaligned. I'd disable unaligned-load + bswap for now. People
interested and sitting on such a target should do the measurements
and decide if it's worth the trouble (is arm affected?).
But I see that the code currently does not limit itself to single-use
chains and thus may end up keeping the whole original code live
through unrelated uses. So a good thing would be to impose proper
restrictions here. For example, in find_bswap_or_nop_1 do
  if (TREE_CODE (rhs1) != SSA_NAME
      || !has_single_use (rhs1))
> Best regards,