This is the mail archive of the
mailing list for the GCC project.
RE: Best way to compute cost of a sequence of gimple stmt
- From: "Thomas Preud'homme" <thomas dot preudhomme at arm dot com>
- To: "'Richard Biener'" <richard dot guenther at gmail dot com>
- Cc: "GCC Development" <gcc at gcc dot gnu dot org>
- Date: Wed, 11 Jun 2014 13:45:42 +0800
- Subject: RE: Best way to compute cost of a sequence of gimple stmt
- Authentication-results: sourceware.org; auth=none
- References: <003901cf8486$8e71a3d0$ab54eb70$ at arm dot com> <CAFiYyc1Rvt9UtGUtqUzxq4ww7F96WV-354adZX5EknTArdbhCg at mail dot gmail dot com>
> From: Richard Biener [mailto:email@example.com]
> Sent: Tuesday, June 10, 2014 5:16 PM
> In general this is impossible to do. I don't have a good answer on
> how to determine whether (unaligned) load + bswap is faster than
> doing sth else - but there is a very good chance that the original
> code is even worse. For the unaligned load you can expect
> an optimal code sequence to be generated - likewise for the bswap.
> Now - if you want to do the best for the combination of both I'd
> say you add support to the expr.c bitfield extraction code to do
> the bswap on-the-fly and use TER to see that you are doing the
> bswap on a memory source.
Oh I see. Doing it there would mean instead of two independent
operations you'd do the best combination possible, is that right?
> There is only two choices - disable unaligned-load + bswap on
> SLOW_UNALIGNED_ACCESS targets or not. Doing sth more
> fancy won't do the trick and isn't worth the trouble IMHO.
There is some other reason to compute the cost that I didn't
mention. For instance, you suggested to recognize partial
load (+bswap). Quoting you:
> unsigned foo (unsigned char *x)
> {
>   return x[0] << 24 | x[2] << 8 | x[3];
> }
> ? We could do an unsigned int load from x and zero byte 3
> with an AND.
Even with aligned access, the above might be slower if x was
already loaded previously and sits in a register.
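
To make the trade-off concrete, here is a minimal sketch of the two forms being compared. It assumes the quoted function combines bytes x[0], x[2] and x[3] (the indices are my reading of the example), and the helper names combine_bytes and load_and_mask are hypothetical. It relies on the GCC-specific __builtin_bswap32 and __BYTE_ORDER__, and it only illustrates the source-level equivalence, not what the pass would emit.

```c
#include <stdint.h>
#include <string.h>

/* Byte-by-byte form: three separate byte loads combined with shifts
   and ORs.  The indices 0, 2 and 3 are an assumption about the
   quoted example.  */
static uint32_t combine_bytes(const unsigned char *x)
{
    return (uint32_t)x[0] << 24 | (uint32_t)x[2] << 8 | (uint32_t)x[3];
}

/* Wide-load form: one (possibly unaligned) 32-bit load, a byte swap
   on little-endian hosts, then an AND that clears the byte which
   came from x[1].  */
static uint32_t load_and_mask(const unsigned char *x)
{
    uint32_t v;

    memcpy(&v, x, sizeof v);      /* alignment-safe 32-bit load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);     /* bring x[0] into the top byte */
#endif
    return v & 0xFF00FFFFu;       /* zero bits 16..23 (from x[1]) */
}
```

Both functions return the same value for any four readable bytes. The second one trades three byte loads for one word load plus an AND, which only pays off if the bytes are not already available in registers.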
I'm tempted to use a simple heuristic such as comparing the
number of loads before and after, adding one if the load is
unaligned. So in the above example, supposing that there is
some computation done around x before the return line,
we'd have 2 loads before vs. 2 if x is unaligned, and we would
cancel the optimization. If x is aligned the optimization would
Do you think this approach is also too much trouble or would