This is the mail archive of the
mailing list for the GCC project.
Re: Best way to compute cost of a sequence of gimple stmt
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Thomas Preud'homme" <thomas dot preudhomme at arm dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Tue, 10 Jun 2014 11:15:50 +0200
- Subject: Re: Best way to compute cost of a sequence of gimple stmt
- Authentication-results: sourceware.org; auth=none
- References: <003901cf8486$8e71a3d0$ab54eb70$ at arm dot com>
On Tue, Jun 10, 2014 at 10:32 AM, Thomas Preud'homme wrote:
> Hi there,
> With recent changes to it, the bswap pass can now replace a series of
> (probably aligned) loads + bitwise operations (AND, OR and shifts) + casts
> by a (potentially unaligned) load and a bswap. It was rightly pointed
> out to me that this might be more expensive than the original sequence of
> gimple statements. Therefore I am trying to compute the cost of the
> sequence with and without the transformation to make an informed
> decision.
> So far I proceeded by reusing the computation_cost function from
> ivopts and various functions from expmed (shift_cost, convert_cost
> and some new ones: rot_cost for instance). However, this doesn't
> allow me to compute the cost of a function call (the call to the bswap
> builtin) and I am leaning towards exposing expand_gimple_stmt () in
> a new function gimple_stmt_cost (). I am wondering though if it is a
> correct thing to do as I am not familiar with how expansion operates.
> I am also wondering if I should use gimple_stmt_cost as seldom as
> possible or on the contrary make use of it for all statements so as to
> get rid of the modifications in ivopts and expmed.
> I'd appreciate any advice on how to compute the cost of a sequence
> of gimple statements.
In general this is impossible to do. I don't have a good answer on
how to determine whether (unaligned) load + bswap is faster than
doing something else - but there is a very good chance that the original
code is even worse. For the unaligned load you can expect
an optimal code sequence to be generated - likewise for the bswap.
Now - if you want to do the best for the combination of both I'd
say you add support to the expr.c bitfield extraction code to do
the bswap on-the-fly and use TER (temporary expression replacement)
to see that you are doing the bswap on a memory source.
Anyway, what you'd really need to do is compare the original
code against the transform where on GIMPLE it's very-many-stmts
vs. two-stmts, and thus "obviously faster".
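
For illustration, here is a minimal sketch of the two forms being compared, written as C source rather than GIMPLE. The function names read_be32_bytewise and read_be32_bswap are hypothetical and not taken from the bswap pass; the second function only approximates what the transform would produce, assuming GCC's __builtin_bswap32 and its predefined __BYTE_ORDER__ macros.

```c
#include <stdint.h>
#include <string.h>

/* Many-statement form: four byte loads combined with shifts and ORs
   to read a big-endian 32-bit value from memory.  */
static uint32_t
read_be32_bytewise (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24)
         | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)
         | (uint32_t) p[3];
}

/* Two-statement form: one (potentially unaligned) 32-bit load followed
   by a byte swap.  memcpy expresses the unaligned load portably.  */
static uint32_t
read_be32_bswap (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* The swap is only needed when host and data byte order differ.  */
  v = __builtin_bswap32 (v);
#endif
  return v;
}
```

Both functions return the same value for any pointer, including one that is not 4-byte aligned; they differ only in how many statements the compiler starts from.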
There are only two choices - disable unaligned-load + bswap on
SLOW_UNALIGNED_ACCESS targets or not. Doing something more
fancy won't do the trick and isn't worth the trouble IMHO.
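
The first of those two choices amounts to a single gate, which might be sketched as below. This is standalone C, not GCC internals: struct target_desc and allow_load_bswap are hypothetical names, and the boolean field stands in for whatever the target's SLOW_UNALIGNED_ACCESS macro would report.

```c
#include <stdbool.h>

/* Hypothetical description of a target.  In GCC itself the answer would
   come from the SLOW_UNALIGNED_ACCESS target macro, not from a struct.  */
struct target_desc
{
  bool slow_unaligned_access;
};

/* Return true if replacing the byte-wise sequence by load + bswap is
   allowed.  ALIGN_BITS is the known alignment of the source address and
   SIZE_BITS is the width of the combined load.  */
static bool
allow_load_bswap (const struct target_desc *target,
                  unsigned align_bits, unsigned size_bits)
{
  /* A sufficiently aligned load is always acceptable.  */
  if (align_bits >= size_bits)
    return true;

  /* Otherwise the load is unaligned: allow it only where that is cheap.  */
  return !target->slow_unaligned_access;
}
```

The other choice, always applying the transform, corresponds to dropping the gate altogether.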
> Best regards,
> Thomas Preud'homme