This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Best way to compute cost of a sequence of gimple stmt


On Tue, Jun 10, 2014 at 10:32 AM, Thomas Preud'homme
<thomas.preudhomme@arm.com> wrote:
> Hi there,
>
> With recent changes to it, the bswap pass can now replace a series of
> (probably aligned) load + bitwise operation (AND, OR and shifts) + casts
> by a (potentially unaligned) load and a bswap. It was rightly pointed
> out to me that this might be more expensive than the original sequence of
> gimple statements. Therefore I am trying to compute the cost of the
> sequence with and without the transformation to make an informed
> decision.
>
> So far I proceeded by reusing the computation_cost function from
> ivopts and various functions from expmed (shift_cost, convert_cost
> and some new ones: rot_cost for instance). However, this doesn't
> allow me to compute the cost of a function call (the call to the bswap
> builtin) and I am leaning towards exposing expand_gimple_stmt () in
> a new function gimple_stmt_cost (). I am wondering though if it is a
> correct thing to do as I am not familiar with how expansion operates.
> I am also wondering if I should use gimple_stmt_cost as seldom as
> possible or on the contrary make use of it for all statements so as to
> get rid of the modifications in ivopts and expmed.
>
> I'd appreciate any advice on how to compute the cost of a sequence
> of gimple statements.

In general this is impossible to do.  I don't have a good answer on
how to determine whether (unaligned) load + bswap is faster than
doing something else - but there is a very good chance that the original
code is even worse.  For the unaligned load you can expect
an optimal code sequence to be generated - likewise for the bswap.
Now - if you want to do the best for the combination of both I'd
say you add support to the expr.c bitfield extraction code to do
the bswap on-the-fly and use TER to see that you are doing the
bswap on a memory source.

Anyway, what you'd really need to do is compare the original
code against the transform where on GIMPLE it's very-many-stmts
vs. two-stmts, and thus "obviously faster".
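
To make the "very-many-stmts vs. two-stmts" comparison concrete, here is a
minimal source-level sketch of the two shapes. It is an illustration only,
not code from the bswap pass: the function names are hypothetical, memcpy
stands in for the (potentially unaligned) load, and it assumes a compiler
that provides __builtin_bswap32 and __BYTE_ORDER__ (GCC does).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Original shape: four byte loads combined with shifts and ORs.
   Reads a 32-bit big-endian value starting at p.  */
static uint32_t
load_be32_bytewise (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24)
	 | ((uint32_t) p[1] << 16)
	 | ((uint32_t) p[2] << 8)
	 | (uint32_t) p[3];
}

/* Transformed shape: one 32-bit load plus a byte swap.
   memcpy models the load without assuming that p is aligned.  */
static uint32_t
load_be32_bswap (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* The swap is only needed when host and data byte order differ.  */
  v = __builtin_bswap32 (v);
#endif
  return v;
}

/* Both shapes must agree on any 4-byte buffer, aligned or not.  */
static void
check_same_result (const unsigned char *p)
{
  assert (load_be32_bytewise (p) == load_be32_bswap (p));
}
```

Calling check_same_result (buf + 1) on a byte buffer exercises the
unaligned case. Whether the second shape is actually cheaper depends on
how the target handles the unaligned load, which is the open question.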

There are only two choices - disable unaligned-load + bswap on
SLOW_UNALIGNED_ACCESS targets or not.  Doing something fancier
won't do the trick and isn't worth the trouble IMHO.
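
The first of those two choices could be sketched as the standalone model
below. This is not GCC code: struct target_model and allow_load_bswap are
hypothetical names, and the boolean field merely stands in for what the
SLOW_UNALIGNED_ACCESS target macro would answer. It also assumes that a
load whose known alignment covers its size should always be allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the target description.  In GCC the answer
   would come from the SLOW_UNALIGNED_ACCESS target macro instead.  */
struct target_model
{
  bool slow_unaligned_access;
};

/* Decide whether to replace the statement sequence by load + bswap.
   ALIGN_BITS is the known alignment of the address and SIZE_BITS the
   width of the combined load, both in bits.  */
static bool
allow_load_bswap (const struct target_model *t,
		  unsigned align_bits, unsigned size_bits)
{
  /* A sufficiently aligned load is not an unaligned access at all.  */
  if (align_bits >= size_bits)
    return true;

  /* Otherwise only transform when unaligned accesses are not slow.  */
  return !t->slow_unaligned_access;
}

/* A few checks of the intended behaviour of the model.  */
static void
check_gate (void)
{
  struct target_model strict = { true };
  struct target_model relaxed = { false };

  assert (!allow_load_bswap (&strict, 8, 32));
  assert (allow_load_bswap (&strict, 32, 32));
  assert (allow_load_bswap (&relaxed, 8, 32));
}
```

The point of the model is that the decision is a single yes/no per
target and alignment, with no per-statement cost computation involved.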

Richard.

> Best regards,
>
> Thomas Preud'homme
>
>

