[4.0 and mainline] Fix multiplication by constant expansion

Mon Jan 2 01:49:00 GMT 2006

> 
> Hi Jan,
> 
> On Mon, 2 Jan 2006, Jan Hubicka wrote:
> > Does this patch look acceptable?  (I can also do just the "-1" trick and
> > skip the DECOMPSE_LEA if it seems riskant for you, especially in 4.1)
> 
> The change that Richard Henderson has been after for some time, but
> I've never got around to changing, is to use COSTS_N_INSNS in the costs
> tables themselves, as is done for the new parameterized ports, such
> as rs6000, sparc, mips, etc...
> 
> i.e. something like:
> 
> static const
> struct processor_costs athlon_cost = {
>   COSTS_N_INSNS (1),                    /* cost of an add instruction */
>   COSTS_N_INSNS (2),                    /* cost of a lea instruction */
>   COSTS_N_INSNS (1) + 1,                /* variable shift costs */
>   COSTS_N_INSNS (1) + 1,                /* constant shift costs */
>   /* A mild preference for addition over shift, and that leas is
>      slightly cheaper than a shift followed by an addition.  */
> ...

This looks nice ineed.  I will send patch to turn i386 into this scheme.

> I think this is a better longer term solution than your proposal of
> introducing a new lea_cost function to the x86 backend.  It would

lea_cost function would be however still neccesary for the
TARGET_DECOMPSE_LEA.
My patch also reduce the cost of lea iff it captures more than one
atomic operation, but I guess this is not big deal to reduce it always.
> also allow use to specify that the cost of a subtraction is slightly
> more than the cost of an addition, "COSTS_N_INSNS (1) + 1", to

I will experiment with this too.  It looks like good idea.  However
there would be still problem with 2-address machines (for simplicity
imagine i386 without lea and no superscalar code).  On such machines the
latency heuristic will chose more dificult to register allocate
sequences over easier to register allocate.  This can't be really
captured by rtx_cost.

Perhaps the cost rules within multiply by constant algorithm that
provably require an copy of value to be made can have cost and/or
latency increased by move_cost?

Yet another issue I have with alg_add_factor is that latency is always
based on add_cost.  On P4 for instance the shifts are significantly more
expensive than adds.  To base the cost on maximum from shifting and
adding would expose the fact that shifting by 2 is a lot cheaper because
it can be done by fast adder.

I will benchmark both variants - the variant just with lea change and
other variant with latency disabled too.  I tried that last week with my
lea patch and it seems like latency is still doing some harm for k8, but
lets see if it reproduce.
> Thanks again,

Thanks too and please accept my apology if the original replies seemed
rude/dense.  I am somewhat sleepy :(

Honza
> 
> Roger
> --
>