[4.0 and mainline] Fix multiplication by constant expansion

Mon Jan 2 01:30:00 GMT 2006

Hi Jan,

On Mon, 2 Jan 2006, Jan Hubicka wrote:
> Does this patch look acceptable?  (I can also do just the "-1" trick and
> skip the DECOMPSE_LEA if it seems riskant for you, especially in 4.1)

The change that Richard Henderson has been after for some time, but
I've never got around to changing, is to use COSTS_N_INSNS in the costs
tables themselves, as is done for the new parameterized ports, such
as rs6000, sparc, mips, etc...

i.e. something like:

static const
struct processor_costs athlon_cost = {
  COSTS_N_INSNS (1),                    /* cost of an add instruction */
  COSTS_N_INSNS (2),                    /* cost of a lea instruction */
  COSTS_N_INSNS (1) + 1,                /* variable shift costs */
  COSTS_N_INSNS (1) + 1,                /* constant shift costs */
  /* A mild preference for addition over shift, and lea is
     slightly cheaper than a shift followed by an addition.  */
...

Then in ix86_rtx_costs use:
	*total = ix86_cost->lea;
instead of
	*total = COSTS_N_INSNS (ix86_cost->lea);
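
As a rough sketch of what I mean (the struct below is a cut-down,
hypothetical stand-in for processor_costs, the sketch_* names are
made up for illustration, and I'm assuming COSTS_N_INSNS (N) is
((N) * 4) as in rtl.h):

```c
#include <assert.h>

/* Assumed to match the middle-end definition in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, cut-down version of struct processor_costs.  */
struct sketch_costs {
  int add;          /* cost of an add instruction */
  int lea;          /* cost of a lea instruction */
  int shift_const;  /* constant shift costs */
};

/* Entries are stored pre-scaled, so "+ 1" style tweaks are possible.  */
static const struct sketch_costs sketch_athlon_cost = {
  COSTS_N_INSNS (1),      /* cost of an add instruction */
  COSTS_N_INSNS (2),      /* cost of a lea instruction */
  COSTS_N_INSNS (1) + 1,  /* constant shift costs */
};

/* Stand-in for the lea case in ix86_rtx_costs: the table value is
   used directly, with no scaling at run-time.  */
static int
sketch_lea_cost (const struct sketch_costs *costs)
{
  return costs->lea;
}

/* Cost of the equivalent constant shift followed by an addition.  */
static int
sketch_shift_add_cost (const struct sketch_costs *costs)
{
  return costs->shift_const + costs->add;
}

/* With the values above, lea comes out one unit cheaper than the
   shift-plus-add sequence.  */
static void
sketch_check (void)
{
  assert (sketch_lea_cost (&sketch_athlon_cost) == COSTS_N_INSNS (2));
  assert (sketch_lea_cost (&sketch_athlon_cost)
          < sketch_shift_add_cost (&sketch_athlon_cost));
}
```

The one-unit difference is below the resolution of a whole instruction,
which is exactly the kind of tie-break that can't be expressed while the
tables hold raw instruction counts.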

This design has several advantages.

Firstly, it improves performance, as the COSTS_N_INSNS multiplication
is performed at compile-time rather than at run-time.  Secondly, it
gives the backend more flexibility, allowing fine-resolution tweaks
to individual costs.  Thirdly, it fixes an issue with -Os on x86, where
the size_cost table is in bytes and an addition is two bytes, which
conflicts with the middle-end's definition that COSTS_N_INSNS (1) is
approximately rtx_cost (PLUS (reg) (reg)).  This new scheme allows the
x86 backend to define a local

#define COSTS_N_BYTES(x) (2*(x))

static const
struct processor_costs size_cost = {    /* costs for tuning for size */
  COSTS_N_BYTES (2),                    /* cost of an add instruction */
  COSTS_N_BYTES (3),                    /* cost of a lea instruction */
  COSTS_N_BYTES (2),                    /* variable shift costs */
  COSTS_N_BYTES (3),                    /* constant shift costs */
...
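
To spell out the arithmetic, here is a small sketch.  It assumes
COSTS_N_INSNS (N) is ((N) * 4) as in rtl.h; the sketch_* helpers are
hypothetical and exist only for illustration.  Note that the macro has
to be written with no space between its name and the parameter list,
otherwise the preprocessor treats it as an object-like macro.

```c
#include <assert.h>

/* Assumed to match the middle-end definition in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Local x86 macro: two cost units per byte of encoding.  */
#define COSTS_N_BYTES(x) (2 * (x))

/* Size cost of a two-byte add instruction.  */
static int
sketch_size_add_cost (void)
{
  return COSTS_N_BYTES (2);
}

/* Size cost of a three-byte lea instruction.  */
static int
sketch_size_lea_cost (void)
{
  return COSTS_N_BYTES (3);
}

/* A two-byte addition lands exactly on COSTS_N_INSNS (1), which is
   what the middle-end expects a reg+reg PLUS to cost.  */
static void
sketch_check (void)
{
  assert (sketch_size_add_cost () == COSTS_N_INSNS (1));
  assert (sketch_size_lea_cost () > sketch_size_add_cost ());
}
```

So byte counts keep their relative ordering, while a plain addition
costs the same under -Os as it does in the speed tables.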

I think this is a better longer term solution than your proposal of
introducing a new lea_cost function to the x86 backend.  It would
also allow us to specify that the cost of a subtraction is slightly
more than the cost of an addition, "COSTS_N_INSNS (1) + 1", to
implement your suggestion that we should prefer PLUS over MINUS on
targets where the latter can't be optimized into address arithmetic.
In the middle-end, synth_mult is already set up to handle this by
tracking different values for add_cost and sub_cost.  The issue
again is that on x86 we use the same value, when there is a minor
preference for one over the other.  This is a target parameterization
issue as other targets can use subtractions in addressing modes, or
provide efficient shiftsub instructions.
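
A toy illustration of the tie-break (this is not synth_mult itself;
the struct and function names are hypothetical, and COSTS_N_INSNS (N)
is assumed to be ((N) * 4) as in rtl.h).  It compares two ways of
computing x * 3, one ending in an addition and one in a subtraction:

```c
#include <assert.h>

/* Assumed to match the middle-end definition in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical per-target costs, mirroring the separate add_cost
   and sub_cost values that synth_mult tracks.  */
struct sketch_alg_costs {
  int add_cost;
  int sub_cost;
  int shift_cost;
};

/* x * 3 computed as (x << 1) + x.  */
static int
sketch_mul3_via_add (const struct sketch_alg_costs *c)
{
  return c->shift_cost + c->add_cost;
}

/* x * 3 computed as (x << 2) - x.  */
static int
sketch_mul3_via_sub (const struct sketch_alg_costs *c)
{
  return c->shift_cost + c->sub_cost;
}

/* Returns 1 when the addition form is strictly cheaper, else 0.  */
static int
sketch_prefers_add (const struct sketch_alg_costs *c)
{
  return sketch_mul3_via_add (c) < sketch_mul3_via_sub (c);
}

/* Equal costs, as on x86 today: neither form is preferred.  */
static const struct sketch_alg_costs sketch_equal = {
  COSTS_N_INSNS (1), COSTS_N_INSNS (1), COSTS_N_INSNS (1)
};

/* Subtraction one unit dearer: the addition form wins.  */
static const struct sketch_alg_costs sketch_tweaked = {
  COSTS_N_INSNS (1), COSTS_N_INSNS (1) + 1, COSTS_N_INSNS (1)
};
```

With sketch_equal the two sequences tie, so the choice falls to
whichever the algorithm happens to try first; with sketch_tweaked
the addition form is strictly cheaper.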

The good news is I think we're converging on a solution.  If we need
to get a fix into 4.0 quickly, backporting a subset of the above change,
where only the lea field is expressed in COSTS_N_INSNS units, would make
for a very safe and trivial tweak for the release branch.  The main
change would then be just the following in athlon_cost:

  COSTS_N_INSNS (2) - 1,  /* cost of lea instruction */

Many thanks for bearing with me.  As I already know this is a change
RTH has been after, I'm happy to use my middle-end uber back-end
authority to approve these changes to i386.c, so you needn't worry
that the correct solution will take significantly longer to
review.

Thanks again,

Roger
--