[PATCH] Reduce cost of Athlon multiplication sequences

Fri Jan 6 07:41:00 GMT 2006

Hi,
I discussed this with Roger on IRC yesterday (Roger, please correct
me if I get something wrong) and it looks like we need to tweak
synth_mult first to be more careful about the number of temporaries it
introduces.  The bad multiply by 11 sequence requires 3 registers, while
the same is doable with 2 registers.  The sequence is also bad for
2-address machines, since it puts an extra mov on the critical path, but
perhaps addressing the first problem would cure this too.  Until that is
done it seems to be a bad idea to bias the costs as I proposed in the
other patch, as that would only make synth_mult give up the sequence
with latency 3.
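
To illustrate the register-count point, here is a minimal C sketch of
two shift/add decompositions of a multiply by 11.  Both are hypothetical
examples written for illustration, not the exact sequences synth_mult
emits, and the function names are made up.

```c
/* Hypothetical variant with two temporaries live at once, so three
   registers are needed together with x.  */
static unsigned int
mul11_three_regs (unsigned int x)
{
  unsigned int t1 = x << 3;      /* t1 = 8*x */
  unsigned int t2 = x + x * 2;   /* t2 = 3*x, second temporary */
  return t1 + t2;                /* 8*x + 3*x = 11*x */
}

/* Hypothetical variant that needs only x and one temporary.  Each step
   has the shape base + index*scale that a single lea can compute.  */
static unsigned int
mul11_two_regs (unsigned int x)
{
  unsigned int t = x + x * 4;    /* t = 5*x */
  t = x + t * 2;                 /* t = x + 10*x = 11*x */
  return t;
}
```

Both functions return the same value for any input; the difference is
only in how many values have to be kept in registers at the same time.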

This should leave us with the TARGET_DECOMPOSE_LEA issues.  There seems
to be a bit of confusion about what it is meant for.  I originally added
it for the PPro architecture, which translates lea back to add/shift
operations (i.e. a lea representing reg*2 has latency 1, a lea
representing reg*2+reg has latency 2, and so on).  The P4 architecture
behaves the same way.

It was never meant to force LEA to be translated back into primitive
ops explicitly, as is partly done by Roger's patch; it was only meant to
tweak the costs to be realistic.  This behaviour of
TARGET_DECOMPOSE_LEA was removed in the past because of a bug in the
implementation.  I see 3 options here:

  1) Remove TARGET_DECOMPOSE_LEA and leave us with some fixed cost for
  LEA (probably 2).  At the moment we never consider rtx_cost for leas
  other than add+shift/add+add sequences, so this would do a reasonable
  job (on P4 add+shift is more expensive than add+add).
  2) Extend x86_costs to represent all variants of lea
  (shift/add/add_add/shift_add/shift_add_add).
  3) Add the behaviour back (I did so in one of the earlier versions of
  my costs patch via a lea_cost function called from rtx_costs).
  Basically the x86_costs->lea is ignored and the cost is always
  computed as the sum of the underlying ops.
  We might also rename TARGET_DECOMPOSE_LEA to
  TARGET_DECOMPOSED_LEA_LATENCY or similar to cause less confusion.
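
As a rough illustration of the difference between options 1 and 3, the
following C sketch costs a lea either as a fixed value or as the sum of
its underlying operations.  The struct and function names here are
hypothetical and are not taken from the i386 backend; in particular this
is not the lea_cost function from the earlier patch.

```c
/* Hypothetical description of the address a lea computes:
   base + index*scale + disp.  */
struct lea_shape
{
  int has_base;    /* nonzero if a base register is added */
  int scale;       /* 1, 2, 4 or 8; 1 means no shift is needed */
  int has_disp;    /* nonzero if a constant displacement is added */
};

/* Hypothetical per-CPU costs of the primitive operations.  */
struct prim_costs
{
  int add;         /* cost of one addition */
  int shift;       /* cost of one shift by a constant */
  int lea;         /* fixed lea cost, used by option 1 only */
};

/* Option 1: a fixed cost regardless of the shape of the address.  */
static int
lea_cost_fixed (const struct prim_costs *c, const struct lea_shape *s)
{
  (void) s;        /* the shape is deliberately ignored */
  return c->lea;
}

/* Option 3: ignore the lea field and sum the underlying operations.  */
static int
lea_cost_decomposed (const struct prim_costs *c, const struct lea_shape *s)
{
  int cost = 0;

  if (s->scale != 1)
    cost += c->shift;   /* index*scale behaves like a shift */
  if (s->has_base)
    cost += c->add;     /* adding the base register */
  if (s->has_disp)
    cost += c->add;     /* adding the displacement */
  return cost;
}
```

With add and shift both costing 1, lea_cost_decomposed gives 1 for reg*2
and 2 for reg*2+reg, matching the latencies described above.  Option 2
would replace the sum with a lookup of a separate cost for each of the
five listed variants.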

What would be the preferred solution here?
Honza