This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RFD: rtx_cost changes


Our rtx_costs infrastructure is somewhat inadequate.  Consider
reload_cse_move2add, which converts constant loads into add
instructions.  We have the following problems:

1. There's no way to distinguish between a 3-operand operation and a
   2-operand operation in rtx_costs.  On Thumb-2,
   movs r2, #100
   adds r2, r2, #8
   adds r2, r3, #7
   are 2-byte instructions, while
   movs r8, #100
   movw r2, #3095
   add r2, r2, #1234
   add r2, r3, #8
   take four bytes.  If we only have (plus (r3) (const_int 8)) or
   (const_int 100), we cannot meaningfully return a cost for it.  We
   need to know the SET_DEST.
2. When optimizing for speed, we should prefer faster instructions, but
   when choosing between equally fast alternatives, we should pick the
   smaller one.  The same applies vice versa when optimizing for size.
3. We have no way of estimating costs for a strict_low_part load of a
   constant (one optimization done in reload_cse_move2add).
4. rtx_costs returns zero for REG, even when outer_code == SET.  This is
   almost certainly wrong everywhere; COSTS_N_INSNS (1) seems like a
   better choice.  This isn't addressed yet in the patch, as trying to
   change it introduces lots of changes in code generation.
5. In postreload, we have apples-to-oranges comparisons of the form
   rtx_cost (x, PLUS) < rtx_cost (y, SET)
   which make no sense.  A constant may well be free inside a PLUS,
   but the insn still has a cost which we'd underestimate in such a
   case.  We should compare both in terms of outer_code == SET.
6. On some machines, using constants is better than using registers.
   reload_cse_simplify_operands doesn't have a plausible cost check
   when substituting constants, and reload_cse_move2add should also
   do the substitution only when costs improve.  Later optimization
   passes can probably deal more easily with a constant load than
   with an add, and on out-of-order machines a constant load may
   reduce dependencies between instructions and the number of
   register accesses per cycle (documented as a limitation of PPro
   and its derivatives such as Core 2).
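
To make point 1 concrete, here is a rough toy model (not GCC code) of
the Thumb-2 sizes listed there.  All names (toy_low_reg_p,
toy_mov_imm_size, toy_add_imm_size) are hypothetical, and the rules
cover only the cases the examples above exercise: low registers
r0-r7, a 3-bit immediate for three-operand adds, and an 8-bit
immediate for two-operand adds and moves.  It assumes the constant is
encodable in a single instruction at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: r0-r7 are the "low" registers that the
   2-byte encodings can address.  */
static bool
toy_low_reg_p (int regno)
{
  return regno >= 0 && regno <= 7;
}

/* Size in bytes of "mov dest, #imm" in this toy model.  */
int
toy_mov_imm_size (int dest, int imm)
{
  assert (imm >= 0);
  if (toy_low_reg_p (dest) && imm <= 255)
    return 2;			/* movs rd, #imm8 */
  return 4;
}

/* Size in bytes of "add dest, src, #imm" in this toy model.  */
int
toy_add_imm_size (int dest, int src, int imm)
{
  assert (imm >= 0);
  if (toy_low_reg_p (dest) && toy_low_reg_p (src))
    {
      if (imm <= 7)
	return 2;		/* adds rd, rn, #imm3 */
      if (dest == src && imm <= 255)
	return 2;		/* adds rdn, #imm8 */
    }
  return 4;
}
```

Note that toy_add_imm_size (3, 3, 8) and toy_add_imm_size (2, 3, 8)
differ even though the source expression (plus (r3) (const_int 8)) is
the same in both cases; only the destination tells them apart.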

The patch below extends the rtx_costs interface by adding a SET_LHS
argument.  I've left the old rtx_costs available as a macro; the new one
is called rtx_costs2, and there's a new target macro as well.  This
allows for a more gradual transition, converting ports one by one.  Some
re-tuning is probably needed to take full advantage of the new
possibilities.
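
As a sketch of how such a gradual transition could look, the
following stand-alone example keeps an old-style entry point as a
macro over a new-style function that takes the destination.  This is
not the code from the patch: the toy_ names, the simplified rtx
stand-in, and the choice to pass NULL (and then assume the larger
encoding) when the destination is unknown are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for rtx codes.  */
enum toy_code { TOY_CONST_INT, TOY_REG, TOY_SET };

/* VALUE is the register number for TOY_REG and the constant for
   TOY_CONST_INT.  */
struct toy_rtx
{
  enum toy_code code;
  int value;
};

/* New-style query: size in bytes of loading constant X, given an
   optional destination SET_LHS.  Only constant loads directly under
   a SET are modelled; everything else is reported as free.  */
int
toy_rtx_cost2 (const struct toy_rtx *x, enum toy_code outer_code,
	       const struct toy_rtx *set_lhs)
{
  assert (x != NULL);
  if (x->code != TOY_CONST_INT || outer_code != TOY_SET)
    return 0;
  /* With a known low-register destination and an 8-bit constant,
     assume the 2-byte encoding is available.  */
  if (set_lhs != NULL && set_lhs->code == TOY_REG
      && set_lhs->value >= 0 && set_lhs->value <= 7
      && x->value >= 0 && x->value <= 255)
    return 2;
  /* Destination unknown or unsuitable: assume the 4-byte form.  */
  return 4;
}

/* Old-style entry point kept as a macro, so unconverted callers
   still compile; they simply get the answer for "no destination".  */
#define toy_rtx_cost(x, outer_code) \
  toy_rtx_cost2 ((x), (outer_code), NULL)
```

Callers that know the SET_DEST can switch to the three-argument form
one at a time, while everything else keeps its current behaviour.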

There's a new interface to retrieve full costs, i.e. both speed and size
costs, and to compare them.  I've converted parts of postreload.c to use
it.  I've also changed it to perform move2add only if the costs are
better.

Thumb-2 results:

Prefer two-byte add over four-byte move
-       mov     r2, #256
+       adds    r2, r2, #2
====
Prefer two-byte move over two-byte add
-       adds    r1, r1, #1
+       movs    r1, #5
====
Prefer four-byte move over four-byte add
-       add     r3, r3, #4080
+       mov     r3, #4080
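
The three decisions above can be reproduced with a small comparison
of (speed, size) pairs.  This is only a sketch: the struct and
function names are hypothetical rather than taken from the patch, and
it assumes every instruction involved has the same speed cost, so
that size breaks the tie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pair of costs for one instruction.  */
struct toy_full_costs
{
  int speed;
  int size;
};

/* Return true if A is strictly cheaper than B.  The primary metric
   is speed when SPEED is true and size otherwise; the other metric
   is consulted only to break ties.  */
bool
toy_costs_lt_p (const struct toy_full_costs *a,
		const struct toy_full_costs *b, bool speed)
{
  assert (a != NULL && b != NULL);
  if (speed)
    return (a->speed < b->speed
	    || (a->speed == b->speed && a->size < b->size));
  return (a->size < b->size
	  || (a->size == b->size && a->speed < b->speed));
}

/* Replace a constant load MOV by an add ADD only if the add is
   strictly cheaper; on equal costs the move is kept.  */
bool
toy_use_add_p (const struct toy_full_costs *mov,
	       const struct toy_full_costs *add, bool speed)
{
  return toy_costs_lt_p (add, mov, speed);
}
```

With a speed cost of 1 for each instruction, a two-byte add wins over
a four-byte move, while a two-byte add against a two-byte move, or a
four-byte add against a four-byte move, leaves the move in place.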

For the moment, I'm only posting this to request comments.  Does the new
interface look reasonable?  Anything else we should change or add (e.g.
operand number to distinguish SET_SRC from SET_DEST, or nesting level
from SET)?  Should we provide some kind of "cost modifier" and allow a
port to e.g. specify "prefer this one in case of equal costs" or "when
optimizing for speed, prefer this one in case of equal speed costs,
ignoring size"?  This could be useful e.g. to break ties between
registers and constants when both have speed cost 0.

Maybe we ought to use insn_rtx_cost more, and introduce a "cost" attribute?


Bernd

Attachment: costs3.diff
Description: Text document

