This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fixing inconsistent uses of address costs



On 30/03/15 08:07, Bin.Cheng wrote:
On Fri, Mar 27, 2015 at 5:43 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
On 27/03/15 03:29, Bin.Cheng wrote:
On Thu, Mar 26, 2015 at 11:40 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com>
wrote:
Hi all,

I'd like to attempt to make GCC's usage of costs in the backends
consistent.
We have a lot of different types: rtx costs, address costs, regmove
costs,
vector costs etc. Some of them are use in different units, compared
against
different types of costs and in general are a bit of a mess. For now I'd
like
to clean up address costs since they are actually not used as much as
some
other costs and seem to me to be a slightly simpler task.

  From what I can see, address costs i.e. the TARGET_ADDRESS_COST hook are
used
in 5 places overall:
* fwprop.c: forward propagation tries to propagate an rtx into an address
and compare the address cost of the old address and the one it wants to
propagate into it and picks the most profitable.

* postreload.c: Again, tries to replace an address with a new one and
compares
the two address costs and picks the most profitable.

* tree-ssa-loop-ivopts.c: A bit more involved here. From what I can tell
it
is
used to assign a cost to various addressing modes, but ends up comparing
but
in the computation_cost function adds/mixes the address cost with rtx
costs.

* Some targets use address costs as part of their calculation of rtx
costs
for, say, a memory access instruction.

* The final and worst part is in loop-invariant.c in the
create_new_invariant
function that needs to get a feel for how 'cheap' an address is and for
that
it compares the address cost to a magic number 3. See the thread that
introduced it at http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html.

There are a few issues wiht the above usages:
- The magic number in loop-invariant.c was picked to help bwaves on x86
and
makes no analytical sense for any other target.

- Some targets use COSTS_N_INSNS units for their address costs, others
use
normal scalar units (every target that doesn't define the address costs
hook
ends up going through rtx costs which use COSTS_N_INSNS).
I think this rather undermines the usage in tree-ssa-loop-ivopts.c which
uses
rtx costs in COSTS_N_INSNS and address costs interchangeably.

An insight here is that address costs have two types of usage:
1. To compare two addresses and choose whether to replace one by the
other
(as in fwprop and postreload)
2. To give a metric of cheapness for use in loop optimisations.

So I propose to split the two uses up. For the fwprop and postreload case
introduce a hook that will take two addresses 'x' and 'y' and determine
whether to substitute 'x' by 'y'. This can use address costs and allow
the
backend to do as deep comparison between two addresses as it could
possibly
want, unconstrained by the requirement to assign just a single number to
each one individually and having the midend just pick the smallest
number.
We can call it something like TARGET_ADDRESS_PREFERABLE_P or something
like
that. Then get fwprop and postreload to use that to determine whether to
substitute one address for another.

What to do with the remaining address costs hook and its uses in
tree-ssa-loop-ivopts and loop-invariant? Since tree-ssa-loop-ivopts uses
the address costs interchangeably with rtx costs I think we should at
least
convert all address costs to use the same units: COSTS_N_INSNS. This
raises
a question: could we not just use rtx costs for the addresses? Someone
with
more familiarity with the code can correct me, but is the address cost
usage
in that file supposed to reflect the cost of computing the address
outside
of
an addressing mode? If so, then shouldn't rtx costs be used? Conversely,
if
address costs are a measure of something different, surely we shouldn't
be
adding or comparing them with rtx costs, but rather create some
two-dimensional
structure to compare two rtx/address entities? (similarly to the
rtx cost/latency calculations in expmed for the mult synthesis code).

Then there's the usage in loop-invariant.c where the address cost number
is
compared with '3'. This is really not portable across targets. The code
wants
some measure of whether an address is 'cheap'. There could be various
approaches on how to provide that.
Could we enumerate across an rtx array of representatives of
all possible addresses for a target sorted by the
TARGET_ADDRESS_PREFERABLE_P
hook above? Should we be asking the target to tell us which address type
is
cheap? In any case, it seems to me that the cheapest address in a target
is
something that only the target can tell, unless we agree on a universal
meaning and unit of measurement for address costs.

What do people think? Would this approach be worth pursuing if the above
questions can be answered? From having a quick look at config/ I think
that
converting the targets to use COSTS_N_INSNS units would not be a
controversial
task as long as the midend usage of address costs is consistent.
I totally agree that use of address cost should be cleaned up.  My
point is the use is inconsistent on RTL level, maybe even worse than
midend.  Look at arm_arm_address_cost, it defines meaningless costs
regarding to both rtx cost and address cost.  It makes up to RTL
passes like fprop_addr (maybe postreload too) thus these passes can
propagate as many operations into address expressions and save alu
instructions (fprop_addr pass may have other more important uses,
please correct me here).
Hi Bin,

fwprop does try to propagate rtxes into addresses and it has a heuristic by
which
it checks the address costs. If one is clearly cheaper than the other
according
to these costs it picks that. If the costs are equal it picks the one with
the *highest*
rtx cost in the hope that it will help eliminate more costly expressions.
I agree that the arm address costs are just a hack that were just picked to
get the
least bad behaviour from fwprop and postreload.
I'm proposing to make a hook that takes two addresses and returns the
preferable
one. That way, the backend doesn't have to assign numbers to addresses and
can do
as deep relative analysis as it wants.
As far as fwprop itself is concerned, IMHO the problem is with how the
address cost interface is used, rather than the how the interface is
implemented.

What would you prefer it to do? Currently it tries to propagate an rtx into
an address, compares the before and after address costs. How should it decide
on whether to perform the substitution? Just look at the rtx costs?


    IMHO, this does the right optimization but
with wrong method, and it has falls out too.  For example, below code
transformation is reported not only once before:

      add  r5, r6, r7
      ldr r8, [r5]
      ldr r8, [r5, 4]

changed into:

      add  r5, r6, r7
      ldr r8, [r6, r7]
      ldr r8, [r5, 4]

In this case, it corrupts load-pair opportunity and doesn't give any
benefit.  Save problem holds for anto-inc-dec optimization.

So, here fprop_addr is doing context-sensitive optimization with
respect to address cost, which isn't always right.  I once tried to
re-implement the pass so that it can firstly checks saved rtx costs
and then the address costs benefit.  I believe others also tried the
same thing with different case.

I had looked at this as well. I was trying to get fwprop to propagate a
sign-extend
into the addressing mode in aarch64 and hit something similar.

    Anyway, it never concluded because
there is no obvious performance change.  One or two alu instructions
saved doesn't matter that much on advanced processors, not mention it
might disturb combiner.

I think it would be worth pursuing. My first stab at this in fwprop would be
to
pick the preferable address using the TARGET_ADDRESS_PREFERABLE_P hook
above.
The problem comes in replicating the current behaviour. If the hook just
returns
true/false, then we lose the heuristic we currently use when the addresses
are of
equal cost. A possible way around this that I'm  experimenting with is to
make this
new hook return {-1,0,1} rather than a boolean with -1,+1 representing a
definite
preference and 0 having no preference. That way fwprop can tell when there's
no
clear preference and pick the more expensive one rtx-costs-wise.

As for tree ivopts, address cost is used in both ways.  For any
address computation that's invalid, it tries to legitimize it into two
parts, the first part results in alu instructions, the second part
results in address expression of different addressing modes.  Right
now the rtx cost (for the first part) and the address cost (for the
second part) are accumulated and compared altogether.  I do like the
idea split costs into different types because all elements are
entangled in single cost model, and it's hard to tune for specific
case.

Thanks for explaining.
I think it would be possible to make the comparisons there a bit more sane.
If an address computation is split into alu instructions and a legitimate
address
then carry around the rtx cost of the alu part and the address. When the
time
comes to compare two computations, we create a more involved way of
comparing.
For example (would need benchmarking, of course):
* Compare the rtx costs of the alu components and check the address
preference for
the legitimate address components using the new hook.
* If the alu part costs are of equal rtx cost, pick the one which has the
preferable legitimate address.
* If expression 'a' has a more expensive alu component than 'b' but a more
preferable address component, then use some tie breaker. Perhaps apply rtx
costs
on the address expression and compare those...


I think that if we fixed up the usage in tree-ssa-loop-ivopts.c the way
described above,
and figure out what to do for the 'cheap address' heuristic in
loop-invariant.c, we could
eliminate the use of address costs altogether. Or at least make them a
backend-specific measure that each backend can use in the process of
determining a preferable address but would not have to be mixed with rtx
costs or magic
constants in the midend.
Forgot to mention.  Just like Sandra commented, I think what really
important is how legitimize_address works and how IVOPT mimicries the
legitimization behavior and generates the final code, rather cost for
different addressing modes.  Actually aarch64 and some other targets
return exactly the same cost (0) for all addressing modes.  And I
think it really doesn't matter very much.


Hmmm... so for aarch64 at least it seems that address costs
 are irrelevant in the ivopt pass. The problems in the ivopt pass
are rather down to the way it tries to legitimize the address.
Can we just remove the address costs usage from there?
Are thereany targets on which address costs make a meaningful, predictable
difference in what ivopt does?

I'm hoping we can remove the address costs from the midend entirely,
to discourage uses such as in ivopt and loop-invariant where code compares
the address cost number to a magic constant, tuned for a single benchmark on
a single target because it needs to determine whether an address is 'cheap'.

If we don't use address costs in fwprop and ivopts, then the only legitimate
usage seems to be in postreload.c, which can surely be better serviced by a
more targetted hook that takes two addresses and picks the better one for
that particular purpose.

Thanks,
Kyrill


Thanks,
bin
Thanks,
Kyrill


Please correct me if I was wrong.

Thanks,
bin



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]