This is the mail archive of the
mailing list for the GCC project.
Re: Fixing inconsistent uses of address costs
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Fri, 27 Mar 2015 11:29:03 +0800
- Subject: Re: Fixing inconsistent uses of address costs
- Authentication-results: sourceware.org; auth=none
- References: <5514286D dot 7080907 at arm dot com>
On Thu, Mar 26, 2015 at 11:40 PM, Kyrill Tkachov <firstname.lastname@example.org> wrote:
> Hi all,
> I'd like to attempt to make GCC's usage of costs in the backends consistent.
> We have a lot of different types: rtx costs, address costs, regmove costs,
> vector costs etc. Some of them are use in different units, compared against
> different types of costs and in general are a bit of a mess. For now I'd
> to clean up address costs since they are actually not used as much as some
> other costs and seem to me to be a slightly simpler task.
> From what I can see, address costs i.e. the TARGET_ADDRESS_COST hook are
> in 5 places overall:
> * fwprop.c: forward propagation tries to propagate an rtx into an address
> and compare the address cost of the old address and the one it wants to
> propagate into it and picks the most profitable.
> * postreload.c: Again, tries to replace an address with a new one and
> the two address costs and picks the most profitable.
> * tree-ssa-loop-ivopts.c: A bit more involved here. From what I can tell it
> used to assign a cost to various addressing modes, but ends up comparing but
> in the computation_cost function adds/mixes the address cost with rtx costs.
> * Some targets use address costs as part of their calculation of rtx costs
> for, say, a memory access instruction.
> * The final and worst part is in loop-invariant.c in the
> function that needs to get a feel for how 'cheap' an address is and for that
> it compares the address cost to a magic number 3. See the thread that
> introduced it at http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html.
> There are a few issues wiht the above usages:
> - The magic number in loop-invariant.c was picked to help bwaves on x86 and
> makes no analytical sense for any other target.
> - Some targets use COSTS_N_INSNS units for their address costs, others use
> normal scalar units (every target that doesn't define the address costs hook
> ends up going through rtx costs which use COSTS_N_INSNS).
> I think this rather undermines the usage in tree-ssa-loop-ivopts.c which
> rtx costs in COSTS_N_INSNS and address costs interchangeably.
> An insight here is that address costs have two types of usage:
> 1. To compare two addresses and choose whether to replace one by the other
> (as in fwprop and postreload)
> 2. To give a metric of cheapness for use in loop optimisations.
> So I propose to split the two uses up. For the fwprop and postreload case
> introduce a hook that will take two addresses 'x' and 'y' and determine
> whether to substitute 'x' by 'y'. This can use address costs and allow the
> backend to do as deep comparison between two addresses as it could possibly
> want, unconstrained by the requirement to assign just a single number to
> each one individually and having the midend just pick the smallest number.
> We can call it something like TARGET_ADDRESS_PREFERABLE_P or something like
> that. Then get fwprop and postreload to use that to determine whether to
> substitute one address for another.
> What to do with the remaining address costs hook and its uses in
> tree-ssa-loop-ivopts and loop-invariant? Since tree-ssa-loop-ivopts uses
> the address costs interchangeably with rtx costs I think we should at least
> convert all address costs to use the same units: COSTS_N_INSNS. This raises
> a question: could we not just use rtx costs for the addresses? Someone with
> more familiarity with the code can correct me, but is the address cost usage
> in that file supposed to reflect the cost of computing the address outside
> an addressing mode? If so, then shouldn't rtx costs be used? Conversely, if
> address costs are a measure of something different, surely we shouldn't be
> adding or comparing them with rtx costs, but rather create some
> structure to compare two rtx/address entities? (similarly to the
> rtx cost/latency calculations in expmed for the mult synthesis code).
> Then there's the usage in loop-invariant.c where the address cost number is
> compared with '3'. This is really not portable across targets. The code
> some measure of whether an address is 'cheap'. There could be various
> approaches on how to provide that.
> Could we enumerate across an rtx array of representatives of
> all possible addresses for a target sorted by the
> hook above? Should we be asking the target to tell us which address type is
> cheap? In any case, it seems to me that the cheapest address in a target is
> something that only the target can tell, unless we agree on a universal
> meaning and unit of measurement for address costs.
> What do people think? Would this approach be worth pursuing if the above
> questions can be answered? From having a quick look at config/ I think that
> converting the targets to use COSTS_N_INSNS units would not be a
> task as long as the midend usage of address costs is consistent.
I totally agree that use of address cost should be cleaned up. My
point is the use is inconsistent on RTL level, maybe even worse than
midend. Look at arm_arm_address_cost, it defines meaningless costs
regarding to both rtx cost and address cost. It makes up to RTL
passes like fprop_addr (maybe postreload too) thus these passes can
propagate as many operations into address expressions and save alu
instructions (fprop_addr pass may have other more important uses,
please correct me here). IMHO, this does the right optimization but
with wrong method, and it has falls out too. For example, below code
transformation is reported not only once before:
add r5, r6, r7
ldr r8, [r5]
ldr r8, [r5, 4]
add r5, r6, r7
ldr r8, [r6, r7]
ldr r8, [r5, 4]
In this case, it corrupts load-pair opportunity and doesn't give any
benefit. Save problem holds for anto-inc-dec optimization.
So, here fprop_addr is doing context-sensitive optimization with
respect to address cost, which isn't always right. I once tried to
re-implement the pass so that it can firstly checks saved rtx costs
and then the address costs benefit. I believe others also tried the
same thing with different case. Anyway, it never concluded because
there is no obvious performance change. One or two alu instructions
saved doesn't matter that much on advanced processors, not mention it
might disturb combiner.
As for tree ivopts, address cost is used in both ways. For any
address computation that's invalid, it tries to legitimize it into two
parts, the first part results in alu instructions, the second part
results in address expression of different addressing modes. Right
now the rtx cost (for the first part) and the address cost (for the
second part) are accumulated and compared altogether. I do like the
idea split costs into different types because all elements are
entangled in single cost model, and it's hard to tune for specific
Please correct me if I was wrong.