This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On 30/03/15 08:07, Bin.Cheng wrote:
On Fri, Mar 27, 2015 at 5:43 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:On 27/03/15 03:29, Bin.Cheng wrote:On Thu, Mar 26, 2015 at 11:40 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:Hi all, I'd like to attempt to make GCC's usage of costs in the backends consistent. We have a lot of different types: rtx costs, address costs, regmove costs, vector costs etc. Some of them are use in different units, compared against different types of costs and in general are a bit of a mess. For now I'd like to clean up address costs since they are actually not used as much as some other costs and seem to me to be a slightly simpler task. From what I can see, address costs i.e. the TARGET_ADDRESS_COST hook are used in 5 places overall: * fwprop.c: forward propagation tries to propagate an rtx into an address and compare the address cost of the old address and the one it wants to propagate into it and picks the most profitable. * postreload.c: Again, tries to replace an address with a new one and compares the two address costs and picks the most profitable. * tree-ssa-loop-ivopts.c: A bit more involved here. From what I can tell it is used to assign a cost to various addressing modes, but ends up comparing but in the computation_cost function adds/mixes the address cost with rtx costs. * Some targets use address costs as part of their calculation of rtx costs for, say, a memory access instruction. * The final and worst part is in loop-invariant.c in the create_new_invariant function that needs to get a feel for how 'cheap' an address is and for that it compares the address cost to a magic number 3. See the thread that introduced it at http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html. There are a few issues wiht the above usages: - The magic number in loop-invariant.c was picked to help bwaves on x86 and makes no analytical sense for any other target. - Some targets use COSTS_N_INSNS units for their address costs, others use normal scalar units (every target that doesn't define the address costs hook ends up going through rtx costs which use COSTS_N_INSNS). I think this rather undermines the usage in tree-ssa-loop-ivopts.c which uses rtx costs in COSTS_N_INSNS and address costs interchangeably. An insight here is that address costs have two types of usage: 1. To compare two addresses and choose whether to replace one by the other (as in fwprop and postreload) 2. To give a metric of cheapness for use in loop optimisations. So I propose to split the two uses up. For the fwprop and postreload case introduce a hook that will take two addresses 'x' and 'y' and determine whether to substitute 'x' by 'y'. This can use address costs and allow the backend to do as deep comparison between two addresses as it could possibly want, unconstrained by the requirement to assign just a single number to each one individually and having the midend just pick the smallest number. We can call it something like TARGET_ADDRESS_PREFERABLE_P or something like that. Then get fwprop and postreload to use that to determine whether to substitute one address for another. What to do with the remaining address costs hook and its uses in tree-ssa-loop-ivopts and loop-invariant? Since tree-ssa-loop-ivopts uses the address costs interchangeably with rtx costs I think we should at least convert all address costs to use the same units: COSTS_N_INSNS. This raises a question: could we not just use rtx costs for the addresses? Someone with more familiarity with the code can correct me, but is the address cost usage in that file supposed to reflect the cost of computing the address outside of an addressing mode? If so, then shouldn't rtx costs be used? Conversely, if address costs are a measure of something different, surely we shouldn't be adding or comparing them with rtx costs, but rather create some two-dimensional structure to compare two rtx/address entities? (similarly to the rtx cost/latency calculations in expmed for the mult synthesis code). Then there's the usage in loop-invariant.c where the address cost number is compared with '3'. This is really not portable across targets. The code wants some measure of whether an address is 'cheap'. There could be various approaches on how to provide that. Could we enumerate across an rtx array of representatives of all possible addresses for a target sorted by the TARGET_ADDRESS_PREFERABLE_P hook above? Should we be asking the target to tell us which address type is cheap? In any case, it seems to me that the cheapest address in a target is something that only the target can tell, unless we agree on a universal meaning and unit of measurement for address costs. What do people think? Would this approach be worth pursuing if the above questions can be answered? From having a quick look at config/ I think that converting the targets to use COSTS_N_INSNS units would not be a controversial task as long as the midend usage of address costs is consistent.I totally agree that use of address cost should be cleaned up. My point is the use is inconsistent on RTL level, maybe even worse than midend. Look at arm_arm_address_cost, it defines meaningless costs regarding to both rtx cost and address cost. It makes up to RTL passes like fprop_addr (maybe postreload too) thus these passes can propagate as many operations into address expressions and save alu instructions (fprop_addr pass may have other more important uses, please correct me here).Hi Bin, fwprop does try to propagate rtxes into addresses and it has a heuristic by which it checks the address costs. If one is clearly cheaper than the other according to these costs it picks that. If the costs are equal it picks the one with the *highest* rtx cost in the hope that it will help eliminate more costly expressions. I agree that the arm address costs are just a hack that were just picked to get the least bad behaviour from fwprop and postreload. I'm proposing to make a hook that takes two addresses and returns the preferable one. That way, the backend doesn't have to assign numbers to addresses and can do as deep relative analysis as it wants.As far as fwprop itself is concerned, IMHO the problem is with how the address cost interface is used, rather than the how the interface is implemented.
What would you prefer it to do? Currently it tries to propagate an rtx intoan address, compares the before and after address costs. How should it decide
on whether to perform the substitution? Just look at the rtx costs?
IMHO, this does the right optimization but with wrong method, and it has falls out too. For example, below code transformation is reported not only once before: add r5, r6, r7 ldr r8, [r5] ldr r8, [r5, 4] changed into: add r5, r6, r7 ldr r8, [r6, r7] ldr r8, [r5, 4] In this case, it corrupts load-pair opportunity and doesn't give any benefit. Save problem holds for anto-inc-dec optimization. So, here fprop_addr is doing context-sensitive optimization with respect to address cost, which isn't always right. I once tried to re-implement the pass so that it can firstly checks saved rtx costs and then the address costs benefit. I believe others also tried the same thing with different case.I had looked at this as well. I was trying to get fwprop to propagate a sign-extend into the addressing mode in aarch64 and hit something similar.Anyway, it never concluded because there is no obvious performance change. One or two alu instructions saved doesn't matter that much on advanced processors, not mention it might disturb combiner.I think it would be worth pursuing. My first stab at this in fwprop would be to pick the preferable address using the TARGET_ADDRESS_PREFERABLE_P hook above. The problem comes in replicating the current behaviour. If the hook just returns true/false, then we lose the heuristic we currently use when the addresses are of equal cost. A possible way around this that I'm experimenting with is to make this new hook return {-1,0,1} rather than a boolean with -1,+1 representing a definite preference and 0 having no preference. That way fwprop can tell when there's no clear preference and pick the more expensive one rtx-costs-wise.As for tree ivopts, address cost is used in both ways. For any address computation that's invalid, it tries to legitimize it into two parts, the first part results in alu instructions, the second part results in address expression of different addressing modes. Right now the rtx cost (for the first part) and the address cost (for the second part) are accumulated and compared altogether. I do like the idea split costs into different types because all elements are entangled in single cost model, and it's hard to tune for specific case.Thanks for explaining. I think it would be possible to make the comparisons there a bit more sane. If an address computation is split into alu instructions and a legitimate address then carry around the rtx cost of the alu part and the address. When the time comes to compare two computations, we create a more involved way of comparing. For example (would need benchmarking, of course): * Compare the rtx costs of the alu components and check the address preference for the legitimate address components using the new hook. * If the alu part costs are of equal rtx cost, pick the one which has the preferable legitimate address. * If expression 'a' has a more expensive alu component than 'b' but a more preferable address component, then use some tie breaker. Perhaps apply rtx costs on the address expression and compare those... I think that if we fixed up the usage in tree-ssa-loop-ivopts.c the way described above, and figure out what to do for the 'cheap address' heuristic in loop-invariant.c, we could eliminate the use of address costs altogether. Or at least make them a backend-specific measure that each backend can use in the process of determining a preferable address but would not have to be mixed with rtx costs or magic constants in the midend.Forgot to mention. Just like Sandra commented, I think what really important is how legitimize_address works and how IVOPT mimicries the legitimization behavior and generates the final code, rather cost for different addressing modes. Actually aarch64 and some other targets return exactly the same cost (0) for all addressing modes. And I think it really doesn't matter very much.
Hmmm... so for aarch64 at least it seems that address costs are irrelevant in the ivopt pass. The problems in the ivopt pass are rather down to the way it tries to legitimize the address. Can we just remove the address costs usage from there? Are thereany targets on which address costs make a meaningful, predictable difference in what ivopt does? I'm hoping we can remove the address costs from the midend entirely, to discourage uses such as in ivopt and loop-invariant where code compares the address cost number to a magic constant, tuned for a single benchmark on a single target because it needs to determine whether an address is 'cheap'. If we don't use address costs in fwprop and ivopts, then the only legitimate usage seems to be in postreload.c, which can surely be better serviced by a more targetted hook that takes two addresses and picks the better one for that particular purpose. Thanks, Kyrill
Thanks, binThanks, KyrillPlease correct me if I was wrong. Thanks, bin
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |