Created attachment 34178 [details] testcase $ cc -O2 -S -o stm.s stm.c $ cc -O2 -DOPT -S -o stm-opt.s stm.c if you compare the 2 outputs, stm_load function is using one more slot on the stack. The difference is only this: static inline size_t AO_myload2(const volatile size_t *addr) { return *(size_t *)addr; } static inline size_t AO_myload(const volatile size_t *addr) { #ifdef OPT size_t result = AO_myload2(addr); #else size_t result = *(size_t *)addr; #endif return result; } Having one more inlined function should have the same optimization not a better one. 4.8 does not have the problem and the code generated is the same.
Looks like out of SSA is producing the difference. Change the function to just: return *(size_t *)addr; Changes it to what the inline function does.
Confirmed.
The difference is in whether there are extra user-named variables in the end and thus SSA coalescing decision differences: stm_load (volatile stm_word_t * addr) { - stm_word_t l; - stm_word_t value; stm_word_t version; stm_word_t l; struct r_entry_t * r; - stm_word_t now; ... + size_t _32; + size_t _33; + size_t _34; ... Conflict graph: +1: 3 +3: 1 After sorting: Sorted Coalesce list: +(16610) _30 <-> _33 (651) _10 <-> _30 ... -Coalesce list: (10)_10 & (30)_30 [map: 1, 2] : Success -> 1 +Coalesce list: (30)_30 & (33)_33 [map: 2, 3] : Success -> 2 +Coalesce list: (10)_10 & (30)_30 [map: 1, 2] : Fail due to conflict So it turns out the different coalescing ends up generating worse code. It would be interesting to see why we decide that coalescing _30 and _33 is so much more beneficial than coalescing _10 and _30. Ah, it simply uses EDGE_FREQUENCY... and for some reason we predicted that _33 & 1 != 0 is 10% taken only. So ... the theory is that the version is faster on the important path?
That is, it is good # _30 = PHI <0(2), _10(8), value_33(7), value_33(3), value_33(6)> return _30; vs bad # _30 = PHI <0(2), _10(8), _33(7), _33(3), _33(6)> return _30; where it thinks that coalescing _30 and _33 is more important than coalescing _30 and _10 (which looks reasonable). The question is why RTL opts make such large difference out of this.
The regression starts from this commit: trunk@200103 commit f82f0ea592c2d78077e03f5a1a3b9b40751cc116 Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4> Date: Fri Jun 14 18:52:32 2013 +0000 * gimple.h (gimple_can_coalesce_p): Prototype. * tree-ssa-coalesce.c (gimple_can_coalesce_p): New function. (create_outofssa_var_map, coalesce_partitions): Use it. * tree-ssa-uncprop.c (uncprop_into_successor_phis): Similarly. * tree-ssa-live.c (var_map_base_init): Use TYPE_CANONICAL if it's available. * gcc.dg/tree-ssa/coalesce-1.c: New test. And especially from this: gcc/tree-ssa-coalesce.c:gimple_can_coalesce_p (tree name1, tree name2) ... /* If the types are not the same, check for a canonical type match. This (for example) allows coalescing when the types are fundamentally the same, but just have different names. Note pointer types with different address spaces may have the same canonical type. Those are rejected for coalescing by the types_compatible_p check. */ if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2) && types_compatible_p (t1, t2)) return true; But as Richard mentioned, I think the problem is in the RTL part.
The problem starts in the first copyrename pass. We refrain from coalescing variables if neither has a usable root. This causes us to fail to coalesce partitions that we would coalesce if only we tried them in a different order (i.e., P1 and P2 fail but either one coalesces with P3, and then the other would coalesce with that union). This is exactly what happens in AO_myload with -DOPT: partitions _3 and _6 won't coalesce, and then result_4 and _6 coalesce, which would enable _3 to coalesce too. This in turn brings additional rootless variables into stm_load, which again fail to coalesce because we try them first (_36 and _4 fails, and l_11 and _36 passes, so that with _4 would work). Without -DOPT, we succeed in coalescing result (there are fewer SSA names), and when that is inlined, it gets there with a root symbol, so that coalescing of _36 and result_4 succeeds. In both cases, we coalesce l_11 with result_36, and it is precisely because _4 remains separate that we end up introducing copies in new blocks when going out of SSA into RTL. Coalescing SSA names even when neither has a root removes the optimization differences for this testcase.
OK, but why does this make such a difference in the final result. Ie, what happens as we get into RTL. It would also be helpful to see the current & desired output for the case where we've regressed.
(In reply to Alexandre Oliva from comment #6) > > Coalescing SSA names even when neither has a root removes the optimization > differences for this testcase. I would worry about side effects... Coalescing is always a balancing act. If you allow things to coalesce when neither has a root, a few of those coalesced together may prevent a coalesce with a much more desirable root in other programs. And of course more coalesces may mean longer live ranges... so any changes like this need a lot of studying of side effects on a much larger set of problems. It was a long time ago, so I don't remember the reasons, but most of these kinds of restrictions were rooted in some kind of rationale... usually from examining an issue of some sort and introducing the restriction to avoid it. The original conditions may no longer exist, but we'd have to run tests to make sure nothing shows up. Perhaps tweaking the costing model could help too.
(In reply to Jeffrey A. Law from comment #7) > OK, but why does this make such a difference in the final result. Ie, what happens as we get into RTL. Err, I covered that bit in my description: we emit a number of copies, each with a new BB of its own. The additional pseudo survives all the way to the end, and so do the copies, which is enough to explain the additional stack slot. Further rearrangement of code also follows from the additional blocks. (In reply to Andrew Macleod from comment #8) > most of these kinds of restrictions were rooted in some kind of rationale... usually from examining an issue of some sort and introducing the restriction to avoid it. This one was added by https://gcc.gnu.org/ml/gcc-patches/2012-08/msg00517.html The description mentions out-of-SSA coalescing, but not copyrename coalescing, which is what this is about. Since there were not anonymous SSA names before, the introduced condition would necessarily not be met, so it effectively reduced the amount of copyrename coalescing without any rationale AFAICT. Richi, do you happen to have any insight as to why you changed copyrename to avoid coalescing rootless ssa names? Reverting it doesn't seem to have any ill effects on code correctness (unlike allowing coalescing between rooted and rootless vars during out-of-ssa, in gimple_can_coalesce_p, which breaks codegen badly, though I haven't dug into why). I'll attach a patch I'm testing to that effect momentarily.
Created attachment 35048 [details] Patch that restores coalescing of anonymous SSA vars
Uhh, Richi, there's a question for you in comment 9, but I forgot to cc: you before saving the post.
I just noticed that I reversed with and without -DOPT in my analysis in comment 6. Apologies for any confusion this may have caused.
I looked further into why changing gimple_can_coalesce_p didn't work: build_ssa_conflict_graph only marks conflicts between SSA names if they share the same base variable. This explains why we have a conflict in the conflict map with -DOPT, in which l_4, values_29 and now_30 all have base variables, whereas without -DOPT they don't. So, we can't coalesce them with _10 and _28, even though 10 and 29 do conflict. I ran a test in which I allowed values_29 and _28 to coalesce, by hooking into gimple_can_coalesce_p during an -UOPT out-of-ssa and getting it to return true. This yielded the same partition map as the -DOPT case, but an empty conflict map for the reason above. The sorted coalesce list was the same, too, so we coalesced values_29 and _28, as in -DOPT, but then, due to the lack of the conflict, we also coalesced _10 into the same partition. This experimental additional invalid coalescing in turn caused other differences in the generated RTL. At this point I'll sum up my findings: - we fail to coalesce during copyrename because we've ruled out, at that point, coalescing of anonymous SSA names. as a consequence, only some coalescing opportunities are taken, leaving some anonymous names that we try to coalesce first not coalesced - as a consequence, we fail to coalesce some SSA names because some of the SSA variables have become anonymous whereas others aren't - as a consequence, we emit additional copies in additional BBs, and they survive all the way to the end of compilation Although changing copyrename to allow coalescing in these cases fixes the problem, I suppose it would also be possible to change out-of-ssa to compensate; we would have to somehow mark conflicts between anonymous and non-anonymous variables in the conflict graph, though.
So, forgive me, but is -DOPT supposed to be the good or the bad code? From looking at the results I get, -DOPT is supposed to be the good code, but that seems to conflict with c#6. For -DOPT I get this PHI: # _28 = PHI <0(2), _29(3), _29(7), _10(8), _29(6)> For -UOPT I get this: # _28 = PHI <0(2), value_29(3), value_29(7), _10(8), value_29(6)> Note that _10 from BB 8 is known to have the value 0. That equivalency comes from the path 2->3->4->8. Coalescing _10 with anything is probably the least interesting. Coalescing _28 and _29 seems more profitable. Would we get better code if we left the constant in place? Can we come up with a reasonable heuristic to determine if we're better off with _10 or the constant in that PHI argument? Right now uncprop will always replace a constant with an SSA_NAME having the right value if the SSA_NAME could potentially coalesce with the result of the PHI. The theory being that a constant will always need initialization, but if the SSA_NAME coalesces with the result, then the SSA_NAME will likely be "free". The obvious exception is when that coalescing prevents other things from coalescing. I also wonder if we could be cleaning things up in RTL to make this less of an issue. For example, would optimizing the obvious tail-merge/crossjump earlier in the RTL pipeline help the register allocator coalesce things better?
(In reply to Alexandre Oliva from comment #9) > (In reply to Jeffrey A. Law from comment #7) > > OK, but why does this make such a difference in the final result. Ie, what happens as we get into RTL. > > Err, I covered that bit in my description: we emit a number of copies, each > with a new BB of its own. The additional pseudo survives all the way to the > end, and so do the copies, which is enough to explain the additional stack > slot. Further rearrangement of code also follows from the additional blocks. > > (In reply to Andrew Macleod from comment #8) > > most of these kinds of restrictions were rooted in some kind of rationale... usually from examining an issue of some sort and introducing the restriction to avoid it. > > This one was added by > https://gcc.gnu.org/ml/gcc-patches/2012-08/msg00517.html > The description mentions out-of-SSA coalescing, but not copyrename > coalescing, which is what this is about. Since there were not anonymous SSA > names before, the introduced condition would necessarily not be met, so it > effectively reduced the amount of copyrename coalescing without any > rationale AFAICT. > > Richi, do you happen to have any insight as to why you changed copyrename to > avoid coalescing rootless ssa names? Because "coalescing" them doesn't do anything. copyrename coalescing assigns the same underlying DECL to SSA names, thus makes SSA_NAME_VAR the same. But when both SSA_NAME_VARs are NULL this won't do anything. So the effect of the patch was rather that there are now way less SSA names with SSA_NAME_VAR != NULL. In particular all the pass created ones (with no relation to user declared variables). It didn't make sense to apply copyrename to those (copyrename is to have better debug info in the end, sth which should be provided by debug stmts today). > Reverting it doesn't seem to have any > ill effects on code correctness (unlike allowing coalescing between rooted > and rootless vars during out-of-ssa, in gimple_can_coalesce_p, which breaks > codegen badly, though I haven't dug into why). I'll attach a patch I'm > testing to that effect momentarily. So I'd say the patch can't make a difference - it at most can end up re-writing the type of an SSA name and then having followup effects in out-of-SSA coalescing? For some odd reason we are ignoring some type differences in copyrename coalescing. But - how does either root1 or root2 end up non-zero at /* Set the root variable of the partition to the better choice, if there is one. */ if (!ign2 && root2) replace_ssa_name_symbol (partition_to_var (map, p3), root2); else if (!ign1 && root1) replace_ssa_name_symbol (partition_to_var (map, p3), root1); else gcc_unreachable (); ? That is, the only effect of your patch is to do partiton_union? Ah, and then go to: /* Now one more pass to make all elements of a partition share the same root variable. */ for (x = 1; x < num_ssa_names; x++) { part_var = partition_to_var (map, x); if (!part_var) continue; var = ssa_name (x); if (SSA_NAME_VAR (var) == SSA_NAME_VAR (part_var)) continue; if (debug) { fprintf (debug, "Coalesced "); print_generic_expr (debug, var, TDF_SLIM); fprintf (debug, " to "); print_generic_expr (debug, part_var, TDF_SLIM); fprintf (debug, "\n"); } stats.coalesced++; replace_ssa_name_symbol (var, SSA_NAME_VAR (part_var)); } and wreck SSA names type (and "debug" identifier) only? That really sounds bogus. The correct fix would be to allow the kind of type differences copyrename coalescing allows also in out-of-SSA coalescing. That is, I don't see how the patch makes sense.
And as I analyzed in comment #3 we chose the now different coalescing because it is more profitable (to the cost analysis we perform in out-of-SSA coalescing). So the fix, if any, is there (or in the code maintaining edge frequencies).
To add to all this - IMHO copyrename should go - it's purpose was to have more SSA names with user-decls and thus debug info for them. This should now be dealt with debug insns (in a way better and correct way). Yes, out-of-SSA coalescing could be changed to allow coalescing of SSA names with a user-DECL and anonymous SSA names (or SSA names with a DECL_IGNORED_P decl). But that will make the conflict graph much larger(?). The decision past then was to retain previous behavior which wouldn't have coalesced this case (because the DECL of the now anonymous SSA name was different). In theory we could just go for it, hoping the conflict graph will not get much larger. But it might happen that we end up coalescing things in a way that there ends up being no DECL for a register and thus we lose in debug quality (not sure if we could compensate by inserting debug stmts on the edges to compensate for that - we couldn't do that if it requires splitting the edge). Not sure if always coalescing to the variable with the user-decl won't make debug info wrong, like in bb1: _1 = 2 * _3; goto <bb 3>; bb2: i_4 = 5 / _3; bb3: _3 = PHI <i_4 (2), _1 (1)> if we coalesce to _1 to i_4 then the expression 2 * _3 will be computed to a register with decl 'i'. So to avoid wrong debug we'd have to always coalesce to an anonymous entity - but that would lose the association of 'i' to 5 / _3 in bb2 unless we manage to compensate for that somehow with debug-insns we magically emit during RTL expansion.
(In reply to Jeffrey A. Law from comment #14) > So, forgive me, but is -DOPT supposed to be the good or the bad code? It's the good one. As noted in comment 12, I had that backwards in comment 6. (In reply to Richard Biener from comment #15) > Because "coalescing" them doesn't do anything. copyrename coalescing > assigns the same underlying DECL to SSA names, thus makes SSA_NAME_VAR > the same. But when both SSA_NAME_VARs are NULL this won't do anything. This may be true when you look at only one pair of SSA names, but when you have 2 such candidate pairs involving one common SSA name, it can, and it does make a difference, as described in comment 6. (In reply to Richard Biener from comment #16) > And as I analyzed in comment #3 we chose the now different coalescing because > it is more profitable (to the cost analysis we perform in out-of-SSA > coalescing). No, we don't even *consider* the coalescing performed in the -DOPT case, because, as noted in comment 13, the SSA names ended up with different base names, because copyrename wouldn't give them the same base name. (In reply to Richard Biener from comment #17) > To add to all this - IMHO copyrename should go That's fine with me. > Yes, out-of-SSA coalescing could be changed to allow coalescing of SSA names > with a user-DECL and anonymous SSA names (or SSA names with a DECL_IGNORED_P > decl). But that will make the conflict graph much larger(?). Ok, I'll give that a shot tomorrow (I'll be away for the whole day today). > might happen that we end up coalescing things in a way > that there ends up being no DECL for a register and thus we lose in debug > quality (not sure if we could compensate by inserting debug stmts on the > edges to compensate for that - we couldn't do that if it requires splitting > the edge). I'm not concerned about that. Any debug stmts needed to make this work will already be in place, after the actual assignments, and after early PHI nodes. Sure, edge insertions might separate the copies from the debug stmts inserted after the PHI nodes, but ultimately the bindings should take care of it. > So to avoid wrong debug we'd have to always coalesce to an anonymous > entity With debug stmts, we don't care what the base names are any more. All the info we need (for tracked auto variables) is in the debug stmts.
On Wed, 18 Mar 2015, aoliva at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 > > --- Comment #18 from Alexandre Oliva <aoliva at gcc dot gnu.org> --- > (In reply to Jeffrey A. Law from comment #14) > > So, forgive me, but is -DOPT supposed to be the good or the bad code? > > It's the good one. As noted in comment 12, I had that backwards in comment 6. > > (In reply to Richard Biener from comment #15) > > Because "coalescing" them doesn't do anything. copyrename coalescing > > assigns the same underlying DECL to SSA names, thus makes SSA_NAME_VAR > > the same. But when both SSA_NAME_VARs are NULL this won't do anything. > > This may be true when you look at only one pair of SSA names, but when you have > 2 such candidate pairs involving one common SSA name, it can, and it does make > a difference, as described in comment 6. > > (In reply to Richard Biener from comment #16) > > And as I analyzed in comment #3 we chose the now different coalescing because > > it is more profitable (to the cost analysis we perform in out-of-SSA > > coalescing). > > No, we don't even *consider* the coalescing performed in the -DOPT case, > because, as noted in comment 13, the SSA names ended up with different base > names, because copyrename wouldn't give them the same base name. > > (In reply to Richard Biener from comment #17) > > To add to all this - IMHO copyrename should go > > That's fine with me. > > > Yes, out-of-SSA coalescing could be changed to allow coalescing of SSA names > > with a user-DECL and anonymous SSA names (or SSA names with a DECL_IGNORED_P > > decl). But that will make the conflict graph much larger(?). > > Ok, I'll give that a shot tomorrow (I'll be away for the whole day today). > > > might happen that we end up coalescing things in a way > > that there ends up being no DECL for a register and thus we lose in debug > > quality (not sure if we could compensate by inserting debug stmts on the > > edges to compensate for that - we couldn't do that if it requires splitting > > the edge). > > I'm not concerned about that. Any debug stmts needed to make this work will > already be in place, after the actual assignments, and after early PHI nodes. > Sure, edge insertions might separate the copies from the debug stmts inserted > after the PHI nodes, but ultimately the bindings should take care of it. > > > So to avoid wrong debug we'd have to always coalesce to an anonymous > > entity > > With debug stmts, we don't care what the base names are any more. All the info > we need (for tracked auto variables) is in the debug stmts. But we do not always have debug stmts! int bar (int b) { int i; if (b > 7) i = b / 3; else return b * 4; return i; } is a testcase for what I am thinking of. Hmm. For some reason into-SSA inserts a debug stmt, creating an extra copy _5 = i_4?! <bb 2>: if (b_2(D) > 7) goto <bb 3>; else goto <bb 4>; <bb 3>: i_4 = b_2(D) / 3; # DEBUG i => i_4 _5 = i_4; goto <bb 5>; <bb 4>: _3 = b_2(D) * 4; <bb 5>: # _1 = PHI <_3(4), _5(3)> so indeed we shouldn't lose anything here. But are you sure that we are never using REG_EXPR for debug? With -fno-var-tracking-assignments for sure we do (I suppose we don't care), at least var-tracking.c does look at REG_EXPR.
(In reply to Alexandre Oliva from comment #18) > No, we don't even *consider* the coalescing performed in the -DOPT case, > because, as noted in comment 13, the SSA names ended up with different base > names, because copyrename wouldn't give them the same base name. > > (In reply to Richard Biener from comment #17) > > To add to all this - IMHO copyrename should go > > That's fine with me. > > > Yes, out-of-SSA coalescing could be changed to allow coalescing of SSA names > > with a user-DECL and anonymous SSA names (or SSA names with a DECL_IGNORED_P > > decl). But that will make the conflict graph much larger(?). Not only that: unless we allow also coalescing between different SSA names in out-of-ssa, we'll lose optimizations that copyrename would have enabled out-of-SSA to make, because the removed copyrename won't have assigned the same base name to otherwise unrelated SSA names that out-of-SSA won't even consider for coalescing. Now *this* might make the conflict graph explode in size.
(In reply to rguenther@suse.de from comment #19) > For some reason > into-SSA inserts a debug stmt, creating an extra copy _5 = i_4?! I bet the extra copy is there even without -g. _1, _3 and _5 seem to be SSA names for the result decl. Presumably _5 might be optimized out, replaced with i_4 in the _1 PHI node, at a later pass. > But are you sure > that we are never using REG_EXPR for debug? We use them for non-gimple_register auto variables. > With > -fno-var-tracking-assignments for sure we do (I suppose we don't care), Yeah, I'd say we don't care. If one asks for debug info degradation, we obey ;-) (now, sometimes we disable VTA on our own because some internal limit is exceeded; that's unfortunate, but what are you gonna do? :-)
Let's try to stay focused. Killing copyrename seems like a fine thing to do as long as the resulting debug info is good. But that's independent of this BZ. I still find myself wondering if leaving the "0" instead of "_10" in that PHI node is a reasonable approach. Certainly if I hack uncprop to leave things in that state, I get the code we want. And ISTM that uncprop ought to leave the constant alone if the SSA_NAME holding the constant conflicts with any of the other SSA_NAMEs in the PHI node that may potentially coalesce with the PHI result. That captures pretty well the case where the constant is better than an SSA_NAME. In this particular case, we have: # _28 = PHI <0(2), _29(3), _29(7), _10(8), _29(6)> When _28 and _10 coalesce, the result then conflicts with _29 during IRA/LRA because of the extended lifetime of _10). Thus the annoying copies created by out-of-ssa can't be eliminated. WIth the proposed change, we'd instead have: # _28 = PHI <0(2), _29(3), _29(7), 0(8), _29(6)> While we won't coalesce _28/_29 during out-of-ssa, LRA will see the copies and note that the associated pseudos don't conflict and ultimately assign them to the same hard register and the annoying copies are gone.
On Fri, 20 Mar 2015, law at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 > > --- Comment #22 from Jeffrey A. Law <law at redhat dot com> --- > Let's try to stay focused. Killing copyrename seems like a fine thing to do as > long as the resulting debug info is good. But that's independent of this BZ. > > I still find myself wondering if leaving the "0" instead of "_10" in that PHI > node is a reasonable approach. Certainly if I hack uncprop to leave things in > that state, I get the code we want. > > And ISTM that uncprop ought to leave the constant alone if the SSA_NAME holding > the constant conflicts with any of the other SSA_NAMEs in the PHI node that may > potentially coalesce with the PHI result. That captures pretty well the case > where the constant is better than an SSA_NAME. > > In this particular case, we have: > > # _28 = PHI <0(2), _29(3), _29(7), _10(8), _29(6)> > > When _28 and _10 coalesce, the result then conflicts with _29 during IRA/LRA > because of the extended lifetime of _10). Thus the annoying copies created by > out-of-ssa can't be eliminated. > > WIth the proposed change, we'd instead have: > > # _28 = PHI <0(2), _29(3), _29(7), 0(8), _29(6)> > > While we won't coalesce _28/_29 during out-of-ssa, LRA will see the copies and > note that the associated pseudos don't conflict and ultimately assign them to > the same hard register and the annoying copies are gone. Sure - but then we'd better integrate uncprop into the coalescing, otherwise you need to re-compute conflicts / live twice.
Integrating uncprop and out-of-ssa certainly crossed my mind when I wrote that comment :-) Or at least arranging to build the life info/conflict stuff once and have it shared. A poor man's version is to note the path for the equivalence and see if any of the blocks on that path define the other PHI arguments. That ought to be fast and simple enough to possibly include in gcc5.
WIP patch accidentally posted to gcc-patches: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01460.html
Patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01491.html
I confirm that the patch fixes the performance problem that I had. I guess the patch is too complex to be backported. Thanks a lot Alexandre for the patch and to all for the deep analysis! (just waiting for review and commit :))
So I've been thinking about how to integrate life/conflict analysis into the uncprop code and it may not be that bad, both from an implementation and computation standpoint. Most importantly, we don't have to compute full life information. We really just need to compute the life of the equivalence. Given the life of the equivalence, if the equivalence is live in any block that contains the defining statement for an SSA_NAME appearing in the target PHI, then the equivalence conflicts and we don't want to unpropagate it. Computing the life of the equivalence is pretty easy and should be reasonably quick. This is a cost we'd have to pay regardless of whether or not we integrate uncprop with out-of-ssa since we won't have life information for the expression. Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't test for conflicts multiple times if an SSA_NAME shows up in multiple PHI alternatives would help keep the cost down as well. Ultimately I don't think we need to integrate uncprop and out-of-ssa to avoid the unprofitable transformation during uncprop.
No decision has been made on whether or not to include either or both approaches to fixing this BZ into GCC 5. It's still under evaluation/review. I think for GCC 6 it's highly likely we'll have both since they're both independently good things to do.
(In reply to Jeffrey A. Law from comment #28) > So I've been thinking about how to integrate life/conflict analysis into the > uncprop code and it may not be that bad, both from an implementation and > computation standpoint. > > Most importantly, we don't have to compute full life information. We really > just need to compute the life of the equivalence. Given the life of the > equivalence, if the equivalence is live in any block that contains the > defining statement for an SSA_NAME appearing in the target PHI, then the > equivalence conflicts and we don't want to unpropagate it. > > Computing the life of the equivalence is pretty easy and should be > reasonably quick. This is a cost we'd have to pay regardless of whether or > not we integrate uncprop with out-of-ssa since we won't have life > information for the expression. > > Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't > test for conflicts multiple times if an SSA_NAME shows up in multiple PHI > alternatives would help keep the cost down as well. > > Ultimately I don't think we need to integrate uncprop and out-of-ssa to > avoid the unprofitable transformation during uncprop. Also see "Boissinot et al., Fast Liveness Checking for SSA-Form Programs" (CGO 08). They describe a way to do fast liveness queries without actually doing a (memory) expensive data-flow analysis but using SSA immediate-uses and dominance checks. Sth we could use in SSA coalescing as well to avoid both the liveness bitmaps and the conflict graph.
I'd say we push this back to GCC 6.
Agreed, though ideally it should be fixed early in stage1.
On 03/31/2015 05:25 AM, rguenth at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 > > --- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> --- > (In reply to Jeffrey A. Law from comment #28) >> So I've been thinking about how to integrate life/conflict analysis into the >> uncprop code and it may not be that bad, both from an implementation and >> computation standpoint. >> >> Most importantly, we don't have to compute full life information. We really >> just need to compute the life of the equivalence. Given the life of the >> equivalence, if the equivalence is live in any block that contains the >> defining statement for an SSA_NAME appearing in the target PHI, then the >> equivalence conflicts and we don't want to unpropagate it. >> >> Computing the life of the equivalence is pretty easy and should be >> reasonably quick. This is a cost we'd have to pay regardless of whether or >> not we integrate uncprop with out-of-ssa since we won't have life >> information for the expression. >> >> Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't >> test for conflicts multiple times if an SSA_NAME shows up in multiple PHI >> alternatives would help keep the cost down as well. >> >> Ultimately I don't think we need to integrate uncprop and out-of-ssa to >> avoid the unprofitable transformation during uncprop. > > Also see "Boissinot et al., Fast Liveness Checking for SSA-Form Programs" > (CGO 08). They describe a way to do fast liveness queries without actually > doing a (memory) expensive data-flow analysis but using SSA immediate-uses > and dominance checks. Sth we could use in SSA coalescing as well to avoid > both the liveness bitmaps and the conflict graph. Yea, it looks reasonably interesting and there's probably benefit in experimenting with that approach. However, be aware that it's memory consumption can be problematical. According to their summary, it's quadratic. Though presumably we could drop back to the tried and true approach if we have too many BBs. That definitely is stage1 material. Jeff
(In reply to Jeffrey A. Law from comment #33) > On 03/31/2015 05:25 AM, rguenth at gcc dot gnu.org wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 > > > > --- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> --- > > (In reply to Jeffrey A. Law from comment #28) > >> So I've been thinking about how to integrate life/conflict analysis into the > >> uncprop code and it may not be that bad, both from an implementation and > >> computation standpoint. > >> > >> Most importantly, we don't have to compute full life information. We really > >> just need to compute the life of the equivalence. Given the life of the > >> equivalence, if the equivalence is live in any block that contains the > >> defining statement for an SSA_NAME appearing in the target PHI, then the > >> equivalence conflicts and we don't want to unpropagate it. > >> > >> Computing the life of the equivalence is pretty easy and should be > >> reasonably quick. This is a cost we'd have to pay regardless of whether or > >> not we integrate uncprop with out-of-ssa since we won't have life > >> information for the expression. > >> > >> Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't > >> test for conflicts multiple times if an SSA_NAME shows up in multiple PHI > >> alternatives would help keep the cost down as well. > >> > >> Ultimately I don't think we need to integrate uncprop and out-of-ssa to > >> avoid the unprofitable transformation during uncprop. > > > > Also see "Boissinot et al., Fast Liveness Checking for SSA-Form Programs" > > (CGO 08). They describe a way to do fast liveness queries without actually > > doing a (memory) expensive data-flow analysis but using SSA immediate-uses > > and dominance checks. Sth we could use in SSA coalescing as well to avoid > > both the liveness bitmaps and the conflict graph. > Yea, it looks reasonably interesting and there's probably benefit in > experimenting with that approach. However, be aware that it's memory > consumption can be problematical. According to their summary, it's > quadratic. Yes, but that's only if you store the liveness info. We don't need to do that but we only need to compute whether two partitions conflict for each coalescing candidate (which means a few SSA conflict checks dependent on partition size). Our current algorithm is already quadratic in memory use because we do store SSA liveness and the partition conflict graph. What I'm not sure is whether doing the SSA based liveness check is going to be slower compile-time wise. > Though presumably we could drop back to the tried and true > approach if we have too many BBs. > > That definitely is stage1 material. Indeed. > Jeff
Author: aoliva Date: Tue Jun 9 05:05:34 2015 New Revision: 224262 URL: https://gcc.gnu.org/viewcvs?rev=224262&root=gcc&view=rev Log: [PR64164] Drop copyrename, use coalescible partition as base when optimizing. for gcc/ChangeLog PR rtl-optimization/64164 * Makefile.in (OBJS): Drop tree-ssa-copyrename.o. * tree-ssa-copyrename.c: Removed. * opts.c (default_options_table): Drop -ftree-copyrename. Add -ftree-coalesce-vars. * passes.def: Drop all occurrences of pass_rename_ssa_copies. * common.opt (ftree-copyrename): Ignore. (ftree-coalesce-inlined-vars): Likewise. * doc/invoke.texi: Remove the ignored options above. * gimple-expr.h (gimple_can_coalesce_p): Move declaration * tree-ssa-coalesce.h: ... here. * tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other headers required by it. * gimple-expr.c (gimple_can_coalesce_p): Allow coalescing across variables when flag_tree_coalesce_vars. Check register use and promoted modes to allow coalescing. Moved to tree-ssa-coalesce.c. * tree-ssa-live.c (struct tree_int_map_hasher): Move along with its member functions to tree-ssa-coalesce.c. (var_map_base_init): Likewise. Renamed to compute_samebase_partition_bases. (partition_view_normal): Drop want_bases parameter. (partition_view_bitmap): Likewise. * tree-ssa-live.h: Adjust declarations. * tree-ssa-coalesce.c: Include explow.h. (build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's default defs at the entry point. (dump_part_var_map): New. (compute_optimized_partition_bases): New, called by... (coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead of compute_samebase_partition_bases. Adjust. * alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs. * cfgexpand.c (leader_merge): New. (get_rtl_for_parm_ssa_default_def): New. (set_rtl): Merge exprs and attrs, even for MEMs and non-SSA vars. Update DECL_RTL for PARM_DECLs and RESULT_DECLs too. (expand_one_stack_var_at): Handle anonymous SSA_NAMEs. Drop redundant MEM attr setting. (expand_one_stack_var_1): Handle anonymous SSA_NAMEs. Renamed from... (expand_one_stack_var): ... this. New wrapper to check and skip already expanded SSA partitions. (record_alignment_for_reg_var): New, factored out of... (expand_one_var): ... this. (expand_one_ssa_partition): New. (adjust_one_expanded_partition_var): New. (expand_one_register_var): Check and skip already expanded SSA partitions. (expand_used_vars): Don't create DECLs for anonymous SSA names. Expand all SSA partitions, then adjust all SSA names. (pass::execute): Replace the loops that set SA.partition_to_pseudo from partition leaders and cleared DECL_RTL for multi-location variables, and that which used to rename vars and set attrs, with one that clears DECL_RTL and checks that PARMs and RESULTs default_defs match DECL_RTL. * cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare. * emit-rtl.c (set_reg_attrs_for_parm): Handle NULL decl. * explow.c (promote_ssa_mode): New. * explow.h (promote_ssa_mode): Declare. * expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs. * function.c: Include cfgexpand.h. (use_register_for_decl): Handle SSA_NAMEs, anonymous or not. (use_register_for_parm_decl): Wrapper for the above to special-case the result_ptr. (rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def. (maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with multiple locations. (assign_parm_adjust_stack_rtl): Add all and parm arguments, for rtl_for_parm. For SSA-assigned parms, zero stack_parm. (assign_parm_setup_block): Prefer SSA-assigned location. (assign_parm_setup_reg): Likewise. Use entry_parm for equiv if stack_parm is NULL. (assign_parm_setup_stack): Prefer SSA-assigned location. (assign_parms): Maybe reset DECL_RTL of params. Adjust stack rtl before testing for pointer bounds. Special-case result_ptr. (expand_function_start): Maybe reset DECL_RTL of result. Prefer SSA-assigned location for result and static chain. Factor out DECL_RESULT and SET_DECL_RTL. * tree-outof-ssa.c (insert_value_copy_on_edge): Handle anonymous SSA names. Use promote_ssa_mode. (get_temp_reg): Likewise. (remove_ssa_form): Adjust. * var-tracking.c (dataflow_set_clear_at_call): Take call_insn and get its reg_usage for reg invalidation. (compute_bb_dataflow): Pass it insn. (emit_notes_in_bb): Likewise. * tree-ssa-loop-niter.c (loop_exits_before_overflow): Don't fail assert on conversion between unsigned types. for gcc/testsuite/ChangeLog * gcc.dg/guality/pr54200.c: Add -fno-tree-coalesce-vars. * gcc.dg/ssp-1.c: Make counter a register. * gcc.dg/ssp-2.c: Likewise. * gcc.dg/torture/parm-coalesce.c: New. Added: trunk/gcc/testsuite/gcc.dg/torture/parm-coalesce.c Removed: trunk/gcc/tree-ssa-copyrename.c Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/alias.c trunk/gcc/cfgexpand.c trunk/gcc/cfgexpand.h trunk/gcc/common.opt trunk/gcc/doc/invoke.texi trunk/gcc/emit-rtl.c trunk/gcc/explow.c trunk/gcc/explow.h trunk/gcc/expr.c trunk/gcc/function.c trunk/gcc/gimple-expr.c trunk/gcc/gimple-expr.h trunk/gcc/opts.c trunk/gcc/passes.def trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/guality/pr54200.c trunk/gcc/testsuite/gcc.dg/ssp-1.c trunk/gcc/testsuite/gcc.dg/ssp-2.c trunk/gcc/tree-outof-ssa.c trunk/gcc/tree-ssa-coalesce.c trunk/gcc/tree-ssa-coalesce.h trunk/gcc/tree-ssa-live.c trunk/gcc/tree-ssa-live.h trunk/gcc/tree-ssa-loop-niter.c trunk/gcc/tree-ssa-uncprop.c trunk/gcc/var-tracking.c
I don't think we're backporting this huge change, and the much smaller one wasn't well-received, so...
The patch severely breaks on SPARC though: int foo (char c) { return (int) c; } eric@polaris:~/build/gcc/sparc-sun-solaris2.10> gcc/xgcc -Bgcc -S -O t.c t.c: In function 'foo': t.c:1:5: internal compiler error: in assign_parm_setup_reg, at function.c:3120 int foo (char c) ^ 0xa335d0 assign_parm_setup_reg /home/eric/svn/gcc/gcc/function.c:3120 0xa3660f assign_parms /home/eric/svn/gcc/gcc/function.c:3778 0xa3a9fd expand_function_start(tree_node*) /home/eric/svn/gcc/gcc/function.c:5215 0x838716 execute /home/eric/svn/gcc/gcc/cfgexpand.c:6127 Please submit a full bug report,
It also broke ARM and PowerPC. This patch should have been tested on all of the config-list.mk targets.
At least the sparc regression was caused by the change richi requested to disregard the underlying decl in promote_ssa_mode. I didn't realize this could cause a mismatch between the mode of the partition created for the QImode parm default def and the promoted mode for the parm decl expected by the parm-assignment code in function.c. This will likely take some time to sort out, so I'm reverting the patch for now.
Created attachment 35740 [details] ARM testcase This is the testcase that breaks on ARM, when compiled with optimizations: -O0 is OK, -O1, -O2, -O3 crash with: /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/libgcc/fixed-bit.c: In function '__gnu_addqq3': /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/libgcc/fixed-bit.c:59:1: internal compiler error: RTL flag check: MEM_VOLATILE_P used with unexpected rtx code 'reg' in set_mem_attributes_minus_bitpos, at emit-rtl.c:1787 FIXED_ADD (FIXED_C_TYPE a, FIXED_C_TYPE b) ^ 0xa6eb52 rtl_check_failed_flag(char const*, rtx_def const*, char const*, int, char const*) /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/rtl.c:800 0x771fc7 set_mem_attributes_minus_bitpos(rtx_def*, tree_node*, int, long) /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/emit-rtl.c:1787 0x805294 assign_parm_setup_block /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:2977 0x80b65c assign_parms /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:3775 0x80e087 expand_function_start(tree_node*) /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:5215 0x6a77ed execute /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/cfgexpand.c:6127 Please submit a full bug report,
Created attachment 35742 [details] AIX PowerPC testcase
Created attachment 35743 [details] PPC64LE Linux Testcase
Author: aoliva Date: Thu Jul 23 15:34:49 2015 New Revision: 226113 URL: https://gcc.gnu.org/viewcvs?rev=226113&root=gcc&view=rev Log: [PR64164] Drop copyrename, use coalescible partition as base when optimizing. for gcc/ChangeLog PR rtl-optimization/64164 * Makefile.in (OBJS): Drop tree-ssa-copyrename.o. * tree-ssa-copyrename.c: Removed. * opts.c (default_options_table): Drop -ftree-copyrename. Add -ftree-coalesce-vars. * passes.def: Drop all occurrences of pass_rename_ssa_copies. * common.opt (ftree-copyrename): Ignore. (ftree-coalesce-inlined-vars): Likewise. * doc/invoke.texi: Remove the ignored options above. * gimple-expr.h (gimple_can_coalesce_p): Move declaration * tree-ssa-coalesce.h: ... here. * tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other headers required by it. * gimple-expr.c (gimple_can_coalesce_p): Allow coalescing across variables when flag_tree_coalesce_vars. Check register use and promoted modes to allow coalescing. Moved to tree-ssa-coalesce.c. * tree-ssa-live.c (struct tree_int_map_hasher): Move along with its member functions to tree-ssa-coalesce.c. (var_map_base_init): Likewise. Renamed to compute_samebase_partition_bases. (partition_view_normal): Drop want_bases parameter. (partition_view_bitmap): Likewise. * tree-ssa-live.h: Adjust declarations. * tree-ssa-coalesce.c: Include explow.h. (build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's default defs at the entry point. (dump_part_var_map): New. (compute_optimized_partition_bases): New, called by... (coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead of compute_samebase_partition_bases. Adjust. * alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs. * cfgexpand.c (leader_merge): New. (get_rtl_for_parm_ssa_default_def): New. (set_rtl): Merge exprs and attrs, even for MEMs and non-SSA vars. Update DECL_RTL for PARM_DECLs and RESULT_DECLs too. (expand_one_stack_var_at): Handle anonymous SSA_NAMEs. Drop redundant MEM attr setting. (expand_one_stack_var_1): Handle anonymous SSA_NAMEs. Renamed from... (expand_one_stack_var): ... this. New wrapper to check and skip already expanded SSA partitions. (record_alignment_for_reg_var): New, factored out of... (expand_one_var): ... this. (expand_one_ssa_partition): New. (adjust_one_expanded_partition_var): New. (expand_one_register_var): Check and skip already expanded SSA partitions. (expand_used_vars): Don't create DECLs for anonymous SSA names. Expand all SSA partitions, then adjust all SSA names. (pass::execute): Replace the loops that set SA.partition_to_pseudo from partition leaders and cleared DECL_RTL for multi-location variables, and that which used to rename vars and set attrs, with one that clears DECL_RTL and checks that PARMs and RESULTs default_defs match DECL_RTL. * cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare. * emit-rtl.c (set_reg_attrs_for_parm): Handle NULL decl. * explow.c (promote_ssa_mode): New. * explow.h (promote_ssa_mode): Declare. * expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs. * function.c: Include cfgexpand.h. (use_register_for_decl): Handle SSA_NAMEs, anonymous or not. (use_register_for_parm_decl): Wrapper for the above to special-case the result_ptr. (rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def. (split_complex_args): Take assign_parm_data_all argument. Pass it to rtl_for_parm. Set up rtl and context for split args. (assign_parms_augmented_arg_list): Adjust. (maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with multiple locations. Recognize split complex args. (assign_parm_adjust_stack_rtl): Add all and parm arguments, for rtl_for_parm. For SSA-assigned parms, zero stack_parm. (assign_parm_setup_block): Prefer SSA-assigned location. (assign_parm_setup_reg): Likewise. Use entry_parm for equiv if stack_parm is NULL. (assign_parm_setup_stack): Prefer SSA-assigned location. (assign_parms): Maybe reset DECL_RTL of params. Adjust stack rtl before testing for pointer bounds. Special-case result_ptr. (expand_function_start): Maybe reset DECL_RTL of result. Prefer SSA-assigned location for result and static chain. Factor out DECL_RESULT and SET_DECL_RTL. * tree-outof-ssa.c (insert_value_copy_on_edge): Handle anonymous SSA names. Use promote_ssa_mode. (get_temp_reg): Likewise. (remove_ssa_form): Adjust. * stor-layout.c (layout_decl): Don't set mem attributes of non-MEMs. * var-tracking.c (dataflow_set_clear_at_call): Take call_insn and get its reg_usage for reg invalidation. (compute_bb_dataflow): Pass it insn. (emit_notes_in_bb): Likewise. for gcc/testsuite/ChangeLog * gcc.dg/guality/pr54200.c: Add -fno-tree-coalesce-vars. * gcc.dg/ssp-1.c: Make counter a register. * gcc.dg/ssp-2.c: Likewise. * gcc.dg/torture/parm-coalesce.c: New. Added: trunk/gcc/testsuite/gcc.dg/torture/parm-coalesce.c Removed: trunk/gcc/tree-ssa-copyrename.c Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/alias.c trunk/gcc/cfgexpand.c trunk/gcc/cfgexpand.h trunk/gcc/common.opt trunk/gcc/doc/invoke.texi trunk/gcc/emit-rtl.c trunk/gcc/explow.c trunk/gcc/explow.h trunk/gcc/expr.c trunk/gcc/function.c trunk/gcc/gimple-expr.c trunk/gcc/gimple-expr.h trunk/gcc/opts.c trunk/gcc/passes.def trunk/gcc/stor-layout.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/guality/pr54200.c trunk/gcc/testsuite/gcc.dg/ssp-1.c trunk/gcc/testsuite/gcc.dg/ssp-2.c trunk/gcc/tree-outof-ssa.c trunk/gcc/tree-ssa-coalesce.c trunk/gcc/tree-ssa-coalesce.h trunk/gcc/tree-ssa-live.c trunk/gcc/tree-ssa-live.h trunk/gcc/tree-ssa-uncprop.c trunk/gcc/var-tracking.c
This breaks gcc.dg/pr43300.c on aarch64. $ gcc/xgcc -B gcc/ ../gcc/testsuite/gcc.dg/pr43300.c ../gcc/testsuite/gcc.dg/pr43300.c: In function ‘foo’: ../gcc/testsuite/gcc.dg/pr43300.c:8:1: internal compiler error: in emit_move_insn, at expr.c:3552 foo (int x, V2SF a) ^ 0x7ee783 emit_move_insn(rtx_def*, rtx_def*) ../../gcc/expr.c:3551 0x84a80b assign_parm_setup_reg ../../gcc/function.c:3322 0x84c4ff assign_parms ../../gcc/function.c:3766 0x84f353 expand_function_start(tree_node*) ../../gcc/function.c:5192 0x6f919f execute ../../gcc/cfgexpand.c:6105
It also breaks a lot of tests on m68k, eg: $ gcc/xgcc -B gcc/ ../gcc/testsuite/gcc.dg/pr17957.c ../gcc/testsuite/gcc.dg/pr17957.c: In function ‘vadd’: ../gcc/testsuite/gcc.dg/pr17957.c:6:1: internal compiler error: in expand_one_stack_var_1, at cfgexpand.c:1221 vadd (void) ^ 0x662055 expand_one_stack_var_1 ../../gcc/cfgexpand.c:1221 0x670685 expand_one_ssa_partition ../../gcc/cfgexpand.c:1295 0x670685 expand_used_vars ../../gcc/cfgexpand.c:1940 0x671e20 execute ../../gcc/cfgexpand.c:6084
I see the same failures on MIPS too.
(In reply to Andreas Schwab from comment #44) Same on sparc.
The errors reported in comments 44, 45, 46, and 47 are fixed in the git branch aoliva/pr64164. I'm giving it all some more testing before posting an updated, consolidated patch.
(In reply to Alexandre Oliva from comment #48) > The errors reported in comments 44, 45, 46, and 47 are fixed in the git > branch aoliva/pr64164. I'm giving it all some more testing before posting > an updated, consolidated patch. I applied your patch (commit 9357ff1, 8/2/15) to our GUPC branch, based off trunk version 226386 on gcc112 (PPC64LE). It bootstrapped fine and passed the tests with "-O0 --enable-checking" and "-O3 --disable-checking".
Author: aoliva Date: Fri Aug 14 18:51:50 2015 New Revision: 226901 URL: https://gcc.gnu.org/viewcvs?rev=226901&root=gcc&view=rev Log: [PR64164] Drop copyrename, use coalescible partition as base when optimizing. for gcc/ChangeLog PR rtl-optimization/64164 PR bootstrap/66978 PR middle-end/66983 PR rtl-optimization/67000 PR middle-end/67034 PR middle-end/67035 * Makefile.in (OBJS): Drop tree-ssa-copyrename.o. * tree-ssa-copyrename.c: Removed. * opts.c (default_options_table): Drop -ftree-copyrename. Add -ftree-coalesce-vars. * passes.def: Drop all occurrences of pass_rename_ssa_copies. * common.opt (ftree-copyrename): Ignore. (ftree-coalesce-inlined-vars): Likewise. * doc/invoke.texi: Remove the ignored options above. * gimple-expr.h (gimple_can_coalesce_p): Move declaration * tree-ssa-coalesce.h: ... here. * tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other headers required by it. * gimple-expr.c (gimple_can_coalesce_p): Allow coalescing across variables when flag_tree_coalesce_vars. Check register use and promoted modes to allow coalescing. Do not coalesce maybe-byref parms with SSA_NAMEs of other variables, or anonymous SSA_NAMEs. Moved to tree-ssa-coalesce.c. * tree-ssa-live.c (struct tree_int_map_hasher): Move along with its member functions to tree-ssa-coalesce.c. (var_map_base_init): Likewise. Renamed to compute_samebase_partition_bases. (partition_view_normal): Drop want_bases parameter. (partition_view_bitmap): Likewise. * tree-ssa-live.h: Adjust declarations. * tree-ssa-coalesce.c: Include explow.h and cfgexpand.h. (build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's default defs at the entry point. (dump_part_var_map): New. (compute_optimized_partition_bases): New, called by... (coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead of compute_samebase_partition_bases. Adjust. * alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs. * cfgexpand.c (leader_merge, parm_maybe_byref_p): New. (ssa_default_def_partition): New. (get_rtl_for_parm_ssa_default_def): New. (align_local_variable, add_stack_var): Support anonymous SSA names. (defer_stack_allocation): Likewise. Declare earlier. (set_rtl): Merge exprs and attrs, even for MEMs and non-SSA vars. Update DECL_RTL for PARM_DECLs and RESULT_DECLs too. Do no record deferred-allocation marker in SA.partition_to_pseudo. (expand_stack_vars): Adjust check for the marker in it. (expand_one_stack_var_at): Handle anonymous SSA_NAMEs. Drop redundant MEM attr setting. (expand_one_stack_var_1): Handle anonymous SSA_NAMEs. Renamed from... (expand_one_stack_var): ... this. New wrapper to check and skip already expanded SSA partitions. (record_alignment_for_reg_var): New, factored out of... (expand_one_var): ... this. (expand_one_ssa_partition): New. (adjust_one_expanded_partition_var): New. (expand_one_register_var): Check and skip already expanded SSA partitions. (expand_used_vars): Don't create DECLs for anonymous SSA names. Expand all SSA partitions, then adjust all SSA names. (pass::execute): Replace the loops that set SA.partition_to_pseudo from partition leaders and cleared DECL_RTL for multi-location variables, and that which used to rename vars and set attrs, with one that clears DECL_RTL and checks that PARMs and RESULTs default_defs match DECL_RTL. * cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare. * emit-rtl.c: Include stor-layout.h. (set_reg_attrs_for_parm): Handle NULL decl. (set_reg_attrs_for_decl_rtl): Take mode from expression if it's not a DECL. * stmt.c (emit_case_decision_tree): Pass it the SSA_NAME rather than its possibly-NULL DECL. * explow.c (promote_ssa_mode): New. * explow.h (promote_ssa_mode): Declare. * expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs. (read_complex_part): Export. * expr.h (read_complex_part): Declare. * cfgexpand.h (parm_maybe_byref_p): Declare. * function.c: Include cfgexpand.h. (use_register_for_decl): Handle SSA_NAMEs, anonymous or not. (use_register_for_parm_decl): Wrapper for the above to special-case the result_ptr. (rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def. (split_complex_args): Take assign_parm_data_all argument. Pass it to rtl_for_parm. Set up rtl and context for split args. Reset complex parm before fetching its default decl rtl. (assign_parms_unsplit_complex): Use the default-def complex parm rtl if it matches the components. (assign_parms_augmented_arg_list): Adjust. (maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with multiple locations. Recognize split complex args. (assign_parm_adjust_stack_rtl): Add all and parm arguments, for rtl_for_parm. For SSA-assigned parms, zero stack_parm. (assign_parm_setup_block): Prefer SSA-assigned location, and fill in its address if the memory location of a maybe-byref parm was not assigned by cfgexpand. (assign_parm_setup_reg): Likewise. Adjust its mode as needed. Use entry_parm for equiv if stack_parm is NULL. Make sure passed_pointer parms don't need conversion. Copy address or value as needed. (assign_parm_setup_stack): Prefer SSA-assigned location. (assign_parms): Maybe reset DECL_RTL of params. Adjust stack rtl before testing for pointer bounds. Special-case result_ptr. (expand_function_start): Maybe reset DECL_RTL of result. Prefer SSA-assigned location for result and static chain. Factor out DECL_RESULT and SET_DECL_RTL. Convert static chain to Pmode if needed, from H.J. Lu <hongjiu.lu@intel.com>. * tree-outof-ssa.c (insert_value_copy_on_edge): Handle anonymous SSA names. Use promote_ssa_mode. (get_temp_reg): Likewise. (remove_ssa_form): Adjust. * stor-layout.c (layout_decl): Don't set mem attributes of non-MEMs. * var-tracking.c (dataflow_set_clear_at_call): Take call_insn and get its reg_usage for reg invalidation. (compute_bb_dataflow): Pass it insn. (emit_notes_in_bb): Likewise. for gcc/testsuite/ChangeLog * gcc.dg/guality/pr54200.c: Add -fno-tree-coalesce-vars. * gcc.dg/ssp-1.c: Make counter a register. * gcc.dg/ssp-2.c: Likewise. * gcc.dg/torture/parm-coalesce.c: New. Added: trunk/gcc/testsuite/gcc.dg/torture/parm-coalesce.c Removed: trunk/gcc/tree-ssa-copyrename.c Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/alias.c trunk/gcc/cfgexpand.c trunk/gcc/cfgexpand.h trunk/gcc/common.opt trunk/gcc/doc/invoke.texi trunk/gcc/emit-rtl.c trunk/gcc/explow.c trunk/gcc/explow.h trunk/gcc/expr.c trunk/gcc/expr.h trunk/gcc/function.c trunk/gcc/gimple-expr.c trunk/gcc/gimple-expr.h trunk/gcc/opts.c trunk/gcc/passes.def trunk/gcc/stmt.c trunk/gcc/stor-layout.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/guality/pr54200.c trunk/gcc/testsuite/gcc.dg/ssp-1.c trunk/gcc/testsuite/gcc.dg/ssp-2.c trunk/gcc/tree-outof-ssa.c trunk/gcc/tree-ssa-coalesce.c trunk/gcc/tree-ssa-coalesce.h trunk/gcc/tree-ssa-live.c trunk/gcc/tree-ssa-live.h trunk/gcc/tree-ssa-uncprop.c trunk/gcc/var-tracking.c
Fixed for the next major release. Not planning a backport.
Author: aoliva Date: Wed Aug 19 17:00:32 2015 New Revision: 227015 URL: https://gcc.gnu.org/viewcvs?rev=227015&root=gcc&view=rev Log: [PR64164] fix regressions reported on m68k and armeb Defer stack slot address assignment for all parms that can't live in pseudos, and accept pseudos assignments in assign_param_setup_block. for gcc/ChangeLog PR rtl-optimization/64164 * cfgexpand.c (parm_maybe_byref_p): Renamed to... (parm_in_stack_slot_p): ... this. Disregard mode, what matters is whether the parm will live in a pseudo or a stack slot. (expand_one_ssa_partition): Deal with params without a default def. Disregard mode. * cfgexpand.h: Renamed function declaration. * tree-ssa-coalesce.c: Adjust. * function.c (split_complex_args): Allocate stack slot for unassigned parms before splitting. (parm_in_unassigned_mem_p): New. Use it instead of parm_maybe_byref_p throughout this file. (assign_parm_setup_block): Use it. Accept pseudos in the expand-assigned rtl. (assign_parm_setup_reg): Drop BLKmode requirement. (assign_parm_setup_stack): Allocate and fill in the address of unassigned MEM parms. Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/cfgexpand.h trunk/gcc/function.c trunk/gcc/tree-ssa-coalesce.c
Author: aoliva Date: Fri Aug 21 20:03:14 2015 New Revision: 227085 URL: https://gcc.gnu.org/viewcvs?rev=227085&root=gcc&view=rev Log: fix sched compare regression for gcc/ChangeLog PR rtl-optimization/64164 PR rtl-optimization/67227 * alias.c (memrefs_conflict_p): Handle VALUEs in PLUS better. (nonoverlapping_memrefs_p): Test offsets and sizes when given identical gimple_reg exprs. Modified: trunk/gcc/ChangeLog trunk/gcc/alias.c
New patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01793.html
Author: aoliva Date: Sun Sep 27 09:02:00 2015 New Revision: 228175 URL: https://gcc.gnu.org/viewcvs?rev=228175&root=gcc&view=rev Log: revert to assign_parms assignments using default defs Revert the fragile and complicated changes to assign_parms designed to enable it to use RTL assigments chosen by cfgexpand, and instead have cfgexpand use the RTL assignments by assign_parms, keying them off of the default defs that are now necessarily introduced for each parm and result. The possible lack of a default def was already a problem, and the fallbacks in place were not enough, as shown by PR67312. We now have checking asserts in set_rtl that verify that we're assigning to each var a piece of RTL that matches the expectations set forth by use_register_for_decl. for gcc/ChangeLog PR rtl-optimization/64164 PR tree-optimization/67312 PR middle-end/67340 PR middle-end/67490 PR bootstrap/67597 * cfgexpand.c (parm_in_stack_slot_p): Remove. (ssa_default_def_partition): Remove. (get_rtl_for_parm_ssa_default_def): Remove. (set_rtl): Check that RTL assignments match expectations. Loop on SUBREGs, CONCATs and PARALLELs subexprs. Set only the default def location for params and results. Record SSA names or types in REG and MEM attrs, respectively. (set_parm_rtl): New. (expand_one_ssa_partition): Drop logic that assigned MEMs with unassigned addresses. (adjust_one_expanded_partition_var): Don't accept NULL RTL on deferred stack alloc vars. (expand_used_vars): Skip partitions holding parm default defs. Move adjust_one_expanded_partition_var loop... (pass_expand::execute): ... here. Drop redundant assert. Adjust comments before the final loop over all ssa names. Require assigned rtl of parms and results to match exactly. Reset its attributes to match them, not any other variables in the same partition. (expand_debug_expr): Use entry value for PARM's default defs only iff they have zero nondebug uses. * cfgexpand.h (parm_in_stack_slot_p): Remove. (get_rtl_for_parm_ssa_default_def): Remove. (set_parm_rtl): Declare. * doc/invoke.texi: Improve wording. * explow.c (promote_decl_mode): Fix promote_function_mode for result decls not by reference. (promote_ssa_mode): Disregard BLKmode from promote_decl, and bypass TYPE_MODE to get the actual vector mode. * function.c: Include tree-dfa.h. Revert 2015-08-14's and 2015-08-19's changes as follows. Drop include of basic-block.h and df.h. (rtl_for_parm): Remove. (maybe_reset_rtl_for_parm): Remove. (parm_in_unassigned_mem_p): Remove. (use_register_for_decl): Add logic for RESULT_DECLs matching assign_parms' behavior. (split_complex_args): Revert. (assign_parms_augmented_arg_list): Revert. Add comment referencing the logic above. (assign_parm_adjust_stack_rtl): Revert. (assign_parm_setup_block): Revert. Use set_parm_rtl instead of SET_DECL_RTL. Set up a REG if the parm demands so. (assign_parm_setup_reg): Revert. Consolidated SET_DECL_RTL calls into a single set_parm_rtl. Set up a temporary RTL temporarily for expand_assignment. (assign_parm_setup_stack): Revert. Use set_parm_rtl. (assign_parms_unsplit_complex): Revert. Use set_parm_rtl. (assign_bounds): Revert. (assign_parms): Revert. Use set_parm_rtl. (allocate_struct_function): Relayout result and parms of non-abstruct functions. (expand_function_start): Revert. Use set_parm_rtl. If the result is not a hard reg, create a pseudo from the promoted mode of the default def. Promote static chain mode. * tree-outof-ssa.c (remove_ssa_form): Drop unused partition_has_default_def. Set up partitions_for_parm_default_defs. (finish_out_of_ssa): Remove partition_has_default_def. Release partitions_for_parm_default_defs. * tree-outof-ssa.h (struct ssaexpand): Remove partition_has_default_def. Add partitions_for_parm_default_defs. * tree-ssa-coalesce.c: Include tree-dfa.h, tm_p.h and stor-layout.h. (build_ssa_conflict_graph): Fix conflict-detection of default defs of even unused default defs of params and results. (for_all_parms): New. (create_default_def): New. (register_default_def): New. (coalesce_with_default): New. (create_outofssa_var_map): Create default defs for all parms and results, and register their partitions. Add GIMPLE_RETURN operands as coalesce candidates with results. Add default defs of each parm or result as coalesce candidates with its other defs. Mark each result def, and each default def of parms, as used_in_copy. (gimple_can_coalesce_p): Call it. Call use_register_for_decl with the ssa names, even anonymous ones. Drop parm_in_stack_slot_p calls. Require same signedness and alignment. (coalesce_ssa_name): Add coalesce candidates for all defs of each parm and result, even unused ones. (parm_default_def_partition_arg): New type. (set_parm_default_def_partition): New. (get_parm_default_def_partitions): New. * tree-ssa-coalesce.h (get_parm_default_def_partitions): New. * tree-ssa-live.c (partition_view_init): Regard unused defs of parms and results as used. (verify_live_on_entry): Don't error out just because they're not live. for gcc/testsuite/ChangeLog PR rtl-optimization/64164 PR tree-optimization/67312 * gcc.dg/pr67312.c: New. From Zdenek Sojka. * gcc.target/i386/stackalign/return-4.c: Add -O. Added: trunk/gcc/testsuite/gcc.dg/pr67312.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/cfgexpand.h trunk/gcc/doc/invoke.texi trunk/gcc/explow.c trunk/gcc/function.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/stackalign/return-4.c trunk/gcc/tree-outof-ssa.c trunk/gcc/tree-outof-ssa.h trunk/gcc/tree-ssa-coalesce.c trunk/gcc/tree-ssa-coalesce.h trunk/gcc/tree-ssa-live.c
Author: aoliva Date: Fri Nov 6 10:34:13 2015 New Revision: 229840 URL: https://gcc.gnu.org/viewcvs?rev=229840&root=gcc&view=rev Log: [PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg In assign_parms_setup_block, the copy of args in PARALLELs from entry_parm to stack_parm is deferred to the parm conversion insn seq, but the copy from stack_parm to target_reg was inserted in the normal copy seq, that is executed before the conversion insn seq. Oops. We could do away with the need for an actual stack_parm in general, which would have avoided the need for emitting the copy to target_reg in the conversion seq, but at least on pa, due to the need for stack to copy between SI and SF modes, it seems like using the reserved stack slot is beneficial, so I put in logic to use a pre-reserved stack slot when there is one, and emit the copy to target_reg in the conversion seq if stack_parm was set up there. for gcc/ChangeLog PR rtl-optimization/67753 PR rtl-optimization/64164 * function.c (assign_parm_setup_block): Avoid allocating a stack slot if we don't have an ABI-reserved one. Emit the copy to target_reg in the conversion seq if the copy from entry_parm is in it too. Don't use the conversion seq to copy a PARALLEL to a REG or a CONCAT. Modified: trunk/gcc/ChangeLog trunk/gcc/function.c
Author: aoliva Date: Thu Nov 26 21:57:40 2015 New Revision: 230985 URL: https://gcc.gnu.org/viewcvs?rev=230985&root=gcc&view=rev Log: [PR67753] adjust for padding when bypassing memory in assign_parm_setup_block Storing a register in memory as a full word and then accessing the same memory address under a smaller-than-word mode amounts to right-shifting of the register word on big endian machines. So, if BLOCK_REG_PADDING chooses upward padding for BYTES_BIG_ENDIAN, and we're copying from the entry_parm REG directly to a pseudo, bypassing any stack slot, perform the shifting explicitly. This fixes the miscompile of function_return_val_10 in gcc.target/aarch64/aapcs64/func-ret-4.c for target aarch64_be-elf introduced in the first patch for 67753. for gcc/ChangeLog PR rtl-optimization/67753 PR rtl-optimization/64164 * function.c (assign_parm_setup_block): Right-shift upward-padded big-endian args when bypassing the stack slot. Modified: trunk/gcc/ChangeLog trunk/gcc/function.c
The patch in comment 35 causes a performance regression on s390. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695
I believe PR70164 is related to this.