64164 – [4.9/5/6 Regression] one more stack slot used due to one less inlining level

Bug 64164 - [4.9/5/6 Regression] one more stack slot used due to one less inlining level

Summary: [4.9/5/6 Regression] one more stack slot used due to one less inlining level

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	5.0

Importance:	P2 normal
Target Milestone:	6.0
Assignee:	Alexandre Oliva

URL:
Keywords:	missed-optimization

Depends on:	66978 66983 67000 67034 67035 67227 67295 67312 67340 67490 67597 67753 67766 67891
Blocks:
	Show dependency tree / graph

Reported:	2014-12-03 09:33 UTC by Patrick Marlier
Modified:	2021-12-19 04:24 UTC (History)
CC List:	11 users (show)

See Also:	64082 79124
Host:	x86_64-linux-gnu
Target:	x86_64-linux-gnu
Build:	x86_64-linux-gnu
Known to work:	4.8.3
Known to fail:	4.9.1, 5.0
Last reconfirmed:

Attachments
testcase (836 bytes, text/x-csrc) 2014-12-03 09:33 UTC, Patrick Marlier	Details
Patch that restores coalescing of anonymous SSA vars (696 bytes, patch) 2015-03-17 22:18 UTC, Alexandre Oliva	Details \| Diff
ARM testcase (17.47 KB, text/plain) 2015-06-10 13:15 UTC, Christophe Lyon	Details
AIX PowerPC testcase (16.53 KB, text/plain) 2015-06-10 14:43 UTC, David Edelsohn	Details
PPC64LE Linux Testcase (14.50 KB, text/plain) 2015-06-10 14:45 UTC, David Edelsohn	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Patrick Marlier 2014-12-03 09:33:34 UTC

Created attachment 34178 [details]
testcase

$ cc -O2 -S -o stm.s stm.c
$ cc -O2 -DOPT -S -o stm-opt.s stm.c

if you compare the 2 outputs, stm_load function is using one more slot on the stack.
The difference is only this:

static inline size_t
AO_myload2(const volatile size_t *addr)
{
  return *(size_t *)addr;
}
static inline size_t
AO_myload(const volatile size_t *addr)
{
#ifdef OPT
  size_t result = AO_myload2(addr);
#else
  size_t result = *(size_t *)addr;
#endif
  return result;
}

Having one more inlined function should have the same optimization not a better one.
4.8 does not have the problem and the code generated is the same.

Comment 1 Andrew Pinski 2014-12-03 09:44:52 UTC

Looks like out of SSA is producing the difference.
Change the function to just:
  return *(size_t *)addr;

Changes it to what the inline function does.

Comment 2 Andrew Pinski 2014-12-03 09:45:12 UTC

Confirmed.

Comment 3 Richard Biener 2014-12-03 10:03:44 UTC

The difference is in whether there are extra user-named variables in the end
and thus SSA coalescing decision differences:

 stm_load (volatile stm_word_t * addr)
 {
-  stm_word_t l;
-  stm_word_t value;
   stm_word_t version;
   stm_word_t l;
   struct r_entry_t * r;
-  stm_word_t now;
...
+  size_t _32;
+  size_t _33;
+  size_t _34;

...

 Conflict graph:
+1: 3
+3: 1
 
 After sorting:
 Sorted Coalesce list:
+(16610) _30 <-> _33
 (651) _10 <-> _30

...

-Coalesce list: (10)_10 & (30)_30 [map: 1, 2] : Success -> 1
+Coalesce list: (30)_30 & (33)_33 [map: 2, 3] : Success -> 2
+Coalesce list: (10)_10 & (30)_30 [map: 1, 2] : Fail due to conflict


So it turns out the different coalescing ends up generating worse code.
It would be interesting to see why we decide that coalescing _30 and _33
is so much more beneficial than coalescing _10 and _30.

Ah, it simply uses EDGE_FREQUENCY...  and for some reason we predicted
that _33 & 1 != 0 is 10% taken only.

So ... the theory is that the version is faster on the important path?

Comment 4 Richard Biener 2014-12-03 10:06:30 UTC

That is, it is good

  # _30 = PHI <0(2), _10(8), value_33(7), value_33(3), value_33(6)>
  return _30;

vs bad

  # _30 = PHI <0(2), _10(8), _33(7), _33(3), _33(6)>
  return _30;

where it thinks that coalescing _30 and _33 is more important than
coalescing _30 and _10 (which looks reasonable).

The question is why RTL opts make such large difference out of this.

Comment 5 Patrick Marlier 2014-12-17 11:38:05 UTC

The regression starts from this commit:

trunk@200103
commit f82f0ea592c2d78077e03f5a1a3b9b40751cc116
Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Jun 14 18:52:32 2013 +0000

        * gimple.h (gimple_can_coalesce_p): Prototype.
        * tree-ssa-coalesce.c (gimple_can_coalesce_p): New function.
        (create_outofssa_var_map, coalesce_partitions): Use it.
        * tree-ssa-uncprop.c (uncprop_into_successor_phis): Similarly.
        * tree-ssa-live.c (var_map_base_init): Use TYPE_CANONICAL
        if it's available.

        * gcc.dg/tree-ssa/coalesce-1.c: New test.

And especially from this:
gcc/tree-ssa-coalesce.c:gimple_can_coalesce_p (tree name1, tree name2)
...

  /* If the types are not the same, check for a canonical type match.  This
     (for example) allows coalescing when the types are fundamentally the
     same, but just have different names. 

     Note pointer types with different address spaces may have the same
     canonical type.  Those are rejected for coalescing by the
     types_compatible_p check.  */

  if (TYPE_CANONICAL (t1)
      && TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2)
      && types_compatible_p (t1, t2))
    return true;

But as Richard mentioned, I think the problem is in the RTL part.

Comment 6 Alexandre Oliva 2015-03-17 17:18:47 UTC

The problem starts in the first copyrename pass.  We refrain from coalescing variables if neither has a usable root.  This causes us to fail to coalesce partitions that we would coalesce if only we tried them in a different order (i.e., P1 and P2 fail but either one coalesces with P3, and then the other would coalesce with that union).  This is exactly what happens in AO_myload with -DOPT: partitions _3 and _6 won't coalesce, and then result_4 and _6 coalesce, which would enable _3 to coalesce too.  This in turn brings additional rootless variables into stm_load, which again fail to coalesce because we try them first (_36 and _4 fails, and l_11 and _36 passes, so that with _4 would work).  Without -DOPT, we succeed in coalescing result (there are fewer SSA names), and when that is inlined, it gets there with a root symbol, so that coalescing of _36 and result_4 succeeds.  In both cases, we coalesce l_11 with result_36, and it is precisely because _4 remains separate that we end up introducing copies in new blocks when going out of SSA into RTL.

Coalescing SSA names even when neither has a root removes the optimization differences for this testcase.

Comment 7 Jeffrey A. Law 2015-03-17 18:19:29 UTC

OK, but why does this make such a difference in the final result.  Ie, what happens as we get into RTL.

It would also be helpful to see the current & desired output for the case where we've regressed.

Comment 8 Andrew Macleod 2015-03-17 19:55:31 UTC

(In reply to Alexandre Oliva from comment #6)

> 
> Coalescing SSA names even when neither has a root removes the optimization
> differences for this testcase.

I would worry about side effects... Coalescing is always a balancing act.   If you allow things to coalesce when neither has a root, a few of those coalesced together may prevent a coalesce with a much more desirable root in other programs.  

And of course more coalesces may mean longer live ranges...  so any changes like this need a lot of studying of side effects on a much larger set of problems. 

It was a long time ago, so I don't remember the reasons, but most of these kinds of restrictions were rooted in some kind of rationale... usually from examining an issue of some sort and introducing the restriction to avoid it.  The original conditions may no longer exist, but we'd have to run tests to make sure nothing shows up.  Perhaps tweaking the costing model could help too.

Comment 9 Alexandre Oliva 2015-03-17 22:15:33 UTC

(In reply to Jeffrey A. Law from comment #7)
> OK, but why does this make such a difference in the final result.  Ie, what happens as we get into RTL.

Err, I covered that bit in my description: we emit a number of copies, each with a new BB of its own.  The additional pseudo survives all the way to the end, and so do the copies, which is enough to explain the additional stack slot.  Further rearrangement of code also follows from the additional blocks.

(In reply to Andrew Macleod from comment #8)
> most of these kinds of restrictions were rooted in some kind of rationale... usually from examining an issue of some sort and introducing the restriction to avoid it.

This one was added by https://gcc.gnu.org/ml/gcc-patches/2012-08/msg00517.html
The description mentions out-of-SSA coalescing, but not copyrename coalescing, which is what this is about.  Since there were not anonymous SSA names before, the introduced condition would necessarily not be met, so it effectively reduced the amount of copyrename coalescing without any rationale AFAICT.

Richi, do you happen to have any insight as to why you changed copyrename to avoid coalescing rootless ssa names?  Reverting it doesn't seem to have any ill effects on code correctness (unlike allowing coalescing between rooted and rootless vars during out-of-ssa, in gimple_can_coalesce_p, which breaks codegen badly, though I haven't dug into why).  I'll attach a patch I'm testing to that effect momentarily.

Comment 10 Alexandre Oliva 2015-03-17 22:18:01 UTC

Created attachment 35048 [details]
Patch that restores coalescing of anonymous SSA vars

Comment 11 Alexandre Oliva 2015-03-17 22:19:30 UTC

Uhh, Richi, there's a question for you in comment 9, but I forgot to cc: you before saving the post.

Comment 12 Alexandre Oliva 2015-03-17 23:01:28 UTC

I just noticed that I reversed with and without -DOPT in my analysis in comment 6.  Apologies for any confusion this may have caused.

Comment 13 Alexandre Oliva 2015-03-18 00:20:31 UTC

I looked further into why changing gimple_can_coalesce_p didn't work: build_ssa_conflict_graph only marks conflicts between SSA names if they share the same base variable.  This explains why we have a conflict in the conflict map with -DOPT, in which l_4, values_29 and now_30 all have base variables, whereas without -DOPT they don't.  So, we can't coalesce them with _10 and _28, even though 10 and 29 do conflict.

I ran a test in which I allowed values_29 and _28 to coalesce, by hooking into gimple_can_coalesce_p during an -UOPT out-of-ssa and getting it to return true.  This yielded the same partition map as the -DOPT case, but an empty conflict map for the reason above.  The sorted coalesce list was the same, too, so we coalesced values_29 and _28, as in -DOPT, but then, due to the lack of the conflict, we also coalesced _10 into the same partition.  This experimental additional invalid coalescing in turn caused other differences in the generated RTL.

At this point I'll sum up my findings:

- we fail to coalesce during copyrename because we've ruled out, at that point, coalescing of anonymous SSA names.  as a consequence, only some coalescing opportunities are taken, leaving some anonymous names that we try to coalesce first not coalesced

- as a consequence, we fail to coalesce some SSA names because some of the SSA variables have become anonymous whereas others aren't

- as a consequence, we emit additional copies in additional BBs, and they survive all the way to the end of compilation

Although changing copyrename to allow coalescing in these cases fixes the problem, I suppose it would also be possible to change out-of-ssa to compensate; we would have to somehow mark conflicts between anonymous and non-anonymous variables in the conflict graph, though.

Comment 14 Jeffrey A. Law 2015-03-18 05:48:46 UTC

So, forgive me, but is -DOPT supposed to be the good or the bad code?

From looking at the results I get, -DOPT is supposed to be the good code, but that seems to conflict with c#6.

For -DOPT I get this PHI:

  # _28 = PHI <0(2), _29(3), _29(7), _10(8), _29(6)>

For -UOPT I get this:

  # _28 = PHI <0(2), value_29(3), value_29(7), _10(8), value_29(6)>


Note that _10 from BB 8 is known to have the value 0.  That equivalency comes from the path 2->3->4->8.  Coalescing _10 with anything is probably the least interesting.  Coalescing _28 and _29 seems more profitable.

Would we get better code if we left the constant in place?  Can we come up with a reasonable heuristic to determine if we're better off with _10 or the constant in that PHI argument?

Right now uncprop will always replace a constant with an SSA_NAME having the right value if the SSA_NAME could potentially coalesce with the result of the PHI.  The theory being that a constant will always need initialization, but if the SSA_NAME coalesces with the result, then the SSA_NAME will likely be "free".  The obvious exception is when that coalescing prevents other things from coalescing.


I also wonder if we could be cleaning things up in RTL to make this less of an issue.  For example, would optimizing the obvious tail-merge/crossjump earlier in the RTL pipeline help the register allocator coalesce things better?

Comment 15 Richard Biener 2015-03-18 09:02:13 UTC

(In reply to Alexandre Oliva from comment #9)
> (In reply to Jeffrey A. Law from comment #7)
> > OK, but why does this make such a difference in the final result.  Ie, what happens as we get into RTL.
> 
> Err, I covered that bit in my description: we emit a number of copies, each
> with a new BB of its own.  The additional pseudo survives all the way to the
> end, and so do the copies, which is enough to explain the additional stack
> slot.  Further rearrangement of code also follows from the additional blocks.
> 
> (In reply to Andrew Macleod from comment #8)
> > most of these kinds of restrictions were rooted in some kind of rationale... usually from examining an issue of some sort and introducing the restriction to avoid it.
> 
> This one was added by
> https://gcc.gnu.org/ml/gcc-patches/2012-08/msg00517.html
> The description mentions out-of-SSA coalescing, but not copyrename
> coalescing, which is what this is about.  Since there were not anonymous SSA
> names before, the introduced condition would necessarily not be met, so it
> effectively reduced the amount of copyrename coalescing without any
> rationale AFAICT.
> 
> Richi, do you happen to have any insight as to why you changed copyrename to
> avoid coalescing rootless ssa names?

Because "coalescing" them doesn't do anything.  copyrename coalescing
assigns the same underlying DECL to SSA names, thus makes SSA_NAME_VAR
the same.  But when both SSA_NAME_VARs are NULL this won't do anything.

So the effect of the patch was rather that there are now way less SSA names
with SSA_NAME_VAR != NULL.  In particular all the pass created ones
(with no relation to user declared variables).  It didn't make sense to
apply copyrename to those (copyrename is to have better debug info in the end,
sth which should be provided by debug stmts today).

>  Reverting it doesn't seem to have any
> ill effects on code correctness (unlike allowing coalescing between rooted
> and rootless vars during out-of-ssa, in gimple_can_coalesce_p, which breaks
> codegen badly, though I haven't dug into why).  I'll attach a patch I'm
> testing to that effect momentarily.

So I'd say the patch can't make a difference - it at most can end up
re-writing the type of an SSA name and then having followup effects
in out-of-SSA coalescing?  For some odd reason we are ignoring some
type differences in copyrename coalescing.

But - how does either root1 or root2 end up non-zero at

  /* Set the root variable of the partition to the better choice, if there is
     one.  */
  if (!ign2 && root2)
    replace_ssa_name_symbol (partition_to_var (map, p3), root2);
  else if (!ign1 && root1) 
    replace_ssa_name_symbol (partition_to_var (map, p3), root1);
  else
    gcc_unreachable ();     

?  That is, the only effect of your patch is to do partiton_union?  Ah,
and then go to:

  /* Now one more pass to make all elements of a partition share the same
     root variable.  */

  for (x = 1; x < num_ssa_names; x++)
    {
      part_var = partition_to_var (map, x);
      if (!part_var)
        continue;
      var = ssa_name (x);
      if (SSA_NAME_VAR (var) == SSA_NAME_VAR (part_var))
        continue;
      if (debug)
        {
          fprintf (debug, "Coalesced ");
          print_generic_expr (debug, var, TDF_SLIM);
          fprintf (debug, " to ");
          print_generic_expr (debug, part_var, TDF_SLIM);
          fprintf (debug, "\n");
        }
      stats.coalesced++;
      replace_ssa_name_symbol (var, SSA_NAME_VAR (part_var));
    }

and wreck SSA names type (and "debug" identifier) only?  That really sounds
bogus.

The correct fix would be to allow the kind of type differences copyrename
coalescing allows also in out-of-SSA coalescing.

That is, I don't see how the patch makes sense.

Comment 16 Richard Biener 2015-03-18 09:03:50 UTC

And as I analyzed in comment #3 we chose the now different coalescing because
it is more profitable (to the cost analysis we perform in out-of-SSA coalescing).
So the fix, if any, is there (or in the code maintaining edge frequencies).

Comment 17 Richard Biener 2015-03-18 09:19:21 UTC

To add to all this - IMHO copyrename should go - it's purpose was to have more
SSA names with user-decls and thus debug info for them.  This should now
be dealt with debug insns (in a way better and correct way).

Yes, out-of-SSA coalescing could be changed to allow coalescing of SSA names
with a user-DECL and anonymous SSA names (or SSA names with a DECL_IGNORED_P
decl).  But that will make the conflict graph much larger(?).  The decision past
then was to retain previous behavior which wouldn't have coalesced this
case (because the DECL of the now anonymous SSA name was different).

In theory we could just go for it, hoping the conflict graph will not get
much larger.  But it might happen that we end up coalescing things in a way
that there ends up being no DECL for a register and thus we lose in debug
quality (not sure if we could compensate by inserting debug stmts on the
edges to compensate for that - we couldn't do that if it requires splitting
the edge).  Not sure if always coalescing to the variable with the user-decl
won't make debug info wrong, like in

  bb1:
  _1 = 2 * _3;
  goto <bb 3>;

  bb2:
  i_4 = 5 / _3;
  
  bb3:
  _3 = PHI <i_4 (2), _1  (1)>

if we coalesce to _1 to i_4 then the expression 2 * _3 will be computed
to a register with decl 'i'.

So to avoid wrong debug we'd have to always coalesce to an anonymous
entity - but that would lose the association of 'i' to 5 / _3 in bb2
unless we manage to compensate for that somehow with debug-insns we
magically emit during RTL expansion.

Comment 18 Alexandre Oliva 2015-03-18 10:05:33 UTC

(In reply to Jeffrey A. Law from comment #14)
> So, forgive me, but is -DOPT supposed to be the good or the bad code?

It's the good one.  As noted in comment 12, I had that backwards in comment 6.

(In reply to Richard Biener from comment #15)
> Because "coalescing" them doesn't do anything.  copyrename coalescing
> assigns the same underlying DECL to SSA names, thus makes SSA_NAME_VAR
> the same.  But when both SSA_NAME_VARs are NULL this won't do anything.

This may be true when you look at only one pair of SSA names, but when you have 2 such candidate pairs involving one common SSA name, it can, and it does make a difference, as described in comment 6.

(In reply to Richard Biener from comment #16)
> And as I analyzed in comment #3 we chose the now different coalescing because
> it is more profitable (to the cost analysis we perform in out-of-SSA
> coalescing).

No, we don't even *consider* the coalescing performed in the -DOPT case, because, as noted in comment 13, the SSA names ended up with different base names, because copyrename wouldn't give them the same base name.

(In reply to Richard Biener from comment #17)
> To add to all this - IMHO copyrename should go

That's fine with me.

> Yes, out-of-SSA coalescing could be changed to allow coalescing of SSA names
> with a user-DECL and anonymous SSA names (or SSA names with a DECL_IGNORED_P
> decl).  But that will make the conflict graph much larger(?).

Ok, I'll give that a shot tomorrow (I'll be away for the whole day today).

> might happen that we end up coalescing things in a way
> that there ends up being no DECL for a register and thus we lose in debug
> quality (not sure if we could compensate by inserting debug stmts on the
> edges to compensate for that - we couldn't do that if it requires splitting
> the edge).

I'm not concerned about that.  Any debug stmts needed to make this work will already be in place, after the actual assignments, and after early PHI nodes.  Sure, edge insertions might separate the copies from the debug stmts inserted after the PHI nodes, but ultimately the bindings should take care of it.

> So to avoid wrong debug we'd have to always coalesce to an anonymous
> entity

With debug stmts, we don't care what the base names are any more.  All the info we need (for tracked auto variables) is in the debug stmts.

Comment 19 rguenther@suse.de 2015-03-18 10:16:38 UTC

On Wed, 18 Mar 2015, aoliva at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
> 
> --- Comment #18 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
> (In reply to Jeffrey A. Law from comment #14)
> > So, forgive me, but is -DOPT supposed to be the good or the bad code?
> 
> It's the good one.  As noted in comment 12, I had that backwards in comment 6.
> 
> (In reply to Richard Biener from comment #15)
> > Because "coalescing" them doesn't do anything.  copyrename coalescing
> > assigns the same underlying DECL to SSA names, thus makes SSA_NAME_VAR
> > the same.  But when both SSA_NAME_VARs are NULL this won't do anything.
> 
> This may be true when you look at only one pair of SSA names, but when you have
> 2 such candidate pairs involving one common SSA name, it can, and it does make
> a difference, as described in comment 6.
> 
> (In reply to Richard Biener from comment #16)
> > And as I analyzed in comment #3 we chose the now different coalescing because
> > it is more profitable (to the cost analysis we perform in out-of-SSA
> > coalescing).
> 
> No, we don't even *consider* the coalescing performed in the -DOPT case,
> because, as noted in comment 13, the SSA names ended up with different base
> names, because copyrename wouldn't give them the same base name.
> 
> (In reply to Richard Biener from comment #17)
> > To add to all this - IMHO copyrename should go
> 
> That's fine with me.
> 
> > Yes, out-of-SSA coalescing could be changed to allow coalescing of SSA names
> > with a user-DECL and anonymous SSA names (or SSA names with a DECL_IGNORED_P
> > decl).  But that will make the conflict graph much larger(?).
> 
> Ok, I'll give that a shot tomorrow (I'll be away for the whole day today).
> 
> > might happen that we end up coalescing things in a way
> > that there ends up being no DECL for a register and thus we lose in debug
> > quality (not sure if we could compensate by inserting debug stmts on the
> > edges to compensate for that - we couldn't do that if it requires splitting
> > the edge).
> 
> I'm not concerned about that.  Any debug stmts needed to make this work will
> already be in place, after the actual assignments, and after early PHI nodes. 
> Sure, edge insertions might separate the copies from the debug stmts inserted
> after the PHI nodes, but ultimately the bindings should take care of it.
> 
> > So to avoid wrong debug we'd have to always coalesce to an anonymous
> > entity
> 
> With debug stmts, we don't care what the base names are any more.  All the info
> we need (for tracked auto variables) is in the debug stmts.

But we do not always have debug stmts!

int bar (int b)
{
  int i;
  if (b > 7)
    i = b / 3;
  else
    return b * 4;
  return i;
}

is a testcase for what I am thinking of.  Hmm.  For some reason
into-SSA inserts a debug stmt, creating an extra copy _5 = i_4?!

  <bb 2>:
  if (b_2(D) > 7)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  i_4 = b_2(D) / 3;
  # DEBUG i => i_4
  _5 = i_4;
  goto <bb 5>;

  <bb 4>:
  _3 = b_2(D) * 4;

  <bb 5>:
  # _1 = PHI <_3(4), _5(3)>

so indeed we shouldn't lose anything here.  But are you sure
that we are never using REG_EXPR for debug?  With 
-fno-var-tracking-assignments for sure we do (I suppose we don't care),
at least var-tracking.c does look at REG_EXPR.

Comment 20 Alexandre Oliva 2015-03-18 10:18:36 UTC

(In reply to Alexandre Oliva from comment #18)
> No, we don't even *consider* the coalescing performed in the -DOPT case,
> because, as noted in comment 13, the SSA names ended up with different base
> names, because copyrename wouldn't give them the same base name.
> 
> (In reply to Richard Biener from comment #17)
> > To add to all this - IMHO copyrename should go
> 
> That's fine with me.
> 
> > Yes, out-of-SSA coalescing could be changed to allow coalescing of SSA names
> > with a user-DECL and anonymous SSA names (or SSA names with a DECL_IGNORED_P
> > decl).  But that will make the conflict graph much larger(?).

Not only that: unless we allow also coalescing between different SSA names in out-of-ssa, we'll lose optimizations that copyrename would have enabled out-of-SSA to make, because the removed copyrename won't have assigned the same base name to otherwise unrelated SSA names that out-of-SSA won't even consider for coalescing.  Now *this* might make the conflict graph explode in size.

Comment 21 Alexandre Oliva 2015-03-19 05:17:10 UTC

(In reply to rguenther@suse.de from comment #19)
> For some reason
> into-SSA inserts a debug stmt, creating an extra copy _5 = i_4?!

I bet the extra copy is there even without -g.  _1, _3 and _5 seem to be SSA names for the result decl.  Presumably _5 might be optimized out, replaced with i_4 in the _1 PHI node, at a later pass.

> But are you sure
> that we are never using REG_EXPR for debug?

We use them for non-gimple_register auto variables.

> With 
> -fno-var-tracking-assignments for sure we do (I suppose we don't care),

Yeah, I'd say we don't care.  If one asks for debug info degradation, we obey ;-)
(now, sometimes we disable VTA on our own because some internal limit is exceeded; that's unfortunate, but what are you gonna do? :-)

Comment 22 Jeffrey A. Law 2015-03-20 05:12:01 UTC

Let's try to stay focused.  Killing copyrename seems like a fine thing to do as long as the resulting debug info is good.  But that's independent of this BZ.

I still find myself wondering if leaving the "0" instead of "_10" in that PHI node is a reasonable approach.  Certainly if I hack uncprop to leave things in that state, I get the code we want.

And ISTM that uncprop ought to leave the constant alone if the SSA_NAME holding the constant conflicts with any of the other SSA_NAMEs in the PHI node that may potentially coalesce with the PHI result.  That captures pretty well the case where the constant is better than an SSA_NAME.

In this particular case, we have:

  # _28 = PHI <0(2), _29(3), _29(7), _10(8), _29(6)>

When _28 and _10 coalesce, the result then conflicts with _29 during IRA/LRA because of the extended lifetime of _10).  Thus the annoying copies created by out-of-ssa can't be eliminated.

WIth the proposed change, we'd instead have:

  # _28 = PHI <0(2), _29(3), _29(7), 0(8), _29(6)>

While we won't coalesce _28/_29 during out-of-ssa, LRA will see the copies and note that the associated pseudos don't conflict and ultimately assign them to the same hard register and the annoying copies are gone.

Comment 23 rguenther@suse.de 2015-03-20 09:53:03 UTC

On Fri, 20 Mar 2015, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
> 
> --- Comment #22 from Jeffrey A. Law <law at redhat dot com> ---
> Let's try to stay focused.  Killing copyrename seems like a fine thing to do as
> long as the resulting debug info is good.  But that's independent of this BZ.
> 
> I still find myself wondering if leaving the "0" instead of "_10" in that PHI
> node is a reasonable approach.  Certainly if I hack uncprop to leave things in
> that state, I get the code we want.
> 
> And ISTM that uncprop ought to leave the constant alone if the SSA_NAME holding
> the constant conflicts with any of the other SSA_NAMEs in the PHI node that may
> potentially coalesce with the PHI result.  That captures pretty well the case
> where the constant is better than an SSA_NAME.
> 
> In this particular case, we have:
> 
>   # _28 = PHI <0(2), _29(3), _29(7), _10(8), _29(6)>
> 
> When _28 and _10 coalesce, the result then conflicts with _29 during IRA/LRA
> because of the extended lifetime of _10).  Thus the annoying copies created by
> out-of-ssa can't be eliminated.
> 
> WIth the proposed change, we'd instead have:
> 
>   # _28 = PHI <0(2), _29(3), _29(7), 0(8), _29(6)>
> 
> While we won't coalesce _28/_29 during out-of-ssa, LRA will see the copies and
> note that the associated pseudos don't conflict and ultimately assign them to
> the same hard register and the annoying copies are gone.

Sure - but then we'd better integrate uncprop into the coalescing,
otherwise you need to re-compute conflicts / live twice.

Comment 24 Jeffrey A. Law 2015-03-20 20:14:09 UTC

Integrating uncprop and out-of-ssa certainly crossed my mind when I wrote that comment :-)  Or at least arranging to build the life info/conflict stuff once and have it shared.

A poor man's version is to note the path for the equivalence and see if any of the blocks on that path define the other PHI arguments.  That ought to be fast and simple enough to possibly include in gcc5.

Comment 25 Alexandre Oliva 2015-03-27 18:12:58 UTC

WIP patch accidentally posted to gcc-patches: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01460.html

Comment 26 Alexandre Oliva 2015-03-29 20:27:54 UTC

Patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01491.html

Comment 27 Patrick Marlier 2015-03-30 21:34:30 UTC

I confirm that the patch fixes the performance problem that I had. I guess the patch is too complex to be backported.

Thanks a lot Alexandre for the patch and to all for the deep analysis!
(just waiting for review and commit :))

Comment 28 Jeffrey A. Law 2015-03-30 21:40:10 UTC

So I've been thinking about how to integrate life/conflict analysis into the uncprop code and it may not be that bad, both from an implementation and computation standpoint.

Most importantly, we don't have to compute full life information.  We really just need to compute the life of the equivalence.  Given the life of the equivalence, if the equivalence is live in any block that contains the defining statement for an SSA_NAME appearing in the target PHI, then the equivalence conflicts and we don't want to unpropagate it.

Computing the life of the equivalence is pretty easy and should be reasonably quick.  This is a cost we'd have to pay regardless of whether or not we integrate uncprop with out-of-ssa since we won't have life information for the expression.

Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't test for conflicts multiple times if an SSA_NAME shows up in multiple PHI alternatives would help keep the cost down as well.

Ultimately I don't think we need to integrate uncprop and out-of-ssa to avoid the unprofitable transformation during uncprop.

Comment 29 Jeffrey A. Law 2015-03-30 21:45:57 UTC

No decision has been made on whether or not to include either or both approaches to fixing this BZ into GCC 5.  It's still under evaluation/review.  I think for GCC 6 it's highly likely we'll have both since they're both independently good things to do.

Comment 30 Richard Biener 2015-03-31 11:25:48 UTC

(In reply to Jeffrey A. Law from comment #28)
> So I've been thinking about how to integrate life/conflict analysis into the
> uncprop code and it may not be that bad, both from an implementation and
> computation standpoint.
> 
> Most importantly, we don't have to compute full life information.  We really
> just need to compute the life of the equivalence.  Given the life of the
> equivalence, if the equivalence is live in any block that contains the
> defining statement for an SSA_NAME appearing in the target PHI, then the
> equivalence conflicts and we don't want to unpropagate it.
> 
> Computing the life of the equivalence is pretty easy and should be
> reasonably quick.  This is a cost we'd have to pay regardless of whether or
> not we integrate uncprop with out-of-ssa since we won't have life
> information for the expression.
> 
> Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't
> test for conflicts multiple times if an SSA_NAME shows up in multiple PHI
> alternatives would help keep the cost down as well.
> 
> Ultimately I don't think we need to integrate uncprop and out-of-ssa to
> avoid the unprofitable transformation during uncprop.

Also see "Boissinot et al., Fast Liveness Checking for SSA-Form Programs"
(CGO 08).  They describe a way to do fast liveness queries without actually
doing a (memory) expensive data-flow analysis but using SSA immediate-uses
and dominance checks.  Sth we could use in SSA coalescing as well to avoid
both the liveness bitmaps and the conflict graph.

Comment 31 Richard Biener 2015-03-31 11:26:38 UTC

I'd say we push this back to GCC 6.

Comment 32 Jakub Jelinek 2015-03-31 12:38:01 UTC

Agreed, though ideally it should be fixed early in stage1.

Comment 33 Jeffrey A. Law 2015-03-31 16:16:16 UTC

On 03/31/2015 05:25 AM, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
>
> --- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
> (In reply to Jeffrey A. Law from comment #28)
>> So I've been thinking about how to integrate life/conflict analysis into the
>> uncprop code and it may not be that bad, both from an implementation and
>> computation standpoint.
>>
>> Most importantly, we don't have to compute full life information.  We really
>> just need to compute the life of the equivalence.  Given the life of the
>> equivalence, if the equivalence is live in any block that contains the
>> defining statement for an SSA_NAME appearing in the target PHI, then the
>> equivalence conflicts and we don't want to unpropagate it.
>>
>> Computing the life of the equivalence is pretty easy and should be
>> reasonably quick.  This is a cost we'd have to pay regardless of whether or
>> not we integrate uncprop with out-of-ssa since we won't have life
>> information for the expression.
>>
>> Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't
>> test for conflicts multiple times if an SSA_NAME shows up in multiple PHI
>> alternatives would help keep the cost down as well.
>>
>> Ultimately I don't think we need to integrate uncprop and out-of-ssa to
>> avoid the unprofitable transformation during uncprop.
>
> Also see "Boissinot et al., Fast Liveness Checking for SSA-Form Programs"
> (CGO 08).  They describe a way to do fast liveness queries without actually
> doing a (memory) expensive data-flow analysis but using SSA immediate-uses
> and dominance checks.  Sth we could use in SSA coalescing as well to avoid
> both the liveness bitmaps and the conflict graph.
Yea, it looks reasonably interesting and there's probably benefit in 
experimenting with that approach.  However, be aware that it's memory 
consumption can be problematical.   According to their summary, it's 
quadratic.  Though presumably we could drop back to the tried and true 
approach if we have too many BBs.

That definitely is stage1 material.

Jeff

Comment 34 Richard Biener 2015-04-01 07:51:10 UTC

(In reply to Jeffrey A. Law from comment #33)
> On 03/31/2015 05:25 AM, rguenth at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
> >
> > --- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
> > (In reply to Jeffrey A. Law from comment #28)
> >> So I've been thinking about how to integrate life/conflict analysis into the
> >> uncprop code and it may not be that bad, both from an implementation and
> >> computation standpoint.
> >>
> >> Most importantly, we don't have to compute full life information.  We really
> >> just need to compute the life of the equivalence.  Given the life of the
> >> equivalence, if the equivalence is live in any block that contains the
> >> defining statement for an SSA_NAME appearing in the target PHI, then the
> >> equivalence conflicts and we don't want to unpropagate it.
> >>
> >> Computing the life of the equivalence is pretty easy and should be
> >> reasonably quick.  This is a cost we'd have to pay regardless of whether or
> >> not we integrate uncprop with out-of-ssa since we won't have life
> >> information for the expression.
> >>
> >> Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't
> >> test for conflicts multiple times if an SSA_NAME shows up in multiple PHI
> >> alternatives would help keep the cost down as well.
> >>
> >> Ultimately I don't think we need to integrate uncprop and out-of-ssa to
> >> avoid the unprofitable transformation during uncprop.
> >
> > Also see "Boissinot et al., Fast Liveness Checking for SSA-Form Programs"
> > (CGO 08).  They describe a way to do fast liveness queries without actually
> > doing a (memory) expensive data-flow analysis but using SSA immediate-uses
> > and dominance checks.  Sth we could use in SSA coalescing as well to avoid
> > both the liveness bitmaps and the conflict graph.
> Yea, it looks reasonably interesting and there's probably benefit in 
> experimenting with that approach.  However, be aware that it's memory 
> consumption can be problematical.   According to their summary, it's 
> quadratic.

Yes, but that's only if you store the liveness info.  We don't need to do that
but we only need to compute whether two partitions conflict for each coalescing
candidate (which means a few SSA conflict checks dependent on partition size).

Our current algorithm is already quadratic in memory use because we do
store SSA liveness and the partition conflict graph.

What I'm not sure is whether doing the SSA based liveness check is going to
be slower compile-time wise.

>  Though presumably we could drop back to the tried and true 
> approach if we have too many BBs.
> 
> That definitely is stage1 material.

Indeed.

> Jeff

Comment 35 Alexandre Oliva 2015-06-09 05:06:06 UTC

Author: aoliva
Date: Tue Jun  9 05:05:34 2015
New Revision: 224262

URL: https://gcc.gnu.org/viewcvs?rev=224262&root=gcc&view=rev
Log:
[PR64164] Drop copyrename, use coalescible partition as base when optimizing.

for  gcc/ChangeLog

	PR rtl-optimization/64164
	* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
	* tree-ssa-copyrename.c: Removed.
	* opts.c (default_options_table): Drop -ftree-copyrename.  Add
	-ftree-coalesce-vars.
	* passes.def: Drop all occurrences of pass_rename_ssa_copies.
	* common.opt (ftree-copyrename): Ignore.
	(ftree-coalesce-inlined-vars): Likewise.
	* doc/invoke.texi: Remove the ignored options above.
	* gimple-expr.h (gimple_can_coalesce_p): Move declaration
	* tree-ssa-coalesce.h: ... here.
	* tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other
	headers required by it.
	* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
	across variables when flag_tree_coalesce_vars.  Check register
	use and promoted modes to allow coalescing.  Moved to
	tree-ssa-coalesce.c.
	* tree-ssa-live.c (struct tree_int_map_hasher): Move along
	with its member functions to tree-ssa-coalesce.c.
	(var_map_base_init): Likewise.  Renamed to
	compute_samebase_partition_bases.
	(partition_view_normal): Drop want_bases parameter.
	(partition_view_bitmap): Likewise.
	* tree-ssa-live.h: Adjust declarations.
	* tree-ssa-coalesce.c: Include explow.h.
	(build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's
	default defs at the entry point.
	(dump_part_var_map): New.
	(compute_optimized_partition_bases): New, called by...
	(coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead
	of compute_samebase_partition_bases.  Adjust.
	* alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs.
	* cfgexpand.c (leader_merge): New.
	(get_rtl_for_parm_ssa_default_def): New.
	(set_rtl): Merge exprs and attrs, even for MEMs and non-SSA
	vars.  Update DECL_RTL for PARM_DECLs and RESULT_DECLs too.
	(expand_one_stack_var_at): Handle anonymous SSA_NAMEs.  Drop
	redundant MEM attr setting.
	(expand_one_stack_var_1): Handle anonymous SSA_NAMEs.  Renamed
	from...
	(expand_one_stack_var): ... this.  New wrapper to check and
	skip already expanded SSA partitions.
	(record_alignment_for_reg_var): New, factored out of...
	(expand_one_var): ... this.
	(expand_one_ssa_partition): New.
	(adjust_one_expanded_partition_var): New.
	(expand_one_register_var): Check and skip already expanded SSA
	partitions.
	(expand_used_vars): Don't create DECLs for anonymous SSA
	names.  Expand all SSA partitions, then adjust all SSA names.
	(pass::execute): Replace the loops that set
	SA.partition_to_pseudo from partition leaders and cleared
	DECL_RTL for multi-location variables, and that which used to
	rename vars and set attrs, with one that clears DECL_RTL and
	checks that PARMs and RESULTs default_defs match DECL_RTL.
	* cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare.
	* emit-rtl.c (set_reg_attrs_for_parm): Handle NULL decl.
	* explow.c (promote_ssa_mode): New.
	* explow.h (promote_ssa_mode): Declare.
	* expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs.
	* function.c: Include cfgexpand.h.
	(use_register_for_decl): Handle SSA_NAMEs, anonymous or not.
	(use_register_for_parm_decl): Wrapper for the above to
	special-case the result_ptr.
	(rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def.
	(maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with
	multiple locations.
	(assign_parm_adjust_stack_rtl): Add all and parm arguments,
	for rtl_for_parm.  For SSA-assigned parms, zero stack_parm.
	(assign_parm_setup_block): Prefer SSA-assigned location.
	(assign_parm_setup_reg): Likewise.  Use entry_parm for equiv
	if stack_parm is NULL.
	(assign_parm_setup_stack): Prefer SSA-assigned location.
	(assign_parms): Maybe reset DECL_RTL of params.  Adjust stack
	rtl before testing for pointer bounds.  Special-case result_ptr.
	(expand_function_start): Maybe reset DECL_RTL of result.
	Prefer SSA-assigned location for result and static chain.
	Factor out DECL_RESULT and SET_DECL_RTL.
	* tree-outof-ssa.c (insert_value_copy_on_edge): Handle
	anonymous SSA names.  Use promote_ssa_mode.
	(get_temp_reg): Likewise.
	(remove_ssa_form): Adjust.
	* var-tracking.c (dataflow_set_clear_at_call): Take call_insn
	and get its reg_usage for reg invalidation.
	(compute_bb_dataflow): Pass it insn.
	(emit_notes_in_bb): Likewise.
	* tree-ssa-loop-niter.c (loop_exits_before_overflow): Don't
	fail assert on conversion between unsigned types.

for  gcc/testsuite/ChangeLog

	* gcc.dg/guality/pr54200.c: Add -fno-tree-coalesce-vars.
	* gcc.dg/ssp-1.c: Make counter a register.
	* gcc.dg/ssp-2.c: Likewise.
	* gcc.dg/torture/parm-coalesce.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/parm-coalesce.c
Removed:
    trunk/gcc/tree-ssa-copyrename.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/alias.c
    trunk/gcc/cfgexpand.c
    trunk/gcc/cfgexpand.h
    trunk/gcc/common.opt
    trunk/gcc/doc/invoke.texi
    trunk/gcc/emit-rtl.c
    trunk/gcc/explow.c
    trunk/gcc/explow.h
    trunk/gcc/expr.c
    trunk/gcc/function.c
    trunk/gcc/gimple-expr.c
    trunk/gcc/gimple-expr.h
    trunk/gcc/opts.c
    trunk/gcc/passes.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/guality/pr54200.c
    trunk/gcc/testsuite/gcc.dg/ssp-1.c
    trunk/gcc/testsuite/gcc.dg/ssp-2.c
    trunk/gcc/tree-outof-ssa.c
    trunk/gcc/tree-ssa-coalesce.c
    trunk/gcc/tree-ssa-coalesce.h
    trunk/gcc/tree-ssa-live.c
    trunk/gcc/tree-ssa-live.h
    trunk/gcc/tree-ssa-loop-niter.c
    trunk/gcc/tree-ssa-uncprop.c
    trunk/gcc/var-tracking.c

Comment 36 Alexandre Oliva 2015-06-09 05:34:26 UTC

I don't think we're backporting this huge change, and the much smaller one wasn't well-received, so...

Comment 37 Eric Botcazou 2015-06-09 09:10:38 UTC

The patch severely breaks on SPARC though:

int foo (char c)
{
  return (int) c;
}

eric@polaris:~/build/gcc/sparc-sun-solaris2.10> gcc/xgcc -Bgcc -S -O t.c
t.c: In function 'foo':
t.c:1:5: internal compiler error: in assign_parm_setup_reg, at function.c:3120
 int foo (char c)
     ^
0xa335d0 assign_parm_setup_reg
        /home/eric/svn/gcc/gcc/function.c:3120
0xa3660f assign_parms
        /home/eric/svn/gcc/gcc/function.c:3778
0xa3a9fd expand_function_start(tree_node*)
        /home/eric/svn/gcc/gcc/function.c:5215
0x838716 execute
        /home/eric/svn/gcc/gcc/cfgexpand.c:6127
Please submit a full bug report,

Comment 38 David Edelsohn 2015-06-09 16:35:07 UTC

It also broke ARM and PowerPC.  This patch should have been tested on all of the config-list.mk targets.

Comment 39 Alexandre Oliva 2015-06-10 00:15:09 UTC

At least the sparc regression was caused by the change richi requested to disregard the underlying decl in promote_ssa_mode.  I didn't realize this could cause a mismatch between the mode of the partition created for the QImode parm default def and the promoted mode for the parm decl expected by the parm-assignment code in function.c.  This will likely take some time to sort out, so I'm reverting the patch for now.

Comment 40 Christophe Lyon 2015-06-10 13:15:58 UTC

Created attachment 35740 [details]
ARM testcase

This is the testcase that breaks on ARM, when compiled with optimizations: -O0 is OK, -O1, -O2, -O3 crash with:
/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/libgcc/fixed-bit.c: In function '__gnu_addqq3':
/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/libgcc/fixed-bit.c:59:1: internal compiler error: RTL flag check: MEM_VOLATILE_P used with unexpected rtx code 'reg' in set_mem_attributes_minus_bitpos, at emit-rtl.c:1787
 FIXED_ADD (FIXED_C_TYPE a, FIXED_C_TYPE b)
 ^
0xa6eb52 rtl_check_failed_flag(char const*, rtx_def const*, char const*, int, char const*)
	/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/rtl.c:800
0x771fc7 set_mem_attributes_minus_bitpos(rtx_def*, tree_node*, int, long)
	/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/emit-rtl.c:1787
0x805294 assign_parm_setup_block
	/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:2977
0x80b65c assign_parms
	/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:3775
0x80e087 expand_function_start(tree_node*)
	/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:5215
0x6a77ed execute
	/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/cfgexpand.c:6127
Please submit a full bug report,

Comment 41 David Edelsohn 2015-06-10 14:43:16 UTC

Created attachment 35742 [details]
AIX PowerPC testcase

Comment 42 David Edelsohn 2015-06-10 14:45:30 UTC

Created attachment 35743 [details]
PPC64LE Linux Testcase

Comment 43 Alexandre Oliva 2015-07-23 15:35:21 UTC

Author: aoliva
Date: Thu Jul 23 15:34:49 2015
New Revision: 226113

URL: https://gcc.gnu.org/viewcvs?rev=226113&root=gcc&view=rev
Log:
[PR64164] Drop copyrename, use coalescible partition as base when optimizing.

for  gcc/ChangeLog

	PR rtl-optimization/64164
	* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
	* tree-ssa-copyrename.c: Removed.
	* opts.c (default_options_table): Drop -ftree-copyrename.  Add
	-ftree-coalesce-vars.
	* passes.def: Drop all occurrences of pass_rename_ssa_copies.
	* common.opt (ftree-copyrename): Ignore.
	(ftree-coalesce-inlined-vars): Likewise.
	* doc/invoke.texi: Remove the ignored options above.
	* gimple-expr.h (gimple_can_coalesce_p): Move declaration
	* tree-ssa-coalesce.h: ... here.
	* tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other
	headers required by it.
	* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
	across variables when flag_tree_coalesce_vars.  Check register
	use and promoted modes to allow coalescing.  Moved to
	tree-ssa-coalesce.c.
	* tree-ssa-live.c (struct tree_int_map_hasher): Move along
	with its member functions to tree-ssa-coalesce.c.
	(var_map_base_init): Likewise.  Renamed to
	compute_samebase_partition_bases.
	(partition_view_normal): Drop want_bases parameter.
	(partition_view_bitmap): Likewise.
	* tree-ssa-live.h: Adjust declarations.
	* tree-ssa-coalesce.c: Include explow.h.
	(build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's
	default defs at the entry point.
	(dump_part_var_map): New.
	(compute_optimized_partition_bases): New, called by...
	(coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead
	of compute_samebase_partition_bases.  Adjust.
	* alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs.
	* cfgexpand.c (leader_merge): New.
	(get_rtl_for_parm_ssa_default_def): New.
	(set_rtl): Merge exprs and attrs, even for MEMs and non-SSA
	vars.  Update DECL_RTL for PARM_DECLs and RESULT_DECLs too.
	(expand_one_stack_var_at): Handle anonymous SSA_NAMEs.  Drop
	redundant MEM attr setting.
	(expand_one_stack_var_1): Handle anonymous SSA_NAMEs.  Renamed
	from...
	(expand_one_stack_var): ... this.  New wrapper to check and
	skip already expanded SSA partitions.
	(record_alignment_for_reg_var): New, factored out of...
	(expand_one_var): ... this.
	(expand_one_ssa_partition): New.
	(adjust_one_expanded_partition_var): New.
	(expand_one_register_var): Check and skip already expanded SSA
	partitions.
	(expand_used_vars): Don't create DECLs for anonymous SSA
	names.  Expand all SSA partitions, then adjust all SSA names.
	(pass::execute): Replace the loops that set
	SA.partition_to_pseudo from partition leaders and cleared
	DECL_RTL for multi-location variables, and that which used to
	rename vars and set attrs, with one that clears DECL_RTL and
	checks that PARMs and RESULTs default_defs match DECL_RTL.
	* cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare.
	* emit-rtl.c (set_reg_attrs_for_parm): Handle NULL decl.
	* explow.c (promote_ssa_mode): New.
	* explow.h (promote_ssa_mode): Declare.
	* expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs.
	* function.c: Include cfgexpand.h.
	(use_register_for_decl): Handle SSA_NAMEs, anonymous or not.
	(use_register_for_parm_decl): Wrapper for the above to
	special-case the result_ptr.
	(rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def.
	(split_complex_args): Take assign_parm_data_all argument.
	Pass it to rtl_for_parm.  Set up rtl and context for split
	args.
	(assign_parms_augmented_arg_list): Adjust.
	(maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with
	multiple locations.  Recognize split complex args.
	(assign_parm_adjust_stack_rtl): Add all and parm arguments,
	for rtl_for_parm.  For SSA-assigned parms, zero stack_parm.
	(assign_parm_setup_block): Prefer SSA-assigned location.
	(assign_parm_setup_reg): Likewise.  Use entry_parm for equiv
	if stack_parm is NULL.
	(assign_parm_setup_stack): Prefer SSA-assigned location.
	(assign_parms): Maybe reset DECL_RTL of params.  Adjust stack
	rtl before testing for pointer bounds.  Special-case result_ptr.
	(expand_function_start): Maybe reset DECL_RTL of result.
	Prefer SSA-assigned location for result and static chain.
	Factor out DECL_RESULT and SET_DECL_RTL.
	* tree-outof-ssa.c (insert_value_copy_on_edge): Handle
	anonymous SSA names.  Use promote_ssa_mode.
	(get_temp_reg): Likewise.
	(remove_ssa_form): Adjust.
	* stor-layout.c (layout_decl): Don't set mem attributes of
	non-MEMs.
	* var-tracking.c (dataflow_set_clear_at_call): Take call_insn
	and get its reg_usage for reg invalidation.
	(compute_bb_dataflow): Pass it insn.
	(emit_notes_in_bb): Likewise.

for  gcc/testsuite/ChangeLog

	* gcc.dg/guality/pr54200.c: Add -fno-tree-coalesce-vars.
	* gcc.dg/ssp-1.c: Make counter a register.
	* gcc.dg/ssp-2.c: Likewise.
	* gcc.dg/torture/parm-coalesce.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/parm-coalesce.c
Removed:
    trunk/gcc/tree-ssa-copyrename.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/alias.c
    trunk/gcc/cfgexpand.c
    trunk/gcc/cfgexpand.h
    trunk/gcc/common.opt
    trunk/gcc/doc/invoke.texi
    trunk/gcc/emit-rtl.c
    trunk/gcc/explow.c
    trunk/gcc/explow.h
    trunk/gcc/expr.c
    trunk/gcc/function.c
    trunk/gcc/gimple-expr.c
    trunk/gcc/gimple-expr.h
    trunk/gcc/opts.c
    trunk/gcc/passes.def
    trunk/gcc/stor-layout.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/guality/pr54200.c
    trunk/gcc/testsuite/gcc.dg/ssp-1.c
    trunk/gcc/testsuite/gcc.dg/ssp-2.c
    trunk/gcc/tree-outof-ssa.c
    trunk/gcc/tree-ssa-coalesce.c
    trunk/gcc/tree-ssa-coalesce.h
    trunk/gcc/tree-ssa-live.c
    trunk/gcc/tree-ssa-live.h
    trunk/gcc/tree-ssa-uncprop.c
    trunk/gcc/var-tracking.c

Comment 44 Andreas Schwab 2015-07-24 10:37:37 UTC

This breaks gcc.dg/pr43300.c on aarch64.

$ gcc/xgcc -B gcc/ ../gcc/testsuite/gcc.dg/pr43300.c
../gcc/testsuite/gcc.dg/pr43300.c: In function ‘foo’:
../gcc/testsuite/gcc.dg/pr43300.c:8:1: internal compiler error: in emit_move_insn, at expr.c:3552
 foo (int x, V2SF a)
 ^
0x7ee783 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc/expr.c:3551
0x84a80b assign_parm_setup_reg
        ../../gcc/function.c:3322
0x84c4ff assign_parms
        ../../gcc/function.c:3766
0x84f353 expand_function_start(tree_node*)
        ../../gcc/function.c:5192
0x6f919f execute
        ../../gcc/cfgexpand.c:6105

Comment 45 Andreas Schwab 2015-07-24 10:51:12 UTC

It also breaks a lot of tests on m68k, eg:

$ gcc/xgcc -B gcc/ ../gcc/testsuite/gcc.dg/pr17957.c 
../gcc/testsuite/gcc.dg/pr17957.c: In function ‘vadd’:
../gcc/testsuite/gcc.dg/pr17957.c:6:1: internal compiler error: in expand_one_stack_var_1, at cfgexpand.c:1221
 vadd (void)
 ^
0x662055 expand_one_stack_var_1
        ../../gcc/cfgexpand.c:1221
0x670685 expand_one_ssa_partition
        ../../gcc/cfgexpand.c:1295
0x670685 expand_used_vars
        ../../gcc/cfgexpand.c:1940
0x671e20 execute
        ../../gcc/cfgexpand.c:6084

Comment 46 Steve Ellcey 2015-07-24 16:57:22 UTC

I see the same failures on MIPS too.

Comment 47 Rainer Orth 2015-07-27 09:07:31 UTC

(In reply to Andreas Schwab from comment #44)

Same on sparc.

Comment 48 Alexandre Oliva 2015-07-30 18:20:14 UTC

The errors reported in comments 44, 45, 46, and 47 are fixed in the git branch aoliva/pr64164.  I'm giving it all some more testing before posting an updated, consolidated patch.

Comment 49 Gary Funck 2015-08-02 21:11:33 UTC

(In reply to Alexandre Oliva from comment #48)
> The errors reported in comments 44, 45, 46, and 47 are fixed in the git
> branch aoliva/pr64164.  I'm giving it all some more testing before posting
> an updated, consolidated patch.

I applied your patch (commit 9357ff1, 8/2/15) to our GUPC branch, based off trunk version 226386 on gcc112 (PPC64LE).  It bootstrapped fine and passed the tests with "-O0 --enable-checking" and "-O3 --disable-checking".

Comment 50 Alexandre Oliva 2015-08-14 18:52:22 UTC

Author: aoliva
Date: Fri Aug 14 18:51:50 2015
New Revision: 226901

URL: https://gcc.gnu.org/viewcvs?rev=226901&root=gcc&view=rev
Log:
[PR64164] Drop copyrename, use coalescible partition as base when optimizing.

for  gcc/ChangeLog

	PR rtl-optimization/64164
	PR bootstrap/66978
	PR middle-end/66983
	PR rtl-optimization/67000
	PR middle-end/67034
	PR middle-end/67035
	* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
	* tree-ssa-copyrename.c: Removed.
	* opts.c (default_options_table): Drop -ftree-copyrename.  Add
	-ftree-coalesce-vars.
	* passes.def: Drop all occurrences of pass_rename_ssa_copies.
	* common.opt (ftree-copyrename): Ignore.
	(ftree-coalesce-inlined-vars): Likewise.
	* doc/invoke.texi: Remove the ignored options above.
	* gimple-expr.h (gimple_can_coalesce_p): Move declaration
	* tree-ssa-coalesce.h: ... here.
	* tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other
	headers required by it.
	* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
	across variables when flag_tree_coalesce_vars.  Check register
	use and promoted modes to allow coalescing.  Do not coalesce
	maybe-byref parms with SSA_NAMEs of other variables, or
	anonymous SSA_NAMEs.  Moved to tree-ssa-coalesce.c.
	* tree-ssa-live.c (struct tree_int_map_hasher): Move along
	with its member functions to tree-ssa-coalesce.c.
	(var_map_base_init): Likewise.  Renamed to
	compute_samebase_partition_bases.
	(partition_view_normal): Drop want_bases parameter.
	(partition_view_bitmap): Likewise.
	* tree-ssa-live.h: Adjust declarations.
	* tree-ssa-coalesce.c: Include explow.h and cfgexpand.h.
	(build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's
	default defs at the entry point.
	(dump_part_var_map): New.
	(compute_optimized_partition_bases): New, called by...
	(coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead
	of compute_samebase_partition_bases.  Adjust.
	* alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs.
	* cfgexpand.c (leader_merge, parm_maybe_byref_p): New.
	(ssa_default_def_partition): New.
	(get_rtl_for_parm_ssa_default_def): New.
	(align_local_variable, add_stack_var): Support anonymous SSA
	names.
	(defer_stack_allocation): Likewise.  Declare earlier.
	(set_rtl): Merge exprs and attrs, even for MEMs and non-SSA
	vars.  Update DECL_RTL for PARM_DECLs and RESULT_DECLs too.
	Do no record deferred-allocation marker in
	SA.partition_to_pseudo.
	(expand_stack_vars): Adjust check for the marker in it.
	(expand_one_stack_var_at): Handle anonymous SSA_NAMEs.  Drop
	redundant MEM attr setting.
	(expand_one_stack_var_1): Handle anonymous SSA_NAMEs.  Renamed
	from...
	(expand_one_stack_var): ... this.  New wrapper to check and
	skip already expanded SSA partitions.
	(record_alignment_for_reg_var): New, factored out of...
	(expand_one_var): ... this.
	(expand_one_ssa_partition): New.
	(adjust_one_expanded_partition_var): New.
	(expand_one_register_var): Check and skip already expanded SSA
	partitions.
	(expand_used_vars): Don't create DECLs for anonymous SSA
	names.  Expand all SSA partitions, then adjust all SSA names.
	(pass::execute): Replace the loops that set
	SA.partition_to_pseudo from partition leaders and cleared
	DECL_RTL for multi-location variables, and that which used to
	rename vars and set attrs, with one that clears DECL_RTL and
	checks that PARMs and RESULTs default_defs match DECL_RTL.
	* cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare.
	* emit-rtl.c: Include stor-layout.h.
	(set_reg_attrs_for_parm): Handle NULL decl.
	(set_reg_attrs_for_decl_rtl): Take mode from expression if
	it's not a DECL.
	* stmt.c (emit_case_decision_tree): Pass it the SSA_NAME
	rather than its possibly-NULL DECL.
	* explow.c (promote_ssa_mode): New.
	* explow.h (promote_ssa_mode): Declare.
	* expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs.
	(read_complex_part): Export.
	* expr.h (read_complex_part): Declare.
	* cfgexpand.h (parm_maybe_byref_p): Declare.
	* function.c: Include cfgexpand.h.
	(use_register_for_decl): Handle SSA_NAMEs, anonymous or not.
	(use_register_for_parm_decl): Wrapper for the above to
	special-case the result_ptr.
	(rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def.
	(split_complex_args): Take assign_parm_data_all argument.
	Pass it to rtl_for_parm.  Set up rtl and context for split
	args.  Reset complex parm before fetching its default decl
	rtl.
	(assign_parms_unsplit_complex): Use the default-def complex
	parm rtl if it matches the components.
	(assign_parms_augmented_arg_list): Adjust.
	(maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with
	multiple locations.  Recognize split complex args.
	(assign_parm_adjust_stack_rtl): Add all and parm arguments,
	for rtl_for_parm.  For SSA-assigned parms, zero stack_parm.
	(assign_parm_setup_block): Prefer SSA-assigned location, and
	fill in its address if the memory location of a maybe-byref
	parm was not assigned by cfgexpand.
	(assign_parm_setup_reg): Likewise.  Adjust its mode as
	needed.  Use entry_parm for equiv if stack_parm is NULL.  Make
	sure passed_pointer parms don't need conversion.  Copy address
	or value as needed.
	(assign_parm_setup_stack): Prefer SSA-assigned location.
	(assign_parms): Maybe reset DECL_RTL of params.  Adjust stack
	rtl before testing for pointer bounds.  Special-case result_ptr.
	(expand_function_start): Maybe reset DECL_RTL of result.
	Prefer SSA-assigned location for result and static chain.
	Factor out DECL_RESULT and SET_DECL_RTL.  Convert static chain
	to Pmode if needed, from H.J. Lu  <hongjiu.lu@intel.com>.
	* tree-outof-ssa.c (insert_value_copy_on_edge): Handle
	anonymous SSA names.  Use promote_ssa_mode.
	(get_temp_reg): Likewise.
	(remove_ssa_form): Adjust.
	* stor-layout.c (layout_decl): Don't set mem attributes of
	non-MEMs.
	* var-tracking.c (dataflow_set_clear_at_call): Take call_insn
	and get its reg_usage for reg invalidation.
	(compute_bb_dataflow): Pass it insn.
	(emit_notes_in_bb): Likewise.

for  gcc/testsuite/ChangeLog

	* gcc.dg/guality/pr54200.c: Add -fno-tree-coalesce-vars.
	* gcc.dg/ssp-1.c: Make counter a register.
	* gcc.dg/ssp-2.c: Likewise.
	* gcc.dg/torture/parm-coalesce.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/parm-coalesce.c
Removed:
    trunk/gcc/tree-ssa-copyrename.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/alias.c
    trunk/gcc/cfgexpand.c
    trunk/gcc/cfgexpand.h
    trunk/gcc/common.opt
    trunk/gcc/doc/invoke.texi
    trunk/gcc/emit-rtl.c
    trunk/gcc/explow.c
    trunk/gcc/explow.h
    trunk/gcc/expr.c
    trunk/gcc/expr.h
    trunk/gcc/function.c
    trunk/gcc/gimple-expr.c
    trunk/gcc/gimple-expr.h
    trunk/gcc/opts.c
    trunk/gcc/passes.def
    trunk/gcc/stmt.c
    trunk/gcc/stor-layout.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/guality/pr54200.c
    trunk/gcc/testsuite/gcc.dg/ssp-1.c
    trunk/gcc/testsuite/gcc.dg/ssp-2.c
    trunk/gcc/tree-outof-ssa.c
    trunk/gcc/tree-ssa-coalesce.c
    trunk/gcc/tree-ssa-coalesce.h
    trunk/gcc/tree-ssa-live.c
    trunk/gcc/tree-ssa-live.h
    trunk/gcc/tree-ssa-uncprop.c
    trunk/gcc/var-tracking.c

Comment 51 Alexandre Oliva 2015-08-15 02:27:06 UTC

Fixed for the next major release.  Not planning a backport.

Comment 52 Alexandre Oliva 2015-08-19 17:01:04 UTC

Author: aoliva
Date: Wed Aug 19 17:00:32 2015
New Revision: 227015

URL: https://gcc.gnu.org/viewcvs?rev=227015&root=gcc&view=rev
Log:
[PR64164] fix regressions reported on m68k and armeb

Defer stack slot address assignment for all parms that can't live in
pseudos, and accept pseudos assignments in assign_param_setup_block.

for  gcc/ChangeLog

	PR rtl-optimization/64164
	* cfgexpand.c (parm_maybe_byref_p): Renamed to...
	(parm_in_stack_slot_p): ... this.  Disregard mode, what
	matters is whether the parm will live in a pseudo or a stack
	slot.
	(expand_one_ssa_partition): Deal with params without a default
	def.  Disregard mode.
	* cfgexpand.h: Renamed function declaration.
	* tree-ssa-coalesce.c: Adjust.
	* function.c (split_complex_args): Allocate stack slot for
	unassigned parms before splitting.
	(parm_in_unassigned_mem_p): New.  Use it instead of
	parm_maybe_byref_p throughout this file.
	(assign_parm_setup_block): Use it.  Accept pseudos in the
	expand-assigned rtl.
	(assign_parm_setup_reg): Drop BLKmode requirement.
	(assign_parm_setup_stack): Allocate and fill in the address of
	unassigned MEM parms.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/cfgexpand.h
    trunk/gcc/function.c
    trunk/gcc/tree-ssa-coalesce.c

Comment 53 Alexandre Oliva 2015-08-21 20:03:49 UTC

Author: aoliva
Date: Fri Aug 21 20:03:14 2015
New Revision: 227085

URL: https://gcc.gnu.org/viewcvs?rev=227085&root=gcc&view=rev
Log:
fix sched compare regression

for  gcc/ChangeLog

	PR rtl-optimization/64164
	PR rtl-optimization/67227
	* alias.c (memrefs_conflict_p): Handle VALUEs in PLUS better.
	(nonoverlapping_memrefs_p): Test offsets and sizes when given
	identical gimple_reg exprs.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/alias.c

Comment 54 Alexandre Oliva 2015-09-23 21:13:43 UTC

New patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01793.html

Comment 55 Alexandre Oliva 2015-09-27 09:02:33 UTC

Author: aoliva
Date: Sun Sep 27 09:02:00 2015
New Revision: 228175

URL: https://gcc.gnu.org/viewcvs?rev=228175&root=gcc&view=rev
Log:
revert to assign_parms assignments using default defs

Revert the fragile and complicated changes to assign_parms designed to
enable it to use RTL assigments chosen by cfgexpand, and instead have
cfgexpand use the RTL assignments by assign_parms, keying them off of
the default defs that are now necessarily introduced for each parm and
result.  The possible lack of a default def was already a problem, and
the fallbacks in place were not enough, as shown by PR67312.  We now
have checking asserts in set_rtl that verify that we're assigning to
each var a piece of RTL that matches the expectations set forth by
use_register_for_decl.

for  gcc/ChangeLog

	PR rtl-optimization/64164
	PR tree-optimization/67312
	PR middle-end/67340
	PR middle-end/67490
	PR bootstrap/67597
	* cfgexpand.c (parm_in_stack_slot_p): Remove.
	(ssa_default_def_partition): Remove.
	(get_rtl_for_parm_ssa_default_def): Remove.
	(set_rtl): Check that RTL assignments match expectations.
	Loop on SUBREGs, CONCATs and PARALLELs subexprs.  Set only the
	default def location for params and results.  Record SSA names
	or types in REG and MEM attrs, respectively.
	(set_parm_rtl): New.
	(expand_one_ssa_partition): Drop logic that assigned MEMs with
	unassigned addresses.
	(adjust_one_expanded_partition_var): Don't accept NULL RTL on
	deferred stack alloc vars.
	(expand_used_vars): Skip partitions holding parm default defs.
	Move adjust_one_expanded_partition_var loop...
	(pass_expand::execute): ... here.  Drop redundant assert.
	Adjust comments before the final loop over all ssa names.
	Require assigned rtl of parms and results to match exactly.
	Reset its attributes to match them, not any other variables in
	the same partition.
	(expand_debug_expr): Use entry value for PARM's default defs
	only iff they have zero nondebug uses.
	* cfgexpand.h (parm_in_stack_slot_p): Remove.
	(get_rtl_for_parm_ssa_default_def): Remove.
	(set_parm_rtl): Declare.
	* doc/invoke.texi: Improve wording.
	* explow.c (promote_decl_mode): Fix promote_function_mode for
	result decls not by reference.
	(promote_ssa_mode): Disregard BLKmode from promote_decl, and
	bypass TYPE_MODE to get the actual vector mode.
	* function.c: Include tree-dfa.h.  Revert 2015-08-14's and
	2015-08-19's changes as follows.  Drop include of
	basic-block.h and df.h.
	(rtl_for_parm): Remove.
	(maybe_reset_rtl_for_parm): Remove.
	(parm_in_unassigned_mem_p): Remove.
	(use_register_for_decl): Add logic for RESULT_DECLs matching
	assign_parms' behavior.
	(split_complex_args): Revert.
	(assign_parms_augmented_arg_list): Revert.  Add comment
	referencing the logic above.
	(assign_parm_adjust_stack_rtl): Revert.
	(assign_parm_setup_block): Revert.  Use set_parm_rtl instead
	of SET_DECL_RTL.  Set up a REG if the parm demands so.
	(assign_parm_setup_reg): Revert.  Consolidated SET_DECL_RTL
	calls into a single set_parm_rtl.  Set up a temporary RTL
	temporarily for expand_assignment.
	(assign_parm_setup_stack): Revert.  Use set_parm_rtl.
	(assign_parms_unsplit_complex): Revert.  Use set_parm_rtl.
	(assign_bounds): Revert.
	(assign_parms): Revert.  Use set_parm_rtl.
	(allocate_struct_function): Relayout result and parms of
	non-abstruct functions.
	(expand_function_start): Revert.  Use set_parm_rtl.  If the
	result is not a hard reg, create a pseudo from the promoted
	mode of the default def.  Promote static chain mode.
	* tree-outof-ssa.c (remove_ssa_form): Drop unused
	partition_has_default_def.  Set up
	partitions_for_parm_default_defs.
	(finish_out_of_ssa): Remove partition_has_default_def.
	Release partitions_for_parm_default_defs.
	* tree-outof-ssa.h (struct ssaexpand): Remove
	partition_has_default_def.  Add
	partitions_for_parm_default_defs.
	* tree-ssa-coalesce.c: Include tree-dfa.h, tm_p.h and
	stor-layout.h.
	(build_ssa_conflict_graph): Fix conflict-detection of default
	defs of even unused default defs of params and results.
	(for_all_parms): New.
	(create_default_def): New.
	(register_default_def): New.
	(coalesce_with_default): New.
	(create_outofssa_var_map): Create default defs for all parms
	and results, and register their partitions.  Add GIMPLE_RETURN
	operands as coalesce candidates with results.  Add default
	defs of each parm or result as coalesce candidates with its
	other defs.  Mark each result def, and each default def of
	parms, as used_in_copy.
	(gimple_can_coalesce_p): Call it.  Call use_register_for_decl
	with the ssa names, even anonymous ones.  Drop
	parm_in_stack_slot_p calls.  Require same signedness and
	alignment.
	(coalesce_ssa_name): Add coalesce candidates for all defs of
	each parm and result, even unused ones.
	(parm_default_def_partition_arg): New type.
	(set_parm_default_def_partition): New.
	(get_parm_default_def_partitions): New.
	* tree-ssa-coalesce.h (get_parm_default_def_partitions): New.
	* tree-ssa-live.c (partition_view_init): Regard unused defs of
	parms and results as used.
	(verify_live_on_entry): Don't error out just because they're
	not live.

for  gcc/testsuite/ChangeLog

	PR rtl-optimization/64164
	PR tree-optimization/67312
	* gcc.dg/pr67312.c: New.  From Zdenek Sojka.
	* gcc.target/i386/stackalign/return-4.c: Add -O.

Added:
    trunk/gcc/testsuite/gcc.dg/pr67312.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/cfgexpand.h
    trunk/gcc/doc/invoke.texi
    trunk/gcc/explow.c
    trunk/gcc/function.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/stackalign/return-4.c
    trunk/gcc/tree-outof-ssa.c
    trunk/gcc/tree-outof-ssa.h
    trunk/gcc/tree-ssa-coalesce.c
    trunk/gcc/tree-ssa-coalesce.h
    trunk/gcc/tree-ssa-live.c

Comment 56 Alexandre Oliva 2015-11-06 10:34:46 UTC

Author: aoliva
Date: Fri Nov  6 10:34:13 2015
New Revision: 229840

URL: https://gcc.gnu.org/viewcvs?rev=229840&root=gcc&view=rev
Log:
[PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg

In assign_parms_setup_block, the copy of args in PARALLELs from
entry_parm to stack_parm is deferred to the parm conversion insn seq,
but the copy from stack_parm to target_reg was inserted in the normal
copy seq, that is executed before the conversion insn seq.  Oops.

We could do away with the need for an actual stack_parm in general,
which would have avoided the need for emitting the copy to target_reg
in the conversion seq, but at least on pa, due to the need for stack
to copy between SI and SF modes, it seems like using the reserved
stack slot is beneficial, so I put in logic to use a pre-reserved
stack slot when there is one, and emit the copy to target_reg in the
conversion seq if stack_parm was set up there.

for  gcc/ChangeLog

	PR rtl-optimization/67753
	PR rtl-optimization/64164
	* function.c (assign_parm_setup_block): Avoid allocating a
	stack slot if we don't have an ABI-reserved one.  Emit the
	copy to target_reg in the conversion seq if the copy from
	entry_parm is in it too.  Don't use the conversion seq to copy
	a PARALLEL to a REG or a CONCAT.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/function.c

Comment 57 Alexandre Oliva 2015-11-26 21:58:13 UTC

Author: aoliva
Date: Thu Nov 26 21:57:40 2015
New Revision: 230985

URL: https://gcc.gnu.org/viewcvs?rev=230985&root=gcc&view=rev
Log:
[PR67753] adjust for padding when bypassing memory in assign_parm_setup_block

Storing a register in memory as a full word and then accessing the
same memory address under a smaller-than-word mode amounts to
right-shifting of the register word on big endian machines.  So, if
BLOCK_REG_PADDING chooses upward padding for BYTES_BIG_ENDIAN, and
we're copying from the entry_parm REG directly to a pseudo, bypassing
any stack slot, perform the shifting explicitly.

This fixes the miscompile of function_return_val_10 in
gcc.target/aarch64/aapcs64/func-ret-4.c for target aarch64_be-elf
introduced in the first patch for 67753.

for  gcc/ChangeLog

	PR rtl-optimization/67753
	PR rtl-optimization/64164
	* function.c (assign_parm_setup_block): Right-shift
	upward-padded big-endian args when bypassing the stack slot.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/function.c

Comment 58 Dominik Vogt 2015-12-04 12:39:08 UTC

The patch in comment 35 causes a performance regression on s390.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695

Comment 59 Andre Vieira 2016-03-10 11:03:04 UTC

I believe PR70164 is related to this.