On Tue, 6 Dec 2016, Richard Biener wrote:
On Mon, 5 Dec 2016, Jeff Law wrote:
On 12/02/2016 01:33 AM, Richard Biener wrote:
The LHS on the assignment makes it easier to identify when a tail call is
possible. It's not needed for correctness. Not having the LHS on the
assignment just means we won't get an optimized tail call.
Under what circumstances would the LHS possibly be removed? We know the
return statement references the LHS, so it's not going to be something
that DCE will do.
Well, I thought Prathamesh added the optimization to copy-propagate
the lhs from the returned argument. So we'd have both transforms here.
That seems like a mistake -- the fact that we can copy propagate the LHS from
the returned argument is interesting, but in practice I've found it to not be
useful to do so.
The problem is it makes the value look live across the call and we're then
dependent upon the register allocator to know the trick about the returned
argument value and apply it consistently -- which it does not last I checked.
I think we're better off leaving the call in the form of LHS = call () if the
return value is used. That's going to be more palatable to tail calling.
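To make the two shapes concrete, here is a minimal sketch in C. The function
names are hypothetical and come from neither GCC nor the patch. The sketch
only illustrates the source-level difference between keeping the LHS and
returning the argument. Whether a given compiler emits a tail call for either
form is not something this sketch demonstrates.

```c
#include <string.h>

/* Hypothetical wrapper: the call result is kept in an LHS and that LHS
   is what gets returned.  This is the "LHS = call ()" form.  */
void *
fill_keep_lhs (void *p, int c, size_t n)
{
  void *q = memset (p, c, n);
  return q;
}

/* Hypothetical wrapper in the shape copy propagation from the returned
   argument would produce: memset returns its first argument, so
   returning p is equivalent, but p is now used after the call.  */
void *
fill_return_arg (void *p, int c, size_t n)
{
  memset (p, c, n);
  return p;
}

/* Both forms are observably equivalent; only their shape differs.
   Returns nonzero when they agree on pointer and buffer contents.  */
int
forms_agree (void)
{
  char a[4], b[4];
  return fill_keep_lhs (a, 1, sizeof a) == (void *) a
         && fill_return_arg (b, 1, sizeof b) == (void *) b
         && memcmp (a, b, sizeof a) == 0;
}
```

In the second form the argument p has a use after the call, which is the
"looks live across the call" situation described above.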
Yes, that's something I also raised earlier in the thread. Note that
any kind of value-numbering probably wants to know the equivalence
for simplifications but in the end wants to disable propagating the
copy (in fact it should propagate the return value from the point of
the call). I suppose I know how to implement that in FRE/PRE given it has
separate value-numbering and elimination phases. Something for GCC 8.
The following does that (it shows we don't handle separating LHS
and overall stmt effect very well). It optimizes a testcase like
void *foo (void *p, int c, __SIZE_TYPE__ n)
{
  void *q = __builtin_memset (p, c, n);
  if (q == p)
    return p;
  return q;
}
to
foo (void * p, int c, long unsigned int n)
{
  void * q;

  <bb 2> [0.0%]:
  q_7 = __builtin_memset (p_3(D), c_4(D), n_5(D));
  return q_7;
}
in early FRE.
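
For anyone who wants to try this outside the testsuite, the following is a
self-contained sketch of the same testcase using standard names (size_t and
memset rather than __SIZE_TYPE__ and __builtin_memset). The second function,
foo_after_fre, is a hand-written source-level equivalent of the dump shown
above and is only an illustration, not compiler output. Compiling with
something like -O2 -fdump-tree-fre1 should be one way to inspect the early
FRE dump, assuming a GCC that includes the change.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the testcase: the comparison q == p is always true
   because memset returns its first argument.  */
void *
foo (void *p, int c, size_t n)
{
  void *q = memset (p, c, n);
  if (q == p)
    return p;
  return q;
}

/* Hand-written equivalent of the optimized form: the comparison is
   gone and the value returned is the LHS of the call, not p.  */
void *
foo_after_fre (void *p, int c, size_t n)
{
  void *q = memset (p, c, n);
  return q;
}
```

The point of interest is that the simplified function returns q, the result
of the call, rather than p, even though the equivalence q == p is what made
the simplification possible.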