[PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

Tom de Vries Tom_deVries@mentor.com
Fri Apr 27 06:20:00 GMT 2012


On 26/04/12 12:20, Richard Guenther wrote:
> On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 25/04/12 11:57, Richard Guenther wrote:
>> <SNIP>
>>>>>>> Hmm.  I'm not sure we can conclude that they have the same value!
>>>>>>>
>>>>>>> +int bar (int);
>>>>>>> +void baz (int);
>>>>>>> +void bla (int);
>>>>>>> +
>>>>>>> +void
>>>>>>> +foo (int y)
>>>>>>> +{
>>>>>>> +  int a;
>>>>>>> +  if (y == 6)
>>>>>>> +    {
>>>>>>> +      bla (5);
>>>>>>> +      a = bar (7);
>>>>>>> +    }
>>>>>>> +  else
>>>>>>> +    {
>>>>>>> +      bla (5);
>>>>>>> +      a = bar (7);
>>>>>>> +    }
>>>>>>> +  baz (a);
>>>>>>> +}
>>>>>>>
>>>>>>> I can implement bla as
>>>>>>>
>>>>>>> void bla(int) { a = random (); }
>>>>>>>
>>>>>>> thus a would not have the same value on both paths
>>>>>
>>>>> I think it's hard to decide whether they have the same value or not, since we
>>>>> cannot write a test-case that compares the results of calls from 2 branches of
>>>>> an if statement.
>>> void *foo ()
>>> {
>>>   return __builtin_return_address (0);
>>> }
>>>
>>> void *bar (_Bool b)
>>> {
>>>   if (b)
>>>     return foo ();
>>>   else
>>>     return foo ();
>>> }
>>>
>>> int main()
>>> {
>>>   if (bar(true) == bar(false))
>>>     abort ();
>>> }
>>>
>>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>>> Which would question tail-merging the above at all, of course.
>>>
>>
>> I added noinline to make sure there's no variation from inlining.
>> ...
>> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
>> ((__noreturn__));
>>
>> __attribute__((noinline))
>> void *foo ()
>> {
>>  return __builtin_return_address (0);
>> }
>>
>> __attribute__((noinline))
>> void *bar (int b)
>> {
>>  if (b)
>>    return foo ();
>>  else
>>    return foo ();
>> }
>>
>> int main()
>> {
>>  void *a, *b;
>>  a = bar (1);
>>  b = bar (0);
>>  if (a == b)
>>    abort ();
>>  return 0;
>> }
>> ...
>>
>> This test-case passes with:
>> ...
>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping
>> ...
>>
>> and fails with:
>> ...
>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping
>> ...
>>
>> with the proposed patch, it also fails with:
>> ...
>> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping
>> ...
>>
>> Tree-tail-merge performs the same transformation as cross-jumping for this
>> test-case. I think that the transformation is valid, and that the test-case
>> behaves as expected. Subsequent calls to foo may or may not return the same
>> value, depending on the optimizations, we can't assert a != b, that's just the
>> nature of __builtin_return_address.
>>
>> Btw, this example is not what I meant when I said 'a test-case that compares the
>> results of calls from 2 branches of an if statement'. I tried to say that the
>> semantics of C is defined in terms of taking either 1 branch or the other,
>> rather than executing both in parallel, after which both results would be
>> available. From that point of view, this example compares subsequent calls to
>> foo, not parallel.
> 
> Ah, I see.  Btw, I was not suggesting that we should not optimize the above.
> 
>>>>> I started looking at the problem in terms of subsequent calls. Consider the
>>>>> imaginary attribute nosideeffect, similar to const and pure.
>>>>>
>>>>> const:
>>>>> - no effects except the return value
>>>>> - the return value depends only on the parameters
>>>>>
>>>>> pure:
>>>>> - no effects except the return value
>>>>> - the return value depends only on the parameters and/or global variables
>>>>>
>>>>> nosideeffect:
>>>>> - no effects except the return value
>>>>>
>>>>> The code fragment:
>>>>> ...
>>>>> extern int f (int) __attribute ((nosideeffect));
>>>>> int g()
>>>>> {
>>>>>  int a;
>>>>>  int b;
>>>>>  a = f (5);
>>>>>  b = f (5);
>>>>>  return a + b;
>>>>> }
>>>>> ...
>>>>>
>>>>> would result in a gimple sequence more or less like this:
>>>>> ...
>>>>>  # VUSE <.MEMD.1712_4(D)>
>>>>>  aD.1707_1 = fD.1704 (5);
>>>>>  # VUSE <.MEMD.1712_4(D)>
>>>>>  bD.1708_2 = fD.1704 (5);
>>>>>  D.1710_3 = aD.1707_1 + bD.1708_2;
>>>>>  # VUSE <.MEMD.1712_6>
>>>>>  return D.1710_3;
>>>>> ...
>>>>>
>>>>> The value numbering modification I'm proposing would say that a and b have the
>>>>> same value, which is not true. I updated the patch to ensure in this case that a
>>>>> and b would be seen as different.
>>>>> Note that things won't break if the attribute is missing on a function, since
>>>>> that just means that the function would have a vdef, and there would be no 2
>>>>> subsequent function calls with the same vuse.
>>>>>
>>>>>>> - but that is not what
>>>>>>> matters - because obviously we can still merge the blocks.
>>>>>
>>>>> Right.
>>>>>
>>>>>>>  That is,
>>>>>>> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>>>>>>> are the same.  But is that not what your value-numbering changes will assert?
>>>>>
>>>>> In the case of calls in different branches of an control flow statement, we can
>>>>> aggressively assume they're the same, since we cannot write a check that would
>>>>> start failing.
>>> Apart from the above ... which shows that even the returned values can matter.
>>>
>>>>> In the case of subsequent calls with side effects, the vuses will never be the same.
>>>>>
>>>>> In the case of subsequent calls without side effects, but not pure or const, we
>>>>> can't assume they have the same result. AFAIK, there's currently no way to
>>>>> specify such a function, but the updated patch should be able handle this.
>>>>>
>>>>>>> (not sure how you achieve this, but it looks like you insert/lookup general
>>>>>>> calls, thus not pure/const ones
>>>>>
>>>>> Correct.
>>>>>
>>>>>>> - that could easily lead to miscompiles(?)).
>>>>>>>
>>> And to increased compile-time (not sure if it matters).
>>>
>>>>> I have bootstrapped and reg-tested the updated (and the previous) patch on
>>>>> x86_64 (ada inclusive), and found no issues.
>>>>>
>>>>> Can you think of a test-case that fails with the previous or the updated patch?
>>> I see you do not handle
>>>
>>
>> I assume you mean 'not handle' in the sense of 'handle conservatively'.
> 
> Yes.
> 
>>> struct S { int i; };
>>> struct S foo (void);
>>> struct S bar (void)
>>> {
>>>   struct S s1, s2;
>>>   if (...)
>>>    s = foo ();
>>>   else
>>>    s = foo ();
>>>
>>> because the calls have a LHS that is not an SSA name.
>>
>> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
>> fails to optimize this test-case:
>> ...
>> struct S { int i; };
>> extern struct S foo (void);
>> extern int foo2 (void);
>> struct S s;
>> int bar (int c) {
>>  int r;
>>  if (c)
>>    {
>>      s = foo ();
>>      r = foo2 ();
>>    }
>>  else
>>    {
>>      s = foo ();
>>      r = foo2 ();
>>    }
>>  return r;
>> }
>> ...
>>
>> A todo.
>>
>>>  In general
>>> value-numbering virtual operands will change what further consumers lookup.
>>> So changing
>>>
>>>   # MEM_2 = VUSE <MEM_3>
>>>   foo ();
>>>   # VUSE <MEM_2>
>>>   ... = x;
>>>
>>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
>>> MEM_3, not considering a clobber by foo.
>>
>> AFAIU, the patch doesn't do this type of value numbering.
>>
>>> This might not actually be an
>>> issue (after all we do this for stores, too).
>>>
>>> I see you do not handle
>>>
>>> int bar (int);
>>> void baz (int);
>>> void bla (int);
>>> int s;
>>> void
>>> foo (int y)
>>> {
>>>   int a;
>>>   if (y == 6)
>>>     {
>>>       s = 1;
>>>       bla (5);
>>>       a = bar (7);
>>>     }
>>>   else
>>>     {
>>>       s = 1;
>>>       bla (5);
>>>       a = bar (7);
>>>     }
>>>   baz (a);
>>> }
>>>
>>
>> I think a more basic example is this one from PR52009, which is basically about
>> stores:
>> ...
>> int z;
>>
>> void
>> foo (int y)
>> {
>>  int a;
>>  if (y == 6)
>>    z = 5;
>>  else
>>    z = 5;
>> }
>> ...
>>
>> I proposed a patch for that separately:
>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the
>> patch now, I suppose I could use part of that patch as infrastructure for the
>> todo above.
> 
> Yes, I think so.
> 
>>> and I don't see how you need the new ref->vdef member
>>
>> I'm regarding the result of the ref as a compound <vdef, result>. Most of the
>> time I could (as I did in an earlier version of the patch) deduce vdef as
>> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
>> result is NULL_TREE.
>>
>>> - that is also
>>> not handled consistently (not in hashing or comparing, etc.).
>>
>> I handle the vdef as a result.
> 
> Ok.  Can you please rename ->vdef to ->result_vdef then?
> 

Done.

>>> Can't you
>>> always use SSA_VAL (vuse)?
>>
>> I'm interested in storing the vdef, not the vuse.
> 
> I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of
> the stmt defining vuse).  No?
> 

Right.

>>>
>>> Btw, what's
>>>
>>> +  if (vdef)
>>> +    VN_INFO (vdef)->use_processed = true;
>>>
>>> for?  We arrive here by visiting the VDEF after all and visit_use does already
>>> set the use processed.  Is it that we end up visiting the call twice because
>>> of the lhs SSA name definition and the virtual operand definition?
>>
>> Yes.
>>
>>> If so then
>>> visit_use should instead do
>>>
>>> FOR_EACH_SSA_DEF_OPERAND (...)
>>>    VN_INFO (def)->use_processed = true;
>>>
>>> and defs_to_varying then no longer needs to do that.
>>>
>>
>> factored mark_use_processed out of defs_to_varying, calling mark_use_processed
>> at start of visit_use.
> 
> Thanks.  Which case hits SSA_NAME_DEF_STMT == NULL btw?  The
> default-def for the virtual operand?
> 

SSA_NAME_DEF_STMT == NULL is tested by !stmt in mark_use_processed.

I suppose I better use SSA_NAME_IS_DEFAULT_DEF, as is done in visit_use. I'm
still somewhat confused as to what is the difference. I suppose we might have a
nop statement that is a default def, while it's unequal to NULL.

Updated in committed patch.

>>>>> Otherwise, ok for trunk?
>>> Let's give this one more iteration, but I guess the basic idea should work.
>>>
>>
>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>
>> Is this patch ok, or is the todo required?
> 
> No, you can followup with that.
> 
> Ok with s/vdef/result_vdef/
> 

retested and committed with this change and SSA_NAME_IS_DEFAULT_DEF instead of
'!stmt'.

Thanks.
- Tom



More information about the Gcc-patches mailing list