This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls


On Fri, Apr 27, 2012 at 8:20 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 26/04/12 12:20, Richard Guenther wrote:
>> On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 25/04/12 11:57, Richard Guenther wrote:
>>> <SNIP>
>>>>>>>> Hmm. ?I'm not sure we can conclude that they have the same value!
>>>>>>>>
>>>>>>>> +int bar (int);
>>>>>>>> +void baz (int);
>>>>>>>> +void bla (int);
>>>>>>>> +
>>>>>>>> +void
>>>>>>>> +foo (int y)
>>>>>>>> +{
>>>>>>>> + ?int a;
>>>>>>>> + ?if (y == 6)
>>>>>>>> + ? ?{
>>>>>>>> + ? ? ?bla (5);
>>>>>>>> + ? ? ?a = bar (7);
>>>>>>>> + ? ?}
>>>>>>>> + ?else
>>>>>>>> + ? ?{
>>>>>>>> + ? ? ?bla (5);
>>>>>>>> + ? ? ?a = bar (7);
>>>>>>>> + ? ?}
>>>>>>>> + ?baz (a);
>>>>>>>> +}
>>>>>>>>
>>>>>>>> I can implement bla as
>>>>>>>>
>>>>>>>> void bla(int) { a = random (); }
>>>>>>>>
>>>>>>>> thus a would not have the same value on both paths
>>>>>>
>>>>>> I think it's hard to decide whether they have the same value or not, since we
>>>>>> cannot write a test-case that compares the results of calls from 2 branches of
>>>>>> an if statement.
>>>> void *foo ()
>>>> {
>>>> ? return __builtin_return_address (0);
>>>> }
>>>>
>>>> void *bar (_Bool b)
>>>> {
>>>> ? if (b)
>>>> ? ? return foo ();
>>>> ? else
>>>> ? ? return foo ();
>>>> }
>>>>
>>>> int main()
>>>> {
>>>> ? if (bar(true) == bar(false))
>>>> ? ? abort ();
>>>> }
>>>>
>>>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>>>> Which would question tail-merging the above at all, of course.
>>>>
>>>
>>> I added noinline to make sure there's no variation from inlining.
>>> ...
>>> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
>>> ((__noreturn__));
>>>
>>> __attribute__((noinline))
>>> void *foo ()
>>> {
>>> ?return __builtin_return_address (0);
>>> }
>>>
>>> __attribute__((noinline))
>>> void *bar (int b)
>>> {
>>> ?if (b)
>>> ? ?return foo ();
>>> ?else
>>> ? ?return foo ();
>>> }
>>>
>>> int main()
>>> {
>>> ?void *a, *b;
>>> ?a = bar (1);
>>> ?b = bar (0);
>>> ?if (a == b)
>>> ? ?abort ();
>>> ?return 0;
>>> }
>>> ...
>>>
>>> This test-case passes with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping
>>> ...
>>>
>>> and fails with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping
>>> ...
>>>
>>> with the proposed patch, it also fails with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping
>>> ...
>>>
>>> Tree-tail-merge performs the same transformation as cross-jumping for this
>>> test-case. I think that the transformation is valid, and that the test-case
>>> behaves as expected. Subsequent calls to foo may or may not return the same
>>> value, depending on the optimizations, we can't assert a != b, that's just the
>>> nature of __builtin_return_address.
>>>
>>> Btw, this example is not what I meant when I said 'a test-case that compares the
>>> results of calls from 2 branches of an if statement'. I tried to say that the
>>> semantics of C is defined in terms of taking either 1 branch or the other,
>>> rather than executing both in parallel, after which both results would be
>>> available. From that point of view, this example compares subsequent calls to
>>> foo, not parallel.
>>
>> Ah, I see. ?Btw, I was not suggesting that we should not optimize the above.
>>
>>>>>> I started looking at the problem in terms of subsequent calls. Consider the
>>>>>> imaginary attribute nosideeffect, similar to const and pure.
>>>>>>
>>>>>> const:
>>>>>> - no effects except the return value
>>>>>> - the return value depends only on the parameters
>>>>>>
>>>>>> pure:
>>>>>> - no effects except the return value
>>>>>> - the return value depends only on the parameters and/or global variables
>>>>>>
>>>>>> nosideeffect:
>>>>>> - no effects except the return value
>>>>>>
>>>>>> The code fragment:
>>>>>> ...
>>>>>> extern int f (int) __attribute ((nosideeffect));
>>>>>> int g()
>>>>>> {
>>>>>> ?int a;
>>>>>> ?int b;
>>>>>> ?a = f (5);
>>>>>> ?b = f (5);
>>>>>> ?return a + b;
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> would result in a gimple sequence more or less like this:
>>>>>> ...
>>>>>> ?# VUSE <.MEMD.1712_4(D)>
>>>>>> ?aD.1707_1 = fD.1704 (5);
>>>>>> ?# VUSE <.MEMD.1712_4(D)>
>>>>>> ?bD.1708_2 = fD.1704 (5);
>>>>>> ?D.1710_3 = aD.1707_1 + bD.1708_2;
>>>>>> ?# VUSE <.MEMD.1712_6>
>>>>>> ?return D.1710_3;
>>>>>> ...
>>>>>>
>>>>>> The value numbering modification I'm proposing would say that a and b have the
>>>>>> same value, which is not true. I updated the patch to ensure in this case that a
>>>>>> and b would be seen as different.
>>>>>> Note that things won't break if the attribute is missing on a function, since
>>>>>> that just means that the function would have a vdef, and there would be no 2
>>>>>> subsequent function calls with the same vuse.
>>>>>>
>>>>>>>> - but that is not what
>>>>>>>> matters - because obviously we can still merge the blocks.
>>>>>>
>>>>>> Right.
>>>>>>
>>>>>>>> ?That is,
>>>>>>>> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>>>>>>>> are the same. ?But is that not what your value-numbering changes will assert?
>>>>>>
>>>>>> In the case of calls in different branches of an control flow statement, we can
>>>>>> aggressively assume they're the same, since we cannot write a check that would
>>>>>> start failing.
>>>> Apart from the above ... which shows that even the returned values can matter.
>>>>
>>>>>> In the case of subsequent calls with side effects, the vuses will never be the same.
>>>>>>
>>>>>> In the case of subsequent calls without side effects, but not pure or const, we
>>>>>> can't assume they have the same result. AFAIK, there's currently no way to
>>>>>> specify such a function, but the updated patch should be able handle this.
>>>>>>
>>>>>>>> (not sure how you achieve this, but it looks like you insert/lookup general
>>>>>>>> calls, thus not pure/const ones
>>>>>>
>>>>>> Correct.
>>>>>>
>>>>>>>> - that could easily lead to miscompiles(?)).
>>>>>>>>
>>>> And to increased compile-time (not sure if it matters).
>>>>
>>>>>> I have bootstrapped and reg-tested the updated (and the previous) patch on
>>>>>> x86_64 (ada inclusive), and found no issues.
>>>>>>
>>>>>> Can you think of a test-case that fails with the previous or the updated patch?
>>>> I see you do not handle
>>>>
>>>
>>> I assume you mean 'not handle' in the sense of 'handle conservatively'.
>>
>> Yes.
>>
>>>> struct S { int i; };
>>>> struct S foo (void);
>>>> struct S bar (void)
>>>> {
>>>> ? struct S s1, s2;
>>>> ? if (...)
>>>> ? ?s = foo ();
>>>> ? else
>>>> ? ?s = foo ();
>>>>
>>>> because the calls have a LHS that is not an SSA name.
>>>
>>> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
>>> fails to optimize this test-case:
>>> ...
>>> struct S { int i; };
>>> extern struct S foo (void);
>>> extern int foo2 (void);
>>> struct S s;
>>> int bar (int c) {
>>> ?int r;
>>> ?if (c)
>>> ? ?{
>>> ? ? ?s = foo ();
>>> ? ? ?r = foo2 ();
>>> ? ?}
>>> ?else
>>> ? ?{
>>> ? ? ?s = foo ();
>>> ? ? ?r = foo2 ();
>>> ? ?}
>>> ?return r;
>>> }
>>> ...
>>>
>>> A todo.
>>>
>>>> ?In general
>>>> value-numbering virtual operands will change what further consumers lookup.
>>>> So changing
>>>>
>>>> ? # MEM_2 = VUSE <MEM_3>
>>>> ? foo ();
>>>> ? # VUSE <MEM_2>
>>>> ? ... = x;
>>>>
>>>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
>>>> MEM_3, not considering a clobber by foo.
>>>
>>> AFAIU, the patch doesn't do this type of value numbering.
>>>
>>>> This might not actually be an
>>>> issue (after all we do this for stores, too).
>>>>
>>>> I see you do not handle
>>>>
>>>> int bar (int);
>>>> void baz (int);
>>>> void bla (int);
>>>> int s;
>>>> void
>>>> foo (int y)
>>>> {
>>>> ? int a;
>>>> ? if (y == 6)
>>>> ? ? {
>>>> ? ? ? s = 1;
>>>> ? ? ? bla (5);
>>>> ? ? ? a = bar (7);
>>>> ? ? }
>>>> ? else
>>>> ? ? {
>>>> ? ? ? s = 1;
>>>> ? ? ? bla (5);
>>>> ? ? ? a = bar (7);
>>>> ? ? }
>>>> ? baz (a);
>>>> }
>>>>
>>>
>>> I think a more basic example is this one from PR52009, which is basically about
>>> stores:
>>> ...
>>> int z;
>>>
>>> void
>>> foo (int y)
>>> {
>>> ?int a;
>>> ?if (y == 6)
>>> ? ?z = 5;
>>> ?else
>>> ? ?z = 5;
>>> }
>>> ...
>>>
>>> I proposed a patch for that separately:
>>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the
>>> patch now, I suppose I could use part of that patch as infrastructure for the
>>> todo above.
>>
>> Yes, I think so.
>>
>>>> and I don't see how you need the new ref->vdef member
>>>
>>> I'm regarding the result of the ref as a compound <vdef, result>. Most of the
>>> time I could (as I did in an earlier version of the patch) deduce vdef as
>>> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
>>> result is NULL_TREE.
>>>
>>>> - that is also
>>>> not handled consistently (not in hashing or comparing, etc.).
>>>
>>> I handle the vdef as a result.
>>
>> Ok. ?Can you please rename ->vdef to ->result_vdef then?
>>
>
> Done.
>
>>>> Can't you
>>>> always use SSA_VAL (vuse)?
>>>
>>> I'm interested in storing the vdef, not the vuse.
>>
>> I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of
>> the stmt defining vuse). ?No?
>>
>
> Right.
>
>>>>
>>>> Btw, what's
>>>>
>>>> + ?if (vdef)
>>>> + ? ?VN_INFO (vdef)->use_processed = true;
>>>>
>>>> for? ?We arrive here by visiting the VDEF after all and visit_use does already
>>>> set the use processed. ?Is it that we end up visiting the call twice because
>>>> of the lhs SSA name definition and the virtual operand definition?
>>>
>>> Yes.
>>>
>>>> If so then
>>>> visit_use should instead do
>>>>
>>>> FOR_EACH_SSA_DEF_OPERAND (...)
>>>> ? ?VN_INFO (def)->use_processed = true;
>>>>
>>>> and defs_to_varying then no longer needs to do that.
>>>>
>>>
>>> factored mark_use_processed out of defs_to_varying, calling mark_use_processed
>>> at start of visit_use.
>>
>> Thanks. ?Which case hits SSA_NAME_DEF_STMT == NULL btw? ?The
>> default-def for the virtual operand?
>>
>
> SSA_NAME_DEF_STMT == NULL is tested by !stmt in mark_use_processed.
>
> I suppose I better use SSA_NAME_IS_DEFAULT_DEF, as is done in visit_use. I'm
> still somewhat confused as to what is the difference. I suppose we might have a
> nop statement that is a default def, while it's unequal to NULL.

All default defs have a GIMPLE_NOP as definition, so I was wondering for
which case you see a NULL SSA_NAME_DEF_STMT (and only came up
with a virtual operand maybe?)

> Updated in committed patch.
>
>>>>>> Otherwise, ok for trunk?
>>>> Let's give this one more iteration, but I guess the basic idea should work.
>>>>
>>>
>>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>>
>>> Is this patch ok, or is the todo required?
>>
>> No, you can followup with that.
>>
>> Ok with s/vdef/result_vdef/
>>
>
> retested and committed with this change and SSA_NAME_IS_DEFAULT_DEF instead of
> '!stmt'.

Thanks,
Richard.

> Thanks.
> - Tom


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]