This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls


On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 25/04/12 11:57, Richard Guenther wrote:
> <SNIP>
>>>> >> Hmm. ?I'm not sure we can conclude that they have the same value!
>>>> >>
>>>> >> +int bar (int);
>>>> >> +void baz (int);
>>>> >> +void bla (int);
>>>> >> +
>>>> >> +void
>>>> >> +foo (int y)
>>>> >> +{
>>>> >> + ?int a;
>>>> >> + ?if (y == 6)
>>>> >> + ? ?{
>>>> >> + ? ? ?bla (5);
>>>> >> + ? ? ?a = bar (7);
>>>> >> + ? ?}
>>>> >> + ?else
>>>> >> + ? ?{
>>>> >> + ? ? ?bla (5);
>>>> >> + ? ? ?a = bar (7);
>>>> >> + ? ?}
>>>> >> + ?baz (a);
>>>> >> +}
>>>> >>
>>>> >> I can implement bla as
>>>> >>
>>>> >> void bla(int) { a = random (); }
>>>> >>
>>>> >> thus a would not have the same value on both paths
>>> >
>>> > I think it's hard to decide whether they have the same value or not, since we
>>> > cannot write a test-case that compares the results of calls from 2 branches of
>>> > an if statement.
>> void *foo ()
>> {
>> ? return __builtin_return_address (0);
>> }
>>
>> void *bar (_Bool b)
>> {
>> ? if (b)
>> ? ? return foo ();
>> ? else
>> ? ? return foo ();
>> }
>>
>> int main()
>> {
>> ? if (bar(true) == bar(false))
>> ? ? abort ();
>> }
>>
>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>> Which would question tail-merging the above at all, of course.
>>
>
> I added noinline to make sure there's no variation from inlining.
> ...
> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
> ((__noreturn__));
>
> __attribute__((noinline))
> void *foo ()
> {
> ?return __builtin_return_address (0);
> }
>
> __attribute__((noinline))
> void *bar (int b)
> {
> ?if (b)
> ? ?return foo ();
> ?else
> ? ?return foo ();
> }
>
> int main()
> {
> ?void *a, *b;
> ?a = bar (1);
> ?b = bar (0);
> ?if (a == b)
> ? ?abort ();
> ?return 0;
> }
> ...
>
> This test-case passes with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping
> ...
>
> and fails with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping
> ...
>
> with the proposed patch, it also fails with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping
> ...
>
> Tree-tail-merge performs the same transformation as cross-jumping for this
> test-case. I think that the transformation is valid, and that the test-case
> behaves as expected. Subsequent calls to foo may or may not return the same
> value, depending on the optimizations, we can't assert a != b, that's just the
> nature of __builtin_return_address.
>
> Btw, this example is not what I meant when I said 'a test-case that compares the
> results of calls from 2 branches of an if statement'. I tried to say that the
> semantics of C is defined in terms of taking either 1 branch or the other,
> rather than executing both in parallel, after which both results would be
> available. From that point of view, this example compares subsequent calls to
> foo, not parallel.

Ah, I see.  Btw, I was not suggesting that we should not optimize the above.

>>> > I started looking at the problem in terms of subsequent calls. Consider the
>>> > imaginary attribute nosideeffect, similar to const and pure.
>>> >
>>> > const:
>>> > - no effects except the return value
>>> > - the return value depends only on the parameters
>>> >
>>> > pure:
>>> > - no effects except the return value
>>> > - the return value depends only on the parameters and/or global variables
>>> >
>>> > nosideeffect:
>>> > - no effects except the return value
>>> >
>>> > The code fragment:
>>> > ...
>>> > extern int f (int) __attribute ((nosideeffect));
>>> > int g()
>>> > {
>>> > ?int a;
>>> > ?int b;
>>> > ?a = f (5);
>>> > ?b = f (5);
>>> > ?return a + b;
>>> > }
>>> > ...
>>> >
>>> > would result in a gimple sequence more or less like this:
>>> > ...
>>> > ?# VUSE <.MEMD.1712_4(D)>
>>> > ?aD.1707_1 = fD.1704 (5);
>>> > ?# VUSE <.MEMD.1712_4(D)>
>>> > ?bD.1708_2 = fD.1704 (5);
>>> > ?D.1710_3 = aD.1707_1 + bD.1708_2;
>>> > ?# VUSE <.MEMD.1712_6>
>>> > ?return D.1710_3;
>>> > ...
>>> >
>>> > The value numbering modification I'm proposing would say that a and b have the
>>> > same value, which is not true. I updated the patch to ensure in this case that a
>>> > and b would be seen as different.
>>> > Note that things won't break if the attribute is missing on a function, since
>>> > that just means that the function would have a vdef, and there would be no 2
>>> > subsequent function calls with the same vuse.
>>> >
>>>> >> - but that is not what
>>>> >> matters - because obviously we can still merge the blocks.
>>> >
>>> > Right.
>>> >
>>>> >> ?That is,
>>>> >> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>>>> >> are the same. ?But is that not what your value-numbering changes will assert?
>>> >
>>> > In the case of calls in different branches of an control flow statement, we can
>>> > aggressively assume they're the same, since we cannot write a check that would
>>> > start failing.
>> Apart from the above ... which shows that even the returned values can matter.
>>
>>> > In the case of subsequent calls with side effects, the vuses will never be the same.
>>> >
>>> > In the case of subsequent calls without side effects, but not pure or const, we
>>> > can't assume they have the same result. AFAIK, there's currently no way to
>>> > specify such a function, but the updated patch should be able handle this.
>>> >
>>>> >> (not sure how you achieve this, but it looks like you insert/lookup general
>>>> >> calls, thus not pure/const ones
>>> >
>>> > Correct.
>>> >
>>>> >> - that could easily lead to miscompiles(?)).
>>>> >>
>> And to increased compile-time (not sure if it matters).
>>
>>> > I have bootstrapped and reg-tested the updated (and the previous) patch on
>>> > x86_64 (ada inclusive), and found no issues.
>>> >
>>> > Can you think of a test-case that fails with the previous or the updated patch?
>> I see you do not handle
>>
>
> I assume you mean 'not handle' in the sense of 'handle conservatively'.

Yes.

>> struct S { int i; };
>> struct S foo (void);
>> struct S bar (void)
>> {
>> ? struct S s1, s2;
>> ? if (...)
>> ? ?s = foo ();
>> ? else
>> ? ?s = foo ();
>>
>> because the calls have a LHS that is not an SSA name.
>
> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
> fails to optimize this test-case:
> ...
> struct S { int i; };
> extern struct S foo (void);
> extern int foo2 (void);
> struct S s;
> int bar (int c) {
> ?int r;
> ?if (c)
> ? ?{
> ? ? ?s = foo ();
> ? ? ?r = foo2 ();
> ? ?}
> ?else
> ? ?{
> ? ? ?s = foo ();
> ? ? ?r = foo2 ();
> ? ?}
> ?return r;
> }
> ...
>
> A todo.
>
>> ?In general
>> value-numbering virtual operands will change what further consumers lookup.
>> So changing
>>
>> ? # MEM_2 = VUSE <MEM_3>
>> ? foo ();
>> ? # VUSE <MEM_2>
>> ? ... = x;
>>
>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
>> MEM_3, not considering a clobber by foo.
>
> AFAIU, the patch doesn't do this type of value numbering.
>
>> This might not actually be an
>> issue (after all we do this for stores, too).
>>
>> I see you do not handle
>>
>> int bar (int);
>> void baz (int);
>> void bla (int);
>> int s;
>> void
>> foo (int y)
>> {
>> ? int a;
>> ? if (y == 6)
>> ? ? {
>> ? ? ? s = 1;
>> ? ? ? bla (5);
>> ? ? ? a = bar (7);
>> ? ? }
>> ? else
>> ? ? {
>> ? ? ? s = 1;
>> ? ? ? bla (5);
>> ? ? ? a = bar (7);
>> ? ? }
>> ? baz (a);
>> }
>>
>
> I think a more basic example is this one from PR52009, which is basically about
> stores:
> ...
> int z;
>
> void
> foo (int y)
> {
> ?int a;
> ?if (y == 6)
> ? ?z = 5;
> ?else
> ? ?z = 5;
> }
> ...
>
> I proposed a patch for that separately:
> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the
> patch now, I suppose I could use part of that patch as infrastructure for the
> todo above.

Yes, I think so.

>> and I don't see how you need the new ref->vdef member
>
> I'm regarding the result of the ref as a compound <vdef, result>. Most of the
> time I could (as I did in an earlier version of the patch) deduce vdef as
> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
> result is NULL_TREE.
>
>> - that is also
>> not handled consistently (not in hashing or comparing, etc.).
>
> I handle the vdef as a result.

Ok.  Can you please rename ->vdef to ->result_vdef then?

>> Can't you
>> always use SSA_VAL (vuse)?
>
> I'm interested in storing the vdef, not the vuse.

I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of
the stmt defining vuse).  No?

>>
>> Btw, what's
>>
>> + ?if (vdef)
>> + ? ?VN_INFO (vdef)->use_processed = true;
>>
>> for? ?We arrive here by visiting the VDEF after all and visit_use does already
>> set the use processed. ?Is it that we end up visiting the call twice because
>> of the lhs SSA name definition and the virtual operand definition?
>
> Yes.
>
>> If so then
>> visit_use should instead do
>>
>> FOR_EACH_SSA_DEF_OPERAND (...)
>> ? ?VN_INFO (def)->use_processed = true;
>>
>> and defs_to_varying then no longer needs to do that.
>>
>
> factored mark_use_processed out of defs_to_varying, calling mark_use_processed
> at start of visit_use.

Thanks.  Which case hits SSA_NAME_DEF_STMT == NULL btw?  The
default-def for the virtual operand?

>>> > Otherwise, ok for trunk?
>> Let's give this one more iteration, but I guess the basic idea should work.
>>
>
> bootstrapped and reg-tested on x86_64 (ada inclusive).
>
> Is this patch ok, or is the todo required?

No, you can followup with that.

Ok with s/vdef/result_vdef/

Thanks,
Richard.

> Thanks,
> ?- Tom
>
> 2012-04-25 ?Tom de Vries ?<tom@codesourcery.com>
>
> ? ? ? ?PR tree-optimization/51879
> ? ? ? ?* tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
> ? ? ? ?* tree-ssa-sccvn.c (mark_use_processed): New function, factored out
> ? ? ? ?of ...
> ? ? ? ?(defs_to_varying): ... here.
> ? ? ? ?(visit_reference_op_call): Handle gimple_vdef.
> ? ? ? ?Handle case that lhs is NULL_TREE.
> ? ? ? ?(visit_use): Use mark_use_processed. ?Handle calls with side-effect
> ? ? ? ?using visit_reference_op_call.
>
> ? ? ? ?gcc.dg/pr51879.c: New test.
> ? ? ? ?gcc.dg/pr51879-2.c: Same.
> ? ? ? ?gcc.dg/pr51879-3.c: Same.
> ? ? ? ?gcc.dg/pr51879-4.c: Same.
> ? ? ? ?gcc.dg/pr51879-6.c: Same.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]