This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls


On Tue, Apr 24, 2012 at 11:19 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 17/04/12 14:24, Richard Guenther wrote:
>> On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 27/01/12 21:37, Tom de Vries wrote:
>>>> On 24/01/12 11:40, Richard Guenther wrote:
>>>>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>> Richard,
>>>>>> Jakub,
>>>>>>
>>>>>> the following patch fixes PR51879.
>>>>>>
>>>>>> Consider the following test-case:
>>>>>> ...
>>>>>> int bar (int);
>>>>>> void baz (int);
>>>>>>
>>>>>> void
>>>>>> foo (int y)
>>>>>> {
>>>>>> ?int a;
>>>>>> ?if (y == 6)
>>>>>> ? ?a = bar (7);
>>>>>> ?else
>>>>>> ? ?a = bar (7);
>>>>>> ?baz (a);
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> after compiling at -02, the representation looks like this before tail-merging:
>>>>>> ...
>>>>>> ?# BLOCK 3 freq:1991
>>>>>> ?# PRED: 2 [19.9%] ?(true,exec)
>>>>>> ?# .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>>>>>> ?# USE = nonlocal
>>>>>> ?# CLB = nonlocal
>>>>>> ?aD.1709_3 = barD.1703 (7);
>>>>>> ?goto <bb 5>;
>>>>>> ?# SUCC: 5 [100.0%] ?(fallthru,exec)
>>>>>>
>>>>>> ?# BLOCK 4 freq:8009
>>>>>> ?# PRED: 2 [80.1%] ?(false,exec)
>>>>>> ?# .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>>>>>> ?# USE = nonlocal
>>>>>> ?# CLB = nonlocal
>>>>>> ?aD.1709_4 = barD.1703 (7);
>>>>>> ?# SUCC: 5 [100.0%] ?(fallthru,exec)
>>>>>>
>>>>>> ?# BLOCK 5 freq:10000
>>>>>> ?# PRED: 3 [100.0%] ?(fallthru,exec) 4 [100.0%] ?(fallthru,exec)
>>>>>> ?# aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>>>>>> ?# .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>>>>>> ?# .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>>>>>> ?# USE = nonlocal
>>>>>> ?# CLB = nonlocal
>>>>>> ?bazD.1705 (aD.1709_1);
>>>>>> ?# VUSE <.MEMD.1714_9>
>>>>>> ?return;
>>>>>> ...
>>>>>>
>>>>>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
>>>>>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>>>>>>
>>>>>> The patch makes sure non-const/pure call results (gimple_vdef and
>>>>>> gimple_call_lhs) are properly value numbered.
>>>>>>
>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>
>>>>>> ok for stage1?
>>>>>
>>>>> The following cannot really work:
>>>>>
>>>>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>>>>> ? ?result = vn_reference_lookup_1 (&vr1, NULL);
>>>>> ? ?if (result)
>>>>> ? ? ?{
>>>>> - ? ? ?changed = set_ssa_val_to (lhs, result);
>>>>> + ? ? ?tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
>>>>> + ? ? ?if (vdef)
>>>>> + ? ? ? changed |= set_ssa_val_to (vdef, result_vdef);
>>>>> + ? ? ?changed |= set_ssa_val_to (lhs, result);
>>>>>
>>>>> because 'result' may be not an SSA name. ?It might also not have
>>>>> a proper definition statement (if VN_INFO (result)->needs_insertion
>>>>> is true). ?So you at least need to guard things properly.
>>>>>
>>>>
>>>> Right. And that also doesn't work if the function is without lhs, such as in the
>>>> new test-case pr51879-6.c.
>>>>
>>>> I fixed this by storing both lhs and vdef, such that I don't have to derive
>>>> the vdef from the lhs.
>>>>
>>>>> (On a side-note - I _did_ want to remove value-numbering virtual operands
>>>>> at some point ...)
>>>>>
>>>>
>>>> Doing so willl hurt performance of tail-merging in its current form.
>>>> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
>>>> value numbering as used in tail-merging has its limitations too.
>>>> Do you have any ideas how to address that one?
>>>>
>>>>> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>>>>> ? ? ? ? ? /* ??? ?We should handle stores from calls. ?*/
>>>>> ? ? ? ? ? else if (TREE_CODE (lhs) == SSA_NAME)
>>>>> ? ? ? ? ? ? {
>>>>> + ? ? ? ? ? ? tree vuse = gimple_vuse (stmt);
>>>>> ? ? ? ? ? ? ? if (!gimple_call_internal_p (stmt)
>>>>> - ? ? ? ? ? ? ? ? && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
>>>>> + ? ? ? ? ? ? ? ? && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>>>>> + ? ? ? ? ? ? ? ? ? ? || (vuse && SSA_VAL (vuse) != VN_TOP)))
>>>>> ? ? ? ? ? ? ? ? changed = visit_reference_op_call (lhs, stmt);
>>>>> ? ? ? ? ? ? ? else
>>>>> ? ? ? ? ? ? ? ? changed = defs_to_varying (stmt);
>>>>>
>>>>> ... exactly because of the issue that a stmt has multiple defs. ?Btw,
>>>>> vuse should have been visited here or be part of our SCC, so, why do
>>>>> you need this check?
>>>>>
>>>>
>>>> Removed now, that was a workaround for a bug in an earlier version of the patch,
>>>> that I didn't need anymore.
>>>>
>>>> Bootstrapped and reg-tested on x86_64.
>>>>
>>>> OK for stage1?
>>>>
>>>
>>> Richard,
>>>
>>> quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
>>> ...
>>> I think these fixes hint at that we should
>>> use "structural" equality as fallback if value-numbering doesn't equate
>>> two stmt effects. ?Thus, treat two stmts with exactly the same operands
>>> and flags as equal and using value-numbering to canonicalize operands
>>> (when they are SSA names) for that comparison, or use VN entirely
>>> if there are no side-effects on the stmt.
>>>
>>> Changing value-numbering of virtual operands, even if it looks correct in the
>>> simple cases you change, doesn't look like a general solution for the missed
>>> tail merging opportunities.
>>> ...
>>>
>>> The test-case pr51879-6.c shows a case where improving value numbering will help
>>> tail-merging, but structural equality comparison not:
>>> ...
>>> ?# BLOCK 3 freq:1991
>>> ?# PRED: 2 [19.9%] ?(true,exec)
>>> ?# .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
>>> ?# USE = nonlocal
>>> ?# CLB = nonlocal
>>> ?blaD.1708 (5);
>>> ?# .MEMD.1717_8 = VDEF <.MEMD.1717_7>
>>> ?# USE = nonlocal
>>> ?# CLB = nonlocal
>>> ?aD.1712_3 = barD.1704 (7);
>>> ?goto <bb 5>;
>>> ?# SUCC: 5 [100.0%] ?(fallthru,exec)
>>>
>>> ?# BLOCK 4 freq:8009
>>> ?# PRED: 2 [80.1%] ?(false,exec)
>>> ?# .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
>>> ?# USE = nonlocal
>>> ?# CLB = nonlocal
>>> ?blaD.1708 (5);
>>> ?# .MEMD.1717_10 = VDEF <.MEMD.1717_9>
>>> ?# USE = nonlocal
>>> ?# CLB = nonlocal
>>> ?aD.1712_4 = barD.1704 (7);
>>> ?# SUCC: 5 [100.0%] ?(fallthru,exec)
>>>
>>> ?# BLOCK 5 freq:10000
>>> ?# PRED: 3 [100.0%] ?(fallthru,exec) 4 [100.0%] ?(fallthru,exec)
>>> ?# aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)>
>>> ?# .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)>
>>> ?# .MEMD.1717_11 = VDEF <.MEMD.1717_5>
>>> ?# USE = nonlocal
>>> ?# CLB = nonlocal
>>> ?bazD.1706 (aD.1712_1);
>>> ?# VUSE <.MEMD.1717_11>
>>> ?return;
>>> ...
>>> by structurally comparing the gimples in both blocks, we can see that these are
>>> equal.
>>> What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For
>>> that, we need value numbering to number the vuses the same. And if we get value
>>> numbering to number these values equal, we don't need the structural comparison.
>>>
>>> In this case structural comparison cannot be used as fallback for
>>> value-numbering. So, ok for trunk?
>>>
>>> And if not ok, do you have a suggestion of how to deal with this otherwise?
>>
>> Hmm. ?I'm not sure we can conclude that they have the same value!
>>
>> +int bar (int);
>> +void baz (int);
>> +void bla (int);
>> +
>> +void
>> +foo (int y)
>> +{
>> + ?int a;
>> + ?if (y == 6)
>> + ? ?{
>> + ? ? ?bla (5);
>> + ? ? ?a = bar (7);
>> + ? ?}
>> + ?else
>> + ? ?{
>> + ? ? ?bla (5);
>> + ? ? ?a = bar (7);
>> + ? ?}
>> + ?baz (a);
>> +}
>>
>> I can implement bla as
>>
>> void bla(int) { a = random (); }
>>
>> thus a would not have the same value on both paths
>
> I think it's hard to decide whether they have the same value or not, since we
> cannot write a test-case that compares the results of calls from 2 branches of
> an if statement.

void *foo ()
{
  return __builtin_return_address (0);
}

void *bar (_Bool b)
{
  if (b)
    return foo ();
  else
    return foo ();
}

int main()
{
  if (bar(true) == bar(false))
    abort ();
}

ok ... outside of the scope of standard "C", but we certainly _can_ do this.
Which would question tail-merging the above at all, of course.

> I started looking at the problem in terms of subsequent calls. Consider the
> imaginary attribute nosideeffect, similar to const and pure.
>
> const:
> - no effects except the return value
> - the return value depends only on the parameters
>
> pure:
> - no effects except the return value
> - the return value depends only on the parameters and/or global variables
>
> nosideeffect:
> - no effects except the return value
>
> The code fragment:
> ...
> extern int f (int) __attribute ((nosideeffect));
> int g()
> {
> ?int a;
> ?int b;
> ?a = f (5);
> ?b = f (5);
> ?return a + b;
> }
> ...
>
> would result in a gimple sequence more or less like this:
> ...
> ?# VUSE <.MEMD.1712_4(D)>
> ?aD.1707_1 = fD.1704 (5);
> ?# VUSE <.MEMD.1712_4(D)>
> ?bD.1708_2 = fD.1704 (5);
> ?D.1710_3 = aD.1707_1 + bD.1708_2;
> ?# VUSE <.MEMD.1712_6>
> ?return D.1710_3;
> ...
>
> The value numbering modification I'm proposing would say that a and b have the
> same value, which is not true. I updated the patch to ensure in this case that a
> and b would be seen as different.
> Note that things won't break if the attribute is missing on a function, since
> that just means that the function would have a vdef, and there would be no 2
> subsequent function calls with the same vuse.
>
>> - but that is not what
>> matters - because obviously we can still merge the blocks.
>
> Right.
>
>> ?That is,
>> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>> are the same. ?But is that not what your value-numbering changes will assert?
>
> In the case of calls in different branches of an control flow statement, we can
> aggressively assume they're the same, since we cannot write a check that would
> start failing.

Apart from the above ... which shows that even the returned values can matter.

> In the case of subsequent calls with side effects, the vuses will never be the same.
>
> In the case of subsequent calls without side effects, but not pure or const, we
> can't assume they have the same result. AFAIK, there's currently no way to
> specify such a function, but the updated patch should be able handle this.
>
>> (not sure how you achieve this, but it looks like you insert/lookup general
>> calls, thus not pure/const ones
>
> Correct.
>
>> - that could easily lead to miscompiles(?)).
>>

And to increased compile-time (not sure if it matters).

> I have bootstrapped and reg-tested the updated (and the previous) patch on
> x86_64 (ada inclusive), and found no issues.
>
> Can you think of a test-case that fails with the previous or the updated patch?

I see you do not handle

struct S { int i; };
struct S foo (void);
struct S bar (void)
{
  struct S s1, s2;
  if (...)
   s = foo ();
  else
   s = foo ();

because the calls have a LHS that is not an SSA name.  In general
value-numbering virtual operands will change what further consumers lookup.
So changing

  # MEM_2 = VUSE <MEM_3>
  foo ();
  # VUSE <MEM_2>
  ... = x;

by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
MEM_3, not considering a clobber by foo.  This might not actually be an
issue (after all we do this for stores, too).

I see you do not handle

int bar (int);
void baz (int);
void bla (int);
int s;
void
foo (int y)
{
  int a;
  if (y == 6)
    {
      s = 1;
      bla (5);
      a = bar (7);
    }
  else
    {
      s = 1;
      bla (5);
      a = bar (7);
    }
  baz (a);
}

and I don't see how you need the new ref->vdef member - that is also
not handled consistently (not in hashing or comparing, etc.).  Can't you
always use SSA_VAL (vuse)?

Btw, what's

+  if (vdef)
+    VN_INFO (vdef)->use_processed = true;

for?  We arrive here by visiting the VDEF after all and visit_use does already
set the use processed.  Is it that we end up visiting the call twice because
of the lhs SSA name definition and the virtual operand definition?  If so then
visit_use should instead do

FOR_EACH_SSA_DEF_OPERAND (...)
   VN_INFO (def)->use_processed = true;

and defs_to_varying then no longer needs to do that.

> Otherwise, ok for trunk?

Let's give this one more iteration, but I guess the basic idea should work.

Thanks,
Richard.

> Thanks,
> - Tom
>
>> Richard.
>
> 2012-04-24 ?Tom de Vries ?<tom@codesourcery.com>
>
> ? ? ? ?PR tree-optimization/51879
> ? ? ? ?* tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
> ? ? ? ?* tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
> ? ? ? ?Handle case that lhs is NULL_TREE.
> ? ? ? ?(visit_use): Handle calls with side-effect using
> ? ? ? ?visit_reference_op_call.
>
> ? ? ? ?gcc.dg/pr51879.c: New test.
> ? ? ? ?gcc.dg/pr51879-2.c: Same.
> ? ? ? ?gcc.dg/pr51879-3.c: Same.
> ? ? ? ?gcc.dg/pr51879-4.c: Same.
> ? ? ? ?gcc.dg/pr51879-6.c: Same.
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]