This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix for PR52081 - Missed tail merging with pure calls


On Mon, Feb 13, 2012 at 1:36 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 13/02/12 12:54, Richard Guenther wrote:
>> On Thu, Feb 2, 2012 at 11:44 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> Richard,
>>>
>>> this patch fixes PR52801.
>>>
>>> Consider test-case pr51879-12.c:
>>> ...
>>> __attribute__((pure)) int bar (int);
>>> __attribute__((pure)) int bar2 (int);
>>> void baz (int);
>>>
>>> int x, z;
>>>
>>> void
>>> foo (int y)
>>> {
>>> ?int a = 0;
>>> ?if (y == 6)
>>> ? ?{
>>> ? ? ?a += bar (7);
>>> ? ? ?a += bar2 (6);
>>> ? ?}
>>> ?else
>>> ? ?{
>>> ? ? ?a += bar2 (6);
>>> ? ? ?a += bar (7);
>>> ? ?}
>>> ?baz (a);
>>> }
>>> ...
>>>
>>> When compiling at -O2, pr51879-12.c.094t.pre looks like this:
>>> ...
>>> ?# BLOCK 3 freq:1991
>>> ?# PRED: 2 [19.9%] ?(true,exec)
>>> ?# VUSE <.MEMD.1722_12(D)>
>>> ?# USE = nonlocal escaped
>>> ?D.1717_4 = barD.1703 (7);
>>> ?# VUSE <.MEMD.1722_12(D)>
>>> ?# USE = nonlocal escaped
>>> ?D.1718_6 = bar2D.1705 (6);
>>> ?aD.1713_7 = D.1717_4 + D.1718_6;
>>> ?goto <bb 5>;
>>> ?# SUCC: 5 [100.0%] ?(fallthru,exec)
>>>
>>> ?# BLOCK 4 freq:8009
>>> ?# PRED: 2 [80.1%] ?(false,exec)
>>> ?# VUSE <.MEMD.1722_12(D)>
>>> ?# USE = nonlocal escaped
>>> ?D.1720_8 = bar2D.1705 (6);
>>> ?# VUSE <.MEMD.1722_12(D)>
>>> ?# USE = nonlocal escaped
>>> ?D.1721_10 = barD.1703 (7);
>>> ?aD.1713_11 = D.1720_8 + D.1721_10;
>>> ?# SUCC: 5 [100.0%] ?(fallthru,exec)
>>>
>>> ?# BLOCK 5 freq:10000
>>> ?# PRED: 3 [100.0%] ?(fallthru,exec) 4 [100.0%] ?(fallthru,exec)
>>> ?# aD.1713_1 = PHI <aD.1713_7(3), aD.1713_11(4)>
>>> ?# .MEMD.1722_13 = VDEF <.MEMD.1722_12(D)>
>>> ?# USE = nonlocal
>>> ?# CLB = nonlocal
>>> ?bazD.1707 (aD.1713_1);
>>> ?# VUSE <.MEMD.1722_13>
>>> ?return;
>>> ...
>>> block 3 and 4 can be tail-merged.
>>>
>>> Value numbering numbers the two phi arguments a_7 and a_11 the same so the
>>> problem is not in value numbering:
>>> ...
>>> Setting value number of a_11 to a_7 (changed)
>>> ...
>>>
>>> There are 2 reasons that tail_merge_optimize doesn't optimize this:
>>>
>>> 1.
>>> The clause
>>> ?is_gimple_assign (stmt) && local_def (gimple_get_lhs (stmt))
>>> ?&& !gimple_has_side_effects (stmt)
>>> used in both same_succ_hash and gsi_advance_bw_nondebug_nonlocal evaluates to
>>> false for pure calls.
>>> This is fixed by replacing is_gimple_assign with gimple_has_lhs.
>>>
>>> 2.
>>> In same_succ_equal we check gimples from the 2 bbs side-by-side:
>>> ...
>>> ?gsi1 = gsi_start_nondebug_bb (bb1);
>>> ?gsi2 = gsi_start_nondebug_bb (bb2);
>>> ?while (!(gsi_end_p (gsi1) || gsi_end_p (gsi2)))
>>> ? ?{
>>> ? ? ?s1 = gsi_stmt (gsi1);
>>> ? ? ?s2 = gsi_stmt (gsi2);
>>> ? ? ?if (gimple_code (s1) != gimple_code (s2))
>>> ? ? ? ?return 0;
>>> ? ? ?if (is_gimple_call (s1) && !gimple_call_same_target_p (s1, s2))
>>> ? ? ? ?return 0;
>>> ? ? ?gsi_next_nondebug (&gsi1);
>>> ? ? ?gsi_next_nondebug (&gsi2);
>>> ? ?}
>>> ...
>>> and we'll be comparing 'bar (7)' and 'bar2 (6)', and gimple_call_same_target_p
>>> will return false.
>>> This is fixed by ignoring local defs in this comparison, by using
>>> gsi_advance_fw_nondebug_nonlocal on the iterators.
>>>
>>> bootstrapped and reg-tested on x86_64.
>>>
>>> ok for stage1?
>>
>> Sorry for responding so late ...
>
> no problem :)
>
>> I think these fixes hint at that we should
>> use "structural" equality as fallback if value-numbering doesn't equate
>> two stmt effects. ?Thus, treat two stmts with exactly the same operands
>> and flags as equal and using value-numbering to canonicalize operands
>> (when they are SSA names) for that comparison, or use VN entirely
>> if there are no side-effects on the stmt.
>>
>> Changing value-numbering of virtual operands, even if it looks correct in the
>> simple cases you change, doesn't look like a general solution for the missed
>> tail merging opportunities.
>>
>
> Your comment is relevant for the other recent tail-merge related fixes I
> submitted, but I think not for this one.
>
> In this case, value-numbering manages to value number the 2 phi-alternatives
> equal. It's tail-merging that doesn't take advantage of this, by treating pure
> function calls the same as non-pure function calls. The fixes are therefore in
> tail-merging, not in value numbering.
>
> So, ok for stage1?

I see.  A few comments.

+/* Returns whether VAL is used in the same bb as in which it is defined, or
+   in the phi of a successor bb.  */
+
+static bool
+local_def (tree val)
+{
+  gimple stmt, def_stmt;
+  basic_block bb, def_bb;
+  imm_use_iterator iter;
+  bool res;
+
+  if (TREE_CODE (val) != SSA_NAME)
+    return false;

what about SSA_NAME_IS_DEFAULT_DEF names?  They have a def-stmt
with a NULL basic-block.

+  res = true;
+  FOR_EACH_IMM_USE_STMT (stmt, iter, val)
+    {
+      bb = gimple_bb (stmt);
+      if (bb == def_bb)
+       continue;
+      if (gimple_code (stmt) == GIMPLE_PHI
+         && find_edge (def_bb, bb))
+       continue;
+      res = false;
+      BREAK_FROM_IMM_USE_STMT (iter);

I'd use FOR_EACH_IMM_USE_FAST here, that should be faster (avoids
the iterator marker writes), and find_edge can be replaced by
PHI_ARG_INDEX_FROM_USE () - well, you get the edge index that way,
so EDGE_PRED (def_bb, PHI_ARG_INDEX_FROM_USE (use_p)) == bb
would be the condition to test.

local_def seems to be only used from stmt_local_def, consider
inlining it there instead.  Btw, what about other DEFs of a stmt
than the lhs?  I'd say you should use single_ssa_def_operand ()
to get at the DEF, not gimple_get_lhs.

Ok with that changes.

Thanks,
Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]