This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Andrew Pinski <pinskia at gmail dot com>
- Cc: Zhenqiang Chen <zhenqiang dot chen at linaro dot org>, Jeff Law <law at redhat dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 29 Oct 2013 11:28:19 +0100
- Subject: Re: [PATCH] Enhance ifcombine to recover non short circuit branches
- Authentication-results: sourceware.org; auth=none
- References: <CACgzC7AbQLh+uv=vzC51DiHjZZn12yEjvtrcyo61z7KjeYCe7Q at mail dot gmail dot com> <CA+=Sn1mm+3Zv9HxgGt4T_utu+-nsvhhCWRF-aWk05K=_704LRw at mail dot gmail dot com> <CAFiYyc2AgJg2r6mzQ69gYo2chBSRFgdVSkSnFU_tQXEanU+NGQ at mail dot gmail dot com> <52601731 dot 3000301 at redhat dot com> <CACgzC7BjGmkTZbor8Jr8ZpjN08JZUy-V64ExukG6HozK45=krw at mail dot gmail dot com> <CA+=Sn1nTd97wm7aWTmwKskZqzQZqC-a3cxW+p7GUH=CTnYTs-A at mail dot gmail dot com> <CA+=Sn1kZFa=+HRbk1kZZomX_rmujL=n+ZVqxEmtF6qvKpsYHUQ at mail dot gmail dot com> <CA+=Sn1k14xfavt02=1G4RWjdHC6BCvMpG0TrO5FKUEbxs6gkkA at mail dot gmail dot com>
On Sun, Oct 27, 2013 at 7:55 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen
>>> <zhenqiang.chen@linaro.org> wrote:
>>>> On 18 October 2013 00:58, Jeff Law <law@redhat.com> wrote:
>>>>> On 10/17/13 05:03, Richard Biener wrote:
>>>>>>>>
>>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>>>
>>>>>>> I had a much simpler change which did basically the same from 4.7 (I
>>>>>>> can update it if people think this is a better approach).
>>>>>>
>>>>>>
>>>>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>>>>> for force_gimple_operand).
>>>>>
>>>>> The obvious question is whether or not Andrew's simpler change picks up as
>>>>> many transformations as Zhenqiang's change. If not are the things missed
>>>>> important.
>>>>>
>>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>>>
>>>> Here is a rough compare:
>>>>
>>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>>>> in my patch). Root cause is that it does not skip "LABEL". The guard
>>>> to do this opt should be the same the bb_has_overhead_p in my patch.
>>>
>>> This should be an easy change, I am working on this right now.
>>>
>>>>
>>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>>>
>>>> _3 = a_2(D) > 0;
>>>> _5 = b_4(D) > 0;
>>>> _6 = _3 | _5;
>>>> _9 = c_7(D) <= 0;
>>>> _10 = ~_6;
>>>> _11 = _9 & _10;
>>>> if (_11 == 0)
>>>>
>>>> With my patch, it will generate
>>>>
>>>> _3 = a_2(D) > 0;
>>>> _5 = b_4(D) > 0;
>>>> _6 = _3 | _5;
>>>> _9 = c_7(D) > 0;
>>>> _10 = _6 | _9;
>>>> if (_10 != 0)
>>>
>>> As mentioned otherwise, this seems like a missed optimization inside
>>> forwprop. When I originally wrote this code there used to be two
>>> cases one for & and one for |, but this was removed sometime and I
>>> just made the code evolve with that.
>>
>> Actually I can make a small patch (3 lines) to my current patch which
>> causes tree-ssa-ifcombine.c to produce the behavior of your patch.
>>
>> if (result_inv)
>> {
>> t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>> result_inv = false;
>> }
>>
>> I will submit a new patch after some testing of my current patch which
>> fixes items 1 and 2.
>
>
> Here is my latest patch which adds the testcases from Zhenqiang's
> patch and fixes item 1 and 2.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
return i;
}
+/* Return a new iterator pointing to the first non-debug non-label statement in
+ basic block BB. */
+
+static inline gimple_stmt_iterator
+gsi_start_nondebug_after_labels_bb (basic_block bb)
vertical space before the comment missing.
@@ -631,7 +669,7 @@ tree_ssa_ifcombine (void)
bbs = single_pred_before_succ_order ();
calculate_dominance_info (CDI_DOMINATORS);
- for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i)
+ for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--)
{
basic_block bb = bbs[i];
gimple stmt = last_stmt (bb);
The original code matches what phiopt does which even has a comment:
/* Search every basic block for COND_EXPR we may be able to optimize.
We walk the blocks in order that guarantees that a block with
a single predecessor is processed before the predecessor.
This ensures that we collapse inner ifs before visiting the
outer ones, and also that we do not try to visit a removed
block. */
bb_order = single_pred_before_succ_order ();
n = n_basic_blocks - NUM_FIXED_BLOCKS;
for (i = 0; i < n; i++)
so if the reverse order is better here (and in phiopt?) then it at least
needs a comment explaining why.
Otherwise the patch looks good, but please iterate with Jeff over
the dom testcase issue.
Thanks,
Richard.
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
> (ifcombine_ifandif): Handle cases where
> maybe_fold_and_comparisons fails, combining the branches
> anyways.
> (tree_ssa_ifcombine): Inverse the order of
> the basic block walk, increases the number of combinings.
> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>
>
> testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>
> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
> conditional move to be used.
> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
> intermediate".
>
>
>
>
>>
>> Thanks,
>> Andrew Pinski
>>
>>
>>>
>>>>
>>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>>>> the basic block walk" so it can do combine recursively.
>>>>
>>>> But I think we need some heuristic to control the number of ifs. Move
>>>> too much compares from
>>>> the inner_bb to outer_bb is not good.
>>>
>>>
>>> I think this depends on the target. For MIPS we don't want an upper
>>> bound as integer comparisons go directly to GPRs. I wrote this patch
>>> with MIPS in mind as that was the target I was working on.
>>>
>>>>
>>>> 4) Another good thing of Andrew P.'s change is that it reuses some
>>>> existing functions. So it looks much simple.
>>>
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>>
>>>>>>
>>>>>> With that we should be able to kill the fold-const.c transform?
>>>>>
>>>>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>>>>
>>>> That's my final goal to "kill the fold-const.c transform". I think we
>>>> may combine the two changes to make a "simple" and "good" patch.
>>>>
>>>> Thanks!
>>>> -Zhenqiang