This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Enhance ifcombine to recover non short circuit branches


On Tue, Oct 29, 2013 at 3:28 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Sun, Oct 27, 2013 at 7:55 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen
>>>> <zhenqiang.chen@linaro.org> wrote:
>>>>> On 18 October 2013 00:58, Jeff Law <law@redhat.com> wrote:
>>>>>> On 10/17/13 05:03, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>> Is it OK for trunk?
>>>>>>>>
>>>>>>>>
>>>>>>>> I had a much simpler change which did basically the same from 4.7 (I
>>>>>>>> can update it if people think this is a better approach).
>>>>>>>
>>>>>>>
>>>>>>> I like that more (note you can now use is_gimple_condexpr as predicate
>>>>>>> for force_gimple_operand).
>>>>>>
>>>>>> The obvious question is whether or not Andrew's simpler change picks up as
>>>>>> many transformations as Zhenqiang's change.  If not are the things missed
>>>>>> important.
>>>>>>
>>>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?
>>>>>
>>>>> Here is a rough compare:
>>>>>
>>>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
>>>>> in my patch). Root cause is that it does not skip "LABEL". The guard
>>>>> to do this opt should be the same the bb_has_overhead_p in my patch.
>>>>
>>>> This should be an easy change, I am working on this right now.
>>>>
>>>>>
>>>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
>>>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate
>>>>>
>>>>>   _3 = a_2(D) > 0;
>>>>>   _5 = b_4(D) > 0;
>>>>>   _6 = _3 | _5;
>>>>>   _9 = c_7(D) <= 0;
>>>>>   _10 = ~_6;
>>>>>   _11 = _9 & _10;
>>>>>   if (_11 == 0)
>>>>>
>>>>> With my patch, it will generate
>>>>>
>>>>>   _3 = a_2(D) > 0;
>>>>>   _5 = b_4(D) > 0;
>>>>>   _6 = _3 | _5;
>>>>>   _9 = c_7(D) > 0;
>>>>>   _10 = _6 | _9;
>>>>>   if (_10 != 0)
>>>>
>>>> As mentioned otherwise, this seems like a missed optimization inside
>>>> forwprop.  When I originally wrote this code there used to be two
>>>> cases one for & and one for |, but this was removed sometime and I
>>>> just made the code evolve with that.
>>>
>>> Actually I can make a small patch (3 lines) to my current patch which
>>> causes tree-ssa-ifcombine.c to produce the behavior of your patch.
>>>
>>>  if (result_inv)
>>>    {
>>>      t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>>>      result_inv = false;
>>>    }
>>>
>>> I will submit a new patch after some testing of my current patch which
>>> fixes items 1 and 2.
>>
>>
>> Here is my latest patch which adds the testcases from Zhenqiang's
>> patch and fixes item 1 and 2.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
>    return i;
>  }
> +/* Return a new iterator pointing to the first non-debug non-label statement in
> +   basic block BB.  */
> +
> +static inline gimple_stmt_iterator
> +gsi_start_nondebug_after_labels_bb (basic_block bb)
>
>
> vertical space before the comment missing.
>
> @@ -631,7 +669,7 @@ tree_ssa_ifcombine (void)
>    bbs = single_pred_before_succ_order ();
>    calculate_dominance_info (CDI_DOMINATORS);
>
> -  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i)
> +  for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--)
>      {
>        basic_block bb = bbs[i];
>        gimple stmt = last_stmt (bb);
>
> The original code matches what phiopt does which even has a comment:
>
>   /* Search every basic block for COND_EXPR we may be able to optimize.
>
>      We walk the blocks in order that guarantees that a block with
>      a single predecessor is processed before the predecessor.
>      This ensures that we collapse inner ifs before visiting the
>      outer ones, and also that we do not try to visit a removed
>      block.  */
>   bb_order = single_pred_before_succ_order ();
>   n = n_basic_blocks - NUM_FIXED_BLOCKS;
>
>   for (i = 0; i < n; i++)
>
> so if the reverse order is better here (and in phiopt?) then it at least
> needs a comment explaining why.
>
> Otherwise the patch looks good, but please iterate with Jeff over
> the dom testcase issue.


Here is what I applied in the end; Jeff told me just to remove the
testcase.  I added the comment trying to explain why it was the
opposite order of PHI-opt.

Thanks,
Andrew Pinski

ChangeLog:
* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
(ifcombine_ifandif): Handle cases where
maybe_fold_and_comparisons fails, combining the branches
anyways.
(tree_ssa_ifcombine): Inverse the order of
the basic block walk, increases the number of combinings.
* gimple.h (gsi_start_nondebug_after_labels_bb): New function.

testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.

* gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
conditional move to be used.
* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove.


>
> Thanks,
> Richard.
>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>> (ifcombine_ifandif): Handle cases where
>> maybe_fold_and_comparisons fails, combining the branches
>> anyways.
>> (tree_ssa_ifcombine): Inverse the order of
>> the basic block walk, increases the number of combinings.
>> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>>
>>
>> testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>>
>> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
>> conditional move to be used.
>> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
>> intermediate".
>>
>>
>>
>>
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>
>>>>
>>>>>
>>>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of
>>>>> the basic block walk" so it can do combine recursively.
>>>>>
>>>>> But I think we need some heuristic to control the number of ifs. Move
>>>>> too much compares from
>>>>> the inner_bb to outer_bb is not good.
>>>>
>>>>
>>>> I think this depends on the target.  For MIPS we don't want an upper
>>>> bound as integer comparisons go directly to GPRs.  I wrote this patch
>>>> with MIPS in mind as that was the target I was working on.
>>>>
>>>>>
>>>>> 4) Another good thing of Andrew P.'s change is that it reuses some
>>>>> existing functions. So it looks much simple.
>>>>
>>>>
>>>> Thanks,
>>>> Andrew Pinski
>>>>
>>>>>
>>>>>>>
>>>>>>> With that we should be able to kill the fold-const.c transform?
>>>>>>
>>>>>> That would certainly be nice and an excellent follow-up for Zhenqiang.
>>>>>
>>>>> That's my final goal to "kill the fold-const.c transform". I think we
>>>>> may combine the two changes to make a "simple" and "good" patch.
>>>>>
>>>>> Thanks!
>>>>> -Zhenqiang

Attachment: ssa_ifcombine.patch.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]