[stage1][PATCH] Lower VEC_COND_EXPR into internal functions.

Martin Liška mliska@suse.cz
Thu Jun 18 09:02:52 GMT 2020


On 6/18/20 10:52 AM, Richard Biener wrote:
> On Thu, Jun 18, 2020 at 10:10 AM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 6/17/20 3:15 PM, Richard Biener wrote:
>>> On Wed, Jun 17, 2020 at 10:50 AM Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>>
>>>> On Mon, Jun 15, 2020 at 2:20 PM Martin Liška <mliska@suse.cz> wrote:
>>>>>
>>>>> On 6/15/20 1:59 PM, Richard Biener wrote:
>>>>>> On Mon, Jun 15, 2020 at 1:19 PM Martin Liška <mliska@suse.cz> wrote:
>>>>>>>
>>>>>>> On 6/15/20 9:14 AM, Richard Biener wrote:
>>>>>>>> On Fri, Jun 12, 2020 at 3:24 PM Martin Liška <mliska@suse.cz> wrote:
>>>>>>>>>
>>>>>>>>> On 6/12/20 11:43 AM, Richard Biener wrote:
>>>>>>>>>> So ... how far are you with enforcing a split VEC_COND_EXPR?
>>>>>>>>>> Thus can we avoid the above completely (even as intermediate
>>>>>>>>>> state)?
>>>>>>>>>
>>>>>>>>> Apparently, I'm quite close. Using the attached patch I see only 2 testsuite
>>>>>>>>> failures:
>>>>>>>>>
>>>>>>>>> FAIL: gcc.dg/tree-ssa/pr68714.c scan-tree-dump-times reassoc1 " <= " 1
>>>>>>>>> FAIL: gcc.target/i386/pr78102.c scan-assembler-times pcmpeqq 3
>>>>>>>>>
>>>>>>>>> The first one is about teaching reassoc about the SSA_NAMEs in VEC_COND_EXPR. I haven't
>>>>>>>>> analyze the second failure.
>>>>>>>>>
>>>>>>>>> I'm also not sure about the gimlification change, I see a superfluous assignments:
>>>>>>>>>        vec_cond_cmp.5 = _1 == _2;
>>>>>>>>>        vec_cond_cmp.6 = vec_cond_cmp.5;
>>>>>>>>>        vec_cond_cmp.7 = vec_cond_cmp.6;
>>>>>>>>>        _3 = VEC_COND_EXPR <vec_cond_cmp.7, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
>>>>>>>>> ?
>>>>>>>>>
>>>>>>>>> So with the suggested patch, the EH should be gone as you suggested. Right?
>>>>>>>>
>>>>>>>> Right, it should be on the comparison already from the start.
>>>>>>>>
>>>>>>>> @@ -14221,9 +14221,13 @@ gimplify_expr (tree *expr_p, gimple_seq
>>>>>>>> *pre_p, gimple_seq *post_p,
>>>>>>>>             case VEC_COND_EXPR:
>>>>>>>>               {
>>>>>>>>                 enum gimplify_status r0, r1, r2;
>>>>>>>> -
>>>>>>>>                 r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>>>>>>>>                                     post_p, is_gimple_condexpr, fb_rvalue);
>>>>>>>> +           tree xop0 = TREE_OPERAND (*expr_p, 0);
>>>>>>>> +           tmp = create_tmp_var_raw (TREE_TYPE (xop0), "vec_cond_cmp");
>>>>>>>> +           gimple_add_tmp_var (tmp);
>>>>>>>> +           gimplify_assign (tmp, xop0, pre_p);
>>>>>>>> +           TREE_OPERAND (*expr_p, 0) = tmp;
>>>>>>>>                 r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
>>>>>>>>                                     post_p, is_gimple_val, fb_rvalue);
>>>>>>>>
>>>>>>>> all of VEC_COND_EXPR can now be a simple goto expr_3;
>>>>>>>
>>>>>>> Works for me, thanks!
>>>>>>>
>>>>>>>>
>>>>>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>>>>>> index 494c9e9c20b..090fb52a2f1 100644
>>>>>>>> --- a/gcc/tree-ssa-forwprop.c
>>>>>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>>>>>> @@ -3136,6 +3136,10 @@ pass_forwprop::execute (function *fun)
>>>>>>>>                         if (code == COND_EXPR
>>>>>>>>                             || code == VEC_COND_EXPR)
>>>>>>>>                           {
>>>>>>>> +                       /* Do not propagate into VEC_COND_EXPRs.  */
>>>>>>>> +                       if (code == VEC_COND_EXPR)
>>>>>>>> +                         break;
>>>>>>>> +
>>>>>>>>
>>>>>>>> err - remove the || code == VEC_COND_EXPR instead?
>>>>>>>
>>>>>>> Yep.
>>>>>>>
>>>>>>>>
>>>>>>>> @@ -2221,24 +2226,12 @@ expand_vector_operations (void)
>>>>>>>>      {
>>>>>>>>        gimple_stmt_iterator gsi;
>>>>>>>>        basic_block bb;
>>>>>>>> -  bool cfg_changed = false;
>>>>>>>>
>>>>>>>>        FOR_EACH_BB_FN (bb, cfun)
>>>>>>>> -    {
>>>>>>>> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>>> -       {
>>>>>>>> -         expand_vector_operations_1 (&gsi);
>>>>>>>> -         /* ???  If we do not cleanup EH then we will ICE in
>>>>>>>> -            verification.  But in reality we have created wrong-code
>>>>>>>> -            as we did not properly transition EH info and edges to
>>>>>>>> -            the piecewise computations.  */
>>>>>>>> -         if (maybe_clean_eh_stmt (gsi_stmt (gsi))
>>>>>>>> -             && gimple_purge_dead_eh_edges (bb))
>>>>>>>> -           cfg_changed = true;
>>>>>>>> -       }
>>>>>>>> -    }
>>>>>>>>
>>>>>>>> I'm not sure about this.  Consider the C++ testcase where
>>>>>>>> the ?: is replaced by a division.  If veclower needs to replace
>>>>>>>> that with four scalrar division statements then the above
>>>>>>>> still applies - veclower does not correctly duplicate EH info
>>>>>>>> and EH edges to the individual divisions (and we do not know
>>>>>>>> which component might trap).
>>>>>>>>
>>>>>>>> So please leave the above in.  You can try if using integer
>>>>>>>> division makes it break and add such a testcase if there's
>>>>>>>> no coverage for this in the testsuite.
>>>>>>>
>>>>>>> I'm leaving that above. Can you please explain how can a division test-case
>>>>>>> be created?
>>>>>>
>>>>>> typedef long v2di __attribute__((vector_size(16)));
>>>>>>
>>>>>> v2di foo (v2di a, v2di b)
>>>>>> {
>>>>>>      try
>>>>>>      {
>>>>>>        v2di res = a / b;
>>>>>>        return res;
>>>>>>        }
>>>>>>        catch (...)
>>>>>>        {
>>>>>>        return (v2di){};
>>>>>>        }
>>>>>> }
>>>>>>
>>>>>> with -fnon-call-exceptions I see in t.ii.090t.ehdisp (correctly):
>>>>>>
>>>>>> ;;   basic block 2, loop depth 0
>>>>>> ;;    pred:       ENTRY
>>>>>>      [LP 1] _6 = a_4(D) / b_5(D);
>>>>>> ;;    succ:       5
>>>>>> ;;                3
>>>>>>
>>>>>> while after t.ii.226t.veclower we have
>>>>>>
>>>>>> ;;   basic block 2, loop depth 0
>>>>>> ;;    pred:       ENTRY
>>>>>>      _13 = BIT_FIELD_REF <a_4(D), 64, 0>;
>>>>>>      _14 = BIT_FIELD_REF <b_5(D), 64, 0>;
>>>>>>      _15 = _13 / _14;
>>>>>>      _16 = BIT_FIELD_REF <a_4(D), 64, 64>;
>>>>>>      _17 = BIT_FIELD_REF <b_5(D), 64, 64>;
>>>>>>      _18 = _16 / _17;
>>>>>>      _6 = {_15, _18};
>>>>>>      res_7 = _6;
>>>>>>      _8 = res_7;
>>>>>> ;;    succ:       3
>>>>>>
>>>>>> and all EH is gone and we'd ICE if you remove the above hunk.  Hopefully.
>>>>>
>>>>> Yes, it ICEs then:
>>>>>
>>>>>
>>>>> ./xg++ -B. ~/Programming/testcases/ice.c -c -fnon-call-exceptions -O3
>>>>> /home/marxin/Programming/testcases/ice.c: In function ‘v2di foo(v2di, v2di)’:
>>>>> /home/marxin/Programming/testcases/ice.c:3:6: error: statement marked for throw, but doesn’t
>>>>>        3 | v2di foo (v2di a, v2di b)
>>>>>          |      ^~~
>>>>> _6 = {_12, _15};
>>>>> during GIMPLE pass: veclower2
>>>>> /home/marxin/Programming/testcases/ice.c:3:6: internal compiler error: verify_gimple failed
>>>>> 0x10e308a verify_gimple_in_cfg(function*, bool)
>>>>>           /home/marxin/Programming/gcc/gcc/tree-cfg.c:5461
>>>>> 0xfc9caf execute_function_todo
>>>>>           /home/marxin/Programming/gcc/gcc/passes.c:1985
>>>>> 0xfcaafc do_per_function
>>>>>           /home/marxin/Programming/gcc/gcc/passes.c:1640
>>>>> 0xfcaafc execute_todo
>>>>>           /home/marxin/Programming/gcc/gcc/passes.c:2039
>>>>> Please submit a full bug report,
>>>>> with preprocessed source if appropriate.
>>>>> Please include the complete backtrace with any bug report.
>>>>> See <https://gcc.gnu.org/bugs/> for instructions.
>>>>>
>>>>>>
>>>>>> We still generate wrong-code obviously as we'd need to duplicate the
>>>>>> EH info on each component division (and split blocks and generate
>>>>>> extra EH edges).  That's a pre-existing bug of course.  I just wanted
>>>>>> to avoid to create a new instance just because of the early instruction
>>>>>> selection for VEC_COND_EXPR.
>>>>>
>>>>> Fine!
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> What's missing from the patch is adjusting
>>>>>>>> verify_gimple_assign_ternary from
>>>>>>>>
>>>>>>>>       if (((rhs_code == VEC_COND_EXPR || rhs_code == COND_EXPR)
>>>>>>>>            ? !is_gimple_condexpr (rhs1) : !is_gimple_val (rhs1))
>>>>>>>>           || !is_gimple_val (rhs2)
>>>>>>>>           || !is_gimple_val (rhs3))
>>>>>>>>         {
>>>>>>>>           error ("invalid operands in ternary operation");
>>>>>>>>           return true;
>>>>>>>>
>>>>>>>> to the same with the rhs_code == VEC_COND_EXPR case removed.
>>>>>>>
>>>>>>> Hmm. I'm not sure I've got this comment. Why do we want to change it
>>>>>>> and is it done wright in the patch?
>>>>>>
>>>>>> Ah, I missed the hunk you added.
>>>>>
>>>>> That explains the confusion I got.
>>>>>
>>>>>>    But the check should be an inclusive
>>>>>> one, not an exclusive one and earlier accepting a is_gimple_condexpr
>>>>>> is superfluous when you later reject the tcc_comparison part.  Just
>>>>>> testing is_gimple_val is better.  So yes, remove your tree-cfg.c hunk
>>>>>> and just adjust the above test.
>>>>>
>>>>> I simplified that.
>>>>>
>>>>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>>>
>>>> Please double-check the changelog
>>>>
>>>>           (do_store_flag):
>>>>
>>>> +       tree-vect-isel.o \
>>>>
>>>> IMHO we want to move more of the pattern matching magic of RTL
>>>> expansion here to obsolete TER.  So please name it gimple-isel.cc
>>>> (.cc!, not .c)
>>>>
>>>> +  gassign *assign = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (cond));
>>>> +  if (stmt != NULL
>>>> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) != tcc_comparison)
>>>> +    return ERROR_MARK;
>>>>
>>>> you want stmt == NULL || TREE_CODE_CLASS (...)
>>>>
>>>> in case the def stmt is a call.
>>>>
>>>> +         gimple_seq seq;
>>>> +         tree exp = force_gimple_operand (comb, &seq, true, NULL_TREE);
>>>> +         if (seq)
>>>> +           {
>>>> +             gimple_stmt_iterator gsi = gsi_for_stmt (vcond0);
>>>> +             gsi_insert_before (&gsi, seq, GSI_SAME_STMT);
>>>> +           }
>>>>
>>>> use force_gimple_operand_gsi that makes the above simpler.
>>>>
>>>>             if (invert)
>>>> -           std::swap (*gimple_assign_rhs2_ptr (stmt0),
>>>> -                      *gimple_assign_rhs3_ptr (stmt0));
>>>> -         update_stmt (stmt0);
>>>> +           std::swap (*gimple_assign_rhs2_ptr (vcond0),
>>>> +                      *gimple_assign_rhs3_ptr (vcond0));
>>>>
>>>> use swap_ssa_operands.
>>>>
>>>> +         gimple_assign_set_rhs1 (vcond0, exp);
>>>> +         update_stmt (vcond0);
>>>>
>>>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>>>> index cf2d979fea1..710b17a7c5c 100644
>>>> --- a/gcc/tree-vect-stmts.c
>>>> +++ b/gcc/tree-vect-stmts.c
>>>> @@ -9937,8 +9937,8 @@ vectorizable_condition (vec_info *vinfo,
>>>>           {
>>>>             vec_cond_rhs = vec_oprnds1[i];
>>>>             if (bitop1 == NOP_EXPR)
>>>> -           vec_compare = build2 (cond_code, vec_cmp_type,
>>>> -                                 vec_cond_lhs, vec_cond_rhs);
>>>> +           vec_compare = gimplify_build2 (gsi, cond_code, vec_cmp_type,
>>>> +                                          vec_cond_lhs, vec_cond_rhs);
>>>>             else
>>>>               {
>>>>
>>>> please don't introduce more uses of gimplify_buildN - I'd like to
>>>> get rid of those.  You can use
>>>>
>>>>        gimple_seq stmts = NULL;
>>>>        vec_compare = gimple_build (&stmts, cond_code, ...);
>>>>        gsi_insert_seq_before/after (...);
>>>>
>>>> OK with those changes.
>>>
>>> Applying the patch caused
>>>
>>> Running target unix//-m32
>>> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3
>>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
>>> -finline-functions
>>> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3 -g
>>
>> I can't reproduce that with current master. Can you?
> 
> Yes.
> 
>> make check-gcc RUNTESTFLAGS="--target_board=unix/-m32 ieee.exp=pr50310.c"
> ...
> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> -finline-functions
> FAIL: gcc.c-torture/execute/ieee/pr50310.c execution,  -O3 -g

Now I've got it.

> 
> mind the -m32

I did, but -ffloat-store was not mentioned in the previous list of options ;)

Martin

> 
>>>
>>> and
>>>
>>> FAIL: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
>>> (test for excess errors)
>>> UNRESOLVED: ext/random/simd_fast_mersenne_twister_engine/operators/inequal.cc
>>> compilation failed to produce executable
>>
>> I've just fixed this one.
>>
>> Martin
>>
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>
>>>>> Thanks,
>>>>> Martin
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> You'll likely figure the vectorizer still creates some VEC_COND_EXPRs
>>>>>>>> with embedded comparisons.
>>>>>>>
>>>>>>> I've fixed 2 failing test-cases I mentioned in the previous email.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Martin
>>>>>>>
>>>>>
>>



More information about the Gcc-patches mailing list