This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH PR68542]

From: Richard Biener <richard dot guenther at gmail dot com>
To: Yuri Rumyantsev <ysrumyan at gmail dot com>
Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>, Igor Zamyatin <izamyatin at gmail dot com>, Kirill Yukhin <kirill dot Yukhin at gmail dot com>
Date: Wed, 16 Dec 2015 14:37:02 +0100
Subject: Re: [PATCH PR68542]
Authentication-results: sourceware.org; auth=none
References: <CAEoMCqQT9xxV-1sZPEQPfbuVrTvCVsCyWc4pEbiuph_tXMMqFw at mail dot gmail dot com> <CAFiYyc1BRb0-u5mmzT-M6PJk1JUVKLxJHMCJYoxi3f9ABjNhBw at mail dot gmail dot com> <CAEoMCqTV297wcT9D=0M0oG4MryMoG9iU9563BRA=9LHza46xMA at mail dot gmail dot com> <CAFiYyc0Kip21M=rLOZZ2=wAMdXCKAsw1KVJQEW5JZ+-CS4BXUw at mail dot gmail dot com> <CAEoMCqQ9mJUvVsiGLE-ay0iS17Qn-BP2BX8DSLfzAwUqW_Jgtg at mail dot gmail dot com>
On Fri, Dec 11, 2015 at 3:03 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard.
> Thanks for your review.
> I redesigned the fix for the assert by adding additional checks for vector
> comparison with boolean result to fold_binary_op_with_conditional_arg
> and removing the early exit from combine_cond_expr_cond.
> Unfortunately, I am not able to provide you with a test case since it is
> in my second patch, which is related to the back-end patch I sent earlier
> (12-08).
>
> Bootstrapping and regression testing did not show any new failures.
> Is it OK for trunk?

+  else if (TREE_CODE (type) == VECTOR_TYPE)
     {
       tree testtype = TREE_TYPE (cond);
       test = cond;
       true_value = constant_boolean_node (true, testtype);
       false_value = constant_boolean_node (false, testtype);
     }
+  else
+    {
+      test = cond;
+      cond_type = type;
+      true_value = boolean_true_node;
+      false_value = boolean_false_node;
+    }

So this is, say, vec1 != vec2 with scalar vs. vector result.  With a
scalar result we thus have, say, scalar + vec1 != vec2.  I believe that
rather than doing the above (I don't see how this would not generate wrong
code eventually) we should simply detect the case of mixing vector
and scalar types and bail out.  At least without some comments
your patch makes the function even more difficult to understand than
it already is.
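
For illustration only, the bail-out could be as simple as the guard
sketched below.  This is a standalone toy model, not GCC code: struct
toy_type and may_fold_conditional_arg are hypothetical names, and the
single is_vector flag stands in for a VECTOR_TYPE_P check on the result
type and on the type of the condition.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a GCC type node.  */
struct toy_type
{
  bool is_vector;
};

/* Returns true when the fold may proceed, false when it should bail out
   because the result type and the condition type mix vector and scalar.  */
static bool
may_fold_conditional_arg (const struct toy_type *result_type,
                          const struct toy_type *cond_type)
{
  return result_type->is_vector == cond_type->is_vector;
}
```

In the real function the equivalent check would presumably return
NULL_TREE before any of the true/false values are built.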

@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       if (TREE_CODE (op0_type) == VECTOR_TYPE
          || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+         /* Allow vector comparison returning boolean if operand types
+            are boolean or integral and CODE is EQ/NE.  */
+         if (code != EQ_EXPR && code != NE_EXPR
+             && !VECTOR_BOOLEAN_TYPE_P (op0_type)
+             && !VECTOR_INTEGER_TYPE_P (op0_type))
+           {
+             error ("type mismatch for vector comparison returning a boolean");
+             debug_generic_expr (op0_type);
+             debug_generic_expr (op1_type);
+             return true;
+           }
         }
     }
   /* Or a boolean vector type with the same element count

As said before, please merge the cascaded if()s.  Better wording for
the error is "unsupported operation or type for vector comparison
returning a boolean"
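
To spell out what a merged condition could look like, here is a minimal
standalone sketch.  It does not use the tree API: enum cmp_code and the
boolean parameters are hypothetical stand-ins for CODE, the VECTOR_TYPE
test on the operands, and VECTOR_BOOLEAN_TYPE_P / VECTOR_INTEGER_TYPE_P.
It assumes the intent stated in the patch comment, i.e. only EQ/NE on
boolean or integer vectors may return a scalar boolean.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant tree codes.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT };

/* Returns true when the verifier should reject a comparison whose
   result is a scalar boolean.  The outer vector test and the inner
   code/type test are merged into one condition.  */
static bool
reject_vector_cmp_with_bool_result (enum cmp_code code,
                                    bool operand_is_vector,
                                    bool is_bool_vector,
                                    bool is_int_vector)
{
  return operand_is_vector
         && ((code != CMP_EQ && code != CMP_NE)
             || !(is_bool_vector || is_int_vector));
}
```

Note that a condition joining all four inner tests with && instead would,
for instance, not reject CMP_LT on an integer vector.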

Otherwise the patch looks sensible to me though it shows that overloading of
EQ/NE_EXPR for scalar result and vector operands might have some more unexpected
fallout (which is why I originally preferred the view-convert to large
integer type variant).
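
As a rough user-level illustration of the two views (not of the GIMPLE
representation itself), the sketch below compares two integer vectors
once lane by lane and once by reinterpreting the 16 bytes as integers.
It assumes a compiler supporting the GCC vector extension; the helper
names are made up for this example.  The equivalence only holds for
integer vectors: for floating-point lanes, NaN and -0.0 make bitwise
equality differ from ==.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise compare, then reduce the lane mask to one scalar.  */
static bool
v4si_equal_elementwise (v4si a, v4si b)
{
  v4si mask = (a == b);   /* each lane is -1 (true) or 0 (false) */
  return mask[0] && mask[1] && mask[2] && mask[3];
}

/* View the 16 bytes as two 64-bit integers and compare those.  */
static bool
v4si_equal_as_integer (v4si a, v4si b)
{
  uint64_t abits[2], bbits[2];
  memcpy (abits, &a, sizeof abits);
  memcpy (bbits, &b, sizeof bbits);
  return abits[0] == bbits[0] && abits[1] == bbits[1];
}
```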

Thanks,
Richard.


> ChangeLog:
> 2015-12-11  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> PR middle-end/68542
> * fold-const.c (fold_binary_op_with_conditional_arg): Add checks on
> vector comparison with boolean result to avoid ICE.
> (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
> vector types.
>
> 2015-12-10 16:36 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi Richard.
>>>
>>> Thanks a lot for your review.
>>> Below are my answers.
>>>
>>> You asked why I inserted additional check to
>>> +++ b/gcc/tree-ssa-forwprop.c
>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>> tree_code code, tree type,
>>>
>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>
>>> +  /* Do not perform combining if types are not compatible.  */
>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>> +    return NULL_TREE;
>>> +
>>>
>>> again, how does this happen?
>>>
>>> This is because without it I hit an assert in fold_convert_loc
>>>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>>
>>> since it tries to convert a vector of bool to a scalar bool.
>>> Here is the essential part of the call stack:
>>>
>>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>>     at ../../gcc/diagnostic.c:1259
>>> #1  0x0000000001743ada in fancy_abort (
>>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>>> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>>> ../../gcc/diagnostic.c:1332
>>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>>     at ../../gcc/fold-const.c:9211
>>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>     op1=0x7ffff1a48780, invariant_only=true)
>>>     at ../../gcc/tree-ssa-forwprop.c:382
>>
>> Ok, but that only shows that
>>
>>       /* Convert A ? 1 : 0 to simply A.  */
>>       if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>>                                  : (integer_onep (op1)
>>                                     && !VECTOR_TYPE_P (type)))
>>           && integer_zerop (op2)
>>           /* If we try to convert OP0 to our type, the
>>              call to fold will try to move the conversion inside
>>              a COND, which will recurse.  In that case, the COND_EXPR
>>              is probably the best choice, so leave it alone.  */
>>           && type == TREE_TYPE (arg0))
>>         return pedantic_non_lvalue_loc (loc, arg0);
>>
>>       /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>>          over COND_EXPR in cases such as floating point comparisons.  */
>>       if (integer_zerop (op1)
>>           && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
>>                                     : (integer_onep (op2)
>>                                        && !VECTOR_TYPE_P (type)))
>>           && truth_value_p (TREE_CODE (arg0)))
>>         return pedantic_non_lvalue_loc (loc,
>>                                     fold_convert_loc (loc, type,
>>                                               invert_truthvalue_loc (loc,
>>                                                                      arg0)));
>>
>> are wrong?  I can't say for sure without a testcase.
>>
>> That said, papering over this in tree-ssa-forwprop.c is not the
>> correct thing to do.
>>
>>> Secondly, I did not follow your idea of using the GCC vector extension
>>> for vector comparison with bool result, since
>>> such an extension depends completely on the comparison context. E.g. for your
>>> example, the result type of the comparison depends on its use: for an
>>> if-comparison it is scalar, but for c = (a==b) the result type is a
>>> vector. I don't think that this is reasonable for the current release.
>>
>> The idea was to be able to write testcases exercising different EQ/NE vector
>> compares.  But yes, if that's non-trivial then it's not appropriate for stage3.
>>
>> Can you add a testcase for the forwprop issue and try to fix the offending
>> bogus folders instead?
>>
>> Thanks,
>> Richard.
>>
>>> And finally, about AMD performance: I checked that this transformation
>>> works with the "-march=bdver4" option, and the regression for 481.wrf should
>>> disappear there too.
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> Here is a patch for the 481.wrf performance regression for avx2, which is a
>>>>> slightly modified mask store optimization. This transformation allows
>>>>> performing unpredication for a semi-hammock containing masked stores; in other
>>>>> words, if we have a loop like
>>>>> for (i=0; i<n; i++)
>>>>>   if (c[i]) {
>>>>>     p1[i] += 1;
>>>>>     p2[i] = p3[i] +2;
>>>>>   }
>>>>>
>>>>> then it will be transformed to
>>>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>>    }
>>>>> i.e. it will put all computations related to masked stores into a semi-hammock.
>>>>>
>>>>> Bootstrapping and regression testing did not show any new failures.
>>>>
>>>> Can you please split out the middle-end support for vector equality compares?
>>>>
>>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>>          {
>>>> -          error ("vector comparison returning a boolean");
>>>> -          debug_generic_expr (op0_type);
>>>> -          debug_generic_expr (op1_type);
>>>> -          return true;
>>>> +         /* Allow vector comparison returning boolean if operand types
>>>> +            are equal and CODE is EQ/NE.  */
>>>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>>>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>>>> +           {
>>>> +             error ("type mismatch for vector comparison returning a boolean");
>>>> +             debug_generic_expr (op0_type);
>>>> +             debug_generic_expr (op1_type);
>>>> +             return true;
>>>> +           }
>>>>          }
>>>>      }
>>>>
>>>> please merge the conditions with a &&
>>>>
>>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>>>> tree type, tree op0, tree op1)
>>>>
>>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>      {
>>>> +      if (INTEGRAL_TYPE_P (type)
>>>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>>>> +             || TYPE_PRECISION (type) == 1))
>>>> +       {
>>>> +         /* Have vector comparison with scalar boolean result.  */
>>>> +         bool result = true;
>>>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>>> +           {
>>>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>>>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>>>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>>>> +             result &= integer_onep (tmp);
>>>> +         if (code == NE_EXPR)
>>>> +           result = !result;
>>>> +         return constant_boolean_node (result, type);
>>>>
>>>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>>>> to change the
>>>> guarding condition to just
>>>>
>>>>    if (! VECTOR_TYPE_P (type))
>>>>
>>>> and assert the boolean/precision.  Please also merge the asserts into
>>>> one with &&
>>>>
>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>> index b82ae3c..73ee3be 100644
>>>> --- a/gcc/tree-ssa-forwprop.c
>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>>> tree_code code, tree type,
>>>>
>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>
>>>> +  /* Do not perform combining if types are not compatible.  */
>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>> +    return NULL_TREE;
>>>> +
>>>>
>>>> again, how does this happen?
>>>>
>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>> index e67048e..1605520c 100644
>>>> --- a/gcc/tree-vrp.c
>>>> +++ b/gcc/tree-vrp.c
>>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>>>> gimple_stmt_iterator si,
>>>>                                                 &comp_code, &val))
>>>>      return;
>>>>
>>>> +  /* Use of vector comparison in gcond is very restricted and used to check
>>>> +     that the mask in masked store is zero, so assert for such comparison
>>>> +     is not implemented yet.  */
>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>> +    return;
>>>> +
>>>>
>>>> VECTOR_TYPE_P
>>>>
>>>> I believe the comment should simply say that VRP doesn't track ranges for
>>>> vector types.
>>>>
>>>> In the previous review I suggested you should make sure that RTL expansion
>>>> ends up using a well-defined optab for these compares.  To make sure
>>>> this happens across targets I suggest you make these comparisons available
>>>> via the GCC vector extension.  Thus allow
>>>>
>>>> typedef int v4si __attribute__((vector_size(16)));
>>>>
>>>> int foo (v4si a, v4si b)
>>>> {
>>>>   if (a == b)
>>>>     return 4;
>>>> }
>>>>
>>>> and != and also using floating point vectors.
>>>>
>>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>>>> compares.
>>>>
>>>> A half-way implementation like your VRP comment suggests (only
>>>> ==/!= zero against integer vectors is implemented?!) doesn't sound
>>>> good without also limiting the feature this way in the verifier.
>>>>
>>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>>> has AVX, not AVX2).
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> ChangeLog:
>>>>> 2015-11-30  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> PR middle-end/68542
>>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>>> comparison with boolean result.
>>>>> * config/i386/sse.md (define_expand "cbranch<mode>4"): Add define-expand
>>>>> for vector comparison with eq/ne only.
>>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>>> comparison with boolean result.
>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>> (verify_gimple_cond): Likewise.
>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>>> combining for non-compatible vector types.
>>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>>> (optimize_mask_stores): Likewise.
>>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>>> has_mask_store field of vect_info.
>>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
>>>>> vectorized loops having masked stores.
>>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>>> corresponding macros.
>>>>> (optimize_mask_stores): Add prototype.
>>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>>> type.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
Follow-Ups:
- Re: [PATCH PR68542]
  - From: Yuri Rumyantsev
References:
- Re: [PATCH PR68542]
  - From: Richard Biener
- Re: [PATCH PR68542]
  - From: Yuri Rumyantsev
- Re: [PATCH PR68542]
  - From: Richard Biener
- Re: [PATCH PR68542]
  - From: Yuri Rumyantsev