This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Simple optimization for MASK_STORE.


On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> What should we do to cope with this problem (the increase in structure size)?
> Should we return to the vector comparison version?

Ok, given this constraint I think the cleanest approach is to allow
integer(!) vector equality(!) compares with scalar result.  This should then
expand via cmp_optab and not via vec_cmp_optab.

On gimple you can then have

 if (mask_vec_1 != {0, 0, .... })
...
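
To make the intent of such a guard concrete, here is a minimal scalar sketch in plain C. It is not the GIMPLE the patch generates: the lane count, the array-based "vectors" and the names mask_nonzero_p and guarded_mask_store are all hypothetical, chosen only to show a scalar-result compare of a mask against zero being used to skip a masked store entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* assumption: eight 32-bit lanes, i.e. a 256-bit vector */

/* Scalar stand-in for the scalar-result compare mask != {0, 0, ...}:
   returns nonzero if any lane of the mask is set.  */
int
mask_nonzero_p (const int32_t mask[LANES])
{
  int32_t any = 0;
  for (size_t i = 0; i < LANES; i++)
    any |= mask[i];
  return any != 0;
}

/* Scalar stand-in for a masked store: write src[i] to dst[i] only for
   lanes whose mask element is nonzero.  Returns 1 if the store was
   executed and 0 if it was skipped because the whole mask is zero.  */
int
guarded_mask_store (int32_t dst[LANES], const int32_t mask[LANES],
                    const int32_t src[LANES])
{
  assert (dst != NULL && mask != NULL && src != NULL);
  if (!mask_nonzero_p (mask))
    return 0;
  for (size_t i = 0; i < LANES; i++)
    if (mask[i] != 0)
      dst[i] = src[i];
  return 1;
}
```

With an all-zero mask the function returns before touching dst, which is the situation the zero test is meant to make cheap.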

Note that a fallback expansion (for optabs.c to try) would be
the suggested view-conversion (aka, subreg) variant using
a same-sized integer mode.
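
As a rough illustration of that fallback, the sketch below reinterprets the bytes of a mask as integers and compares the result against zero. It makes several assumptions: the mask is 256 bits wide, and since C has no 256-bit integer type the view conversion is approximated with four 64-bit words OR-ed together. The name mask_all_zero_p is hypothetical, and this is not how optabs.c would express it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a 256-bit mask, stored as 32 raw bytes.  */
#define MASK_BYTES 32
#define MASK_WORDS (MASK_BYTES / sizeof (uint64_t))

/* Returns 1 if every bit of the mask is zero.  The bytes are
   reinterpreted as 64-bit integers (memcpy is the portable C way to
   express a view conversion) and OR-ed together, so one scalar compare
   against zero decides the result.  */
int
mask_all_zero_p (const unsigned char mask[MASK_BYTES])
{
  uint64_t words[MASK_WORDS];
  uint64_t acc = 0;

  assert (mask != NULL);
  memcpy (words, mask, MASK_BYTES);
  for (size_t i = 0; i < MASK_WORDS; i++)
    acc |= words[i];
  return acc == 0;
}
```

Because only equality with zero is tested, the element width and signedness of the original vector do not matter here, which is what makes the integer view a valid substitute.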

Target maintainers can then choose what is a better fit for
their target (instruction set as well as register set constraints may apply).

The patch you posted seems to do this but does not restrict the compares
to integer ones (please do that).

       if (TREE_CODE (op0_type) == VECTOR_TYPE
          || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+         /* Allow vector comparison returning boolean if operand types
+            are equal and CODE is EQ/NE.  */
+         if ((code != EQ_EXPR && code != NE_EXPR)
+             || TREE_CODE (op0_type) != TREE_CODE (op1_type)
+             || TYPE_VECTOR_SUBPARTS (op0_type)
+                != TYPE_VECTOR_SUBPARTS (op1_type)
+             || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
+                != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type))))

These are all checked with the useless_type_conversion_p checks done earlier.

As said I'd like to see a

                || ! VECTOR_INTEGER_TYPE_P (op0_type)

check added so we and targets do not need to worry about using EQ/NE vs. CMP
and worry about signed zeros and friends.
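
The signed-zero concern can be seen in a small standalone C sketch. It assumes a 32-bit IEEE 754 float, and the two helper names are hypothetical. A value compare treats +0.0 and -0.0 as equal, while a compare of the raw bits (what an integer EQ on the same storage would see) does not. NaNs disagree in the opposite direction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value comparison, as a floating-point compare would do it.  */
int
float_value_equal_p (float a, float b)
{
  return a == b;
}

/* Bitwise comparison, as an integer EQ on the reinterpreted
   representation would do it.  Assumes float is 32 bits wide.  */
int
float_bits_equal_p (float a, float b)
{
  uint32_t ua, ub;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  return ua == ub;
}
```

For integer vectors the two notions of equality coincide, so restricting the scalar-result compare to integer vector types sidesteps the question.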

+           {
+             error ("type mismatch for vector comparison returning a boolean");
+             debug_generic_expr (op0_type);
+             debug_generic_expr (op1_type);
+             return true;
+           }



--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt,
          enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
          bool invariant_only_p = !single_use0_p;

+         /* Can't combine vector comparison with scalar boolean type of
+            the result and VEC_COND_EXPR having vector type of comparison.  */
+         if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+             && INTEGRAL_TYPE_P (type)
+             && (TREE_CODE (type) == BOOLEAN_TYPE
+                 || TYPE_PRECISION (type) == 1)
+             && def_code == VEC_COND_EXPR)
+           return NULL_TREE;

This hints at larger fallout that you paper over here.  So this effectively
means we're trying to combine (vec1 != vec2) != 0, for example,
and that fails miserably?  If so, then the solution is to fix whatever
does not expect this (valid) GENERIC tree.

+  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
+    return;

not sure if I like a param more than a target hook ... :/

+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (TREE_TYPE (vectype));
+      zero = build_vector_from_val (vectype, zero);

Just use build_zero_cst (vectype); it builds the zero vector directly.

+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);

you can omit the NULL_TREE operands.

+      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);

please omit the assert.

+      gimple_set_vdef (last, new_vdef);

do this before you create the PHI.

+         /* Put definition statement of stored value in STORE_BB
+            if possible.  */
+         arg3 = gimple_call_arg (last, 3);
+         if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+           {
...

Is this really necessary?  It looks incomplete to me anyway.  I'd rather have
a late sink pass if this proves necessary.  Btw,...

+                it is legal.  */
+             if (gimple_bb (def_stmt) == bb
+                 && is_valid_sink (def_stmt, last_store))

with the implementation of is_valid_sink this is effectively

   && (!gimple_vuse (def_stmt)
          || gimple_vuse (def_stmt) == gimple_vdef (last_store))


I still think this "pass" is quite a hack, esp. as it appears as generic
code in a GIMPLE pass.  And esp. as this hack seems to be needed
for Haswell only, not Broadwell or Skylake.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-11-11 12:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-11-10 17:46 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I tried it but 256-bit precision integer type is not yet supported.
>>>>>>
>>>>>> What's the symptom?  The compare cannot be expanded?  Just add a pattern then.
>>>>>> After all we have modes up to XImode.
>>>>>
>>>>> I suppose problem may be in:
>>>>>
>>>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)
>>>>>
>>>>> which doesn't allow creating constants of a bigger size.  Changing it
>>>>> to the maximum vector size (512) would mean we increase the wide_int
>>>>> structure size significantly. New patterns are probably also needed.
>>>>
>>>> Yes, new patterns are needed but wide-int should be fine (we only need to create
>>>> a literal zero AFAICS).  The "new pattern" would be equality/inequality
>>>> against zero compares only.
>>>
>>> Currently 256bit integer creation fails because wide_int for max and
>>> min values cannot be created.
>>
>> Hmm, indeed:
>>
>> #1  0x000000000072dab5 in wi::extended_tree<192>::extended_tree (
>>     this=0x7fffffffd950, t=0x7ffff6a000b0)
>>     at /space/rguenther/src/svn/trunk/gcc/tree.h:5125
>> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>>
>> but the issue is not that the constants fail to be created, it is
>>
>> #5  0x00000000010d8828 in build_nonstandard_integer_type (precision=512,
>>     unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051
>> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
>> (gdb) l
>> 8046        fixup_unsigned_type (itype);
>> 8047      else
>> 8048        fixup_signed_type (itype);
>> 8049
>> 8050      ret = itype;
>> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
>> 8052        ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE (itype)), itype);
>>
>> thus the integer type hashing being "interesting".  tree_fits_uhwi_p
>> fails because it does
>>
>> 7289    bool
>> 7290    tree_fits_uhwi_p (const_tree t)
>> 7291    {
>> 7292      return (t != NULL_TREE
>> 7293              && TREE_CODE (t) == INTEGER_CST
>> 7294              && wi::fits_uhwi_p (wi::to_widest (t)));
>> 7295    }
>>
>> and wi::to_widest () fails with doing
>>
>> 5121    template <int N>
>> 5122    inline wi::extended_tree <N>::extended_tree (const_tree t)
>> 5123      : m_t (t)
>> 5124    {
>> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>> 5126    }
>>
>> fixing the hashing then runs into type_cache_hasher::equal doing
>> tree_int_cst_equal which again uses to_widest (it should be easier and
>> cheaper to do the compare on the actual tree representation, but well,
>> seems to be just the first of various issues we'd run into).
>>
>> We eventually could fix the assert above (but then need to hope we assert
>> when a computation overflows the narrower precision of widest_int) or use
>> a special really_widest_int (ugh).
>>
>>> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but that increases
>>> WIDE_INT_MAX_ELTS and thus the size of the wide_int structure. If we use
>>> 512 for MAX_BITSIZE_MODE_ANY_INT then the wide_int structure would grow
>>> by 48 bytes (16 bytes if we use 256 for MAX_BITSIZE_MODE_ANY_INT).
>>> Is it OK for such narrow usage?
>>
>> widest_int is used in some long-living structures (which is the reason for
>> MAX_BITSIZE_MODE_ANY_INT in the first place).  So I don't think so.
>>
>> Richard.
>>
>>> Ilya
>>>
>>>>
>>>> Richard.
>>>>
>>>>> Ilya
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Yuri.
>>>>>>>
>>>>>>>
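
The size figures quoted in the thread (16 and 48 extra bytes, and the <192> in the backtrace) can be reproduced with a little arithmetic. The sketch below assumes a 64-bit HOST_WIDE_INT and assumes that WIDE_INT_MAX_ELTS is computed as (MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT. That formula is a recollection of wide-int.h from that period, not something stated in the thread, and the function names are hypothetical. Only the element array is counted, not any other members of the structure.

```c
#include <assert.h>

/* Assumption: 64-bit HOST_WIDE_INT, as on an x86-64 host.  */
#define HOST_BITS_PER_WIDE_INT 64

/* Number of HOST_WIDE_INT elements for a given value of
   MAX_BITSIZE_MODE_ANY_INT (assumed formula, see the text above).  */
unsigned
wide_int_max_elts (unsigned max_bitsize_mode_any_int)
{
  assert (max_bitsize_mode_any_int > 0);
  return (max_bitsize_mode_any_int + HOST_BITS_PER_WIDE_INT)
	 / HOST_BITS_PER_WIDE_INT;
}

/* Maximum precision in bits that the element array can represent.  */
unsigned
wide_int_max_precision (unsigned max_bitsize_mode_any_int)
{
  return wide_int_max_elts (max_bitsize_mode_any_int)
	 * HOST_BITS_PER_WIDE_INT;
}

/* Bytes occupied by the element array alone.  */
unsigned
wide_int_array_bytes (unsigned max_bitsize_mode_any_int)
{
  return wide_int_max_elts (max_bitsize_mode_any_int)
	 * (HOST_BITS_PER_WIDE_INT / 8);
}
```

Under these assumptions a limit of 128 gives 3 elements (24 bytes, 192 bits of precision), 256 gives 5 elements (40 bytes) and 512 gives 9 elements (72 bytes), which matches the growth of 16 and 48 bytes mentioned above. It also shows why a 256-bit or 512-bit type trips the TYPE_PRECISION <= N assertion when N is 192.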

