This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH] Simple optimization for MASK_STORE.

From: Richard Biener <richard dot guenther at gmail dot com>
To: Yuri Rumyantsev <ysrumyan at gmail dot com>
Cc: Ilya Enkovich <enkovich dot gnu at gmail dot com>, Jeff Law <law at redhat dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Igor Zamyatin <izamyatin at gmail dot com>
Date: Thu, 12 Nov 2015 14:58:56 +0100
Subject: Re: [PATCH] Simple optimization for MASK_STORE.

On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> What should we do to cope with this problem (the increase in structure size)?
> Should we return to the vector comparison version?

Ok, given this constraint I think the cleanest approach is to allow
integer(!) vector equality(!) compares with scalar result.  This should then
expand via cmp_optab and not via vec_cmp_optab.

On gimple you can then have

 if (mask_vec_1 != {0, 0, .... })
...

Note that a fallback expansion (for optabs.c to try) would be
the suggested view-conversion (aka, subreg) variant using
a same-sized integer mode.

Target maintainers can then choose what is a better fit for
their target (instruction set as well as register set constraints may apply).
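
To make the fallback concrete, here is a minimal standalone C sketch (not GCC code) of a "mask != {0, ...}" test done both lane by lane and through a view-conversion. It assumes a hypothetical 256-bit mask of eight 32-bit lanes; because C has no 256-bit integer, the same-sized integer mode is approximated by four 64-bit chunks. All function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8 /* hypothetical: eight 32-bit lanes, i.e. a 256-bit mask */

/* Lane-wise view: true if any lane of the mask is non-zero.  */
static bool mask_any_set_lanes(const int32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        if (mask[i] != 0)
            return true;
    return false;
}

/* View-conversion variant: reinterpret the same bytes as wider integer
   chunks and compare those against zero.  A real target would use one
   integer mode of the full vector size; four 64-bit chunks stand in
   for it here.  */
static bool mask_any_set_viewconv(const int32_t mask[LANES])
{
    uint64_t chunks[LANES * sizeof(int32_t) / sizeof(uint64_t)];
    uint64_t acc = 0;

    memcpy(chunks, mask, sizeof chunks);
    for (size_t i = 0; i < sizeof chunks / sizeof chunks[0]; i++)
        acc |= chunks[i];
    return acc != 0;
}

/* Masked store guarded by the zero test: skip all stores when no lane
   is enabled.  */
static void guarded_mask_store(int32_t *dst, const int32_t *src,
                               const int32_t mask[LANES])
{
    if (!mask_any_set_viewconv(mask))
        return;
    for (int i = 0; i < LANES; i++)
        if (mask[i] != 0)
            dst[i] = src[i];
}
```

Both tests give the same answer for integer masks, which is why the choice between them can be left to the target.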

The patch you posted seems to do this but does not restrict the compares
to integer ones (please do that).

       if (TREE_CODE (op0_type) == VECTOR_TYPE
          || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+         /* Allow vector comparison returning boolean if operand types
+            are equal and CODE is EQ/NE.  */
+         if ((code != EQ_EXPR && code != NE_EXPR)
+             || TREE_CODE (op0_type) != TREE_CODE (op1_type)
+             || TYPE_VECTOR_SUBPARTS (op0_type)
+                != TYPE_VECTOR_SUBPARTS (op1_type)
+             || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
+                != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type))))

These are all checked with the useless_type_conversion_p checks done earlier.

As said I'd like to see a

                || ! VECTOR_INTEGER_TYPE_P (op0_type)

check added so that neither we nor the targets need to worry about using
EQ/NE vs. CMP, or about signed zeros and friends.

+           {
+             error ("type mismatch for vector comparison returning a boolean");
+             debug_generic_expr (op0_type);
+             debug_generic_expr (op1_type);
+             return true;
+           }
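
As a small standalone illustration of the signed-zero concern (not GCC code), the C sketch below shows a floating-point equality and a bitwise equality disagreeing on +0.0 versus -0.0. It assumes IEEE 754 single-precision float; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Floating-point comparison: +0.0 and -0.0 compare equal.  */
static bool fp_equal(float a, float b)
{
    return a == b;
}

/* Bitwise comparison of the same values, as an integer compare on a
   view-converted operand would see them.  Assumes a 32-bit float.  */
static bool bits_equal(float a, float b)
{
    uint32_t ua, ub;

    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}
```

With fp_equal (0.0f, -0.0f) true and bits_equal (0.0f, -0.0f) false, an EQ/NE on floating-point vectors could not simply be expanded as an integer compare of the bits. Restricting the new form to integer vectors sidesteps that.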



--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt,
          enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
          bool invariant_only_p = !single_use0_p;

+         /* Can't combine vector comparison with scalar boolean type of
+            the result and VEC_COND_EXPR having vector type of comparison.  */
+         if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+             && INTEGRAL_TYPE_P (type)
+             && (TREE_CODE (type) == BOOLEAN_TYPE
+                 || TYPE_PRECISION (type) == 1)
+             && def_code == VEC_COND_EXPR)
+           return NULL_TREE;

this hints at larger fallout you paper over here.  So this effectively
means we're trying to combine (vec1 != vec2) != 0 for example
and that fails miserably?  If so then the solution is to fix whatever
does not expect this (valid) GENERIC tree.

+  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
+    return;

not sure if I like a param more than a target hook ... :/

+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (TREE_TYPE (vectype));
+      zero = build_vector_from_val (vectype, zero);

build_zero_cst (vectype);

+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);

you can omit the NULL_TREE operands.

+      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);

please omit the assert.

+      gimple_set_vdef (last, new_vdef);

do this before you create the PHI.

+         /* Put definition statement of stored value in STORE_BB
+            if possible.  */
+         arg3 = gimple_call_arg (last, 3);
+         if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+           {
...

is this really necessary?  It looks incomplete to me anyway.  I'd rather have
a late sink pass if this proves necessary.  Btw,...

+                it is legal.  */
+             if (gimple_bb (def_stmt) == bb
+                 && is_valid_sink (def_stmt, last_store))

with the implementation of is_valid_sink this is effectively

   && (!gimple_vuse (def_stmt)
          || gimple_vuse (def_stmt) == gimple_vdef (last_store))


I still think this "pass" is quite a hack, esp. as it appears as generic
code in a GIMPLE pass.  And esp. as this hack seems to be needed
for Haswell only, not Broadwell or Skylake.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-11-11 12:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-11-10 17:46 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I tried it but a 256-bit precision integer type is not yet supported.
>>>>>>
>>>>>> What's the symptom?  The compare cannot be expanded?  Just add a pattern then.
>>>>>> After all we have modes up to XImode.
>>>>>
>>>>> I suppose problem may be in:
>>>>>
>>>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)
>>>>>
>>>>> which doesn't allow creating constants of a bigger size.  Changing it
>>>>> to the maximum vector size (512) would mean we increase the wide_int
>>>>> structure size significantly. New patterns are probably also needed.
>>>>
>>>> Yes, new patterns are needed but wide-int should be fine (we only need to create
>>>> a literal zero AFAICS).  The "new pattern" would be equality/inequality
>>>> compares against zero only.
>>>
>>> Currently 256bit integer creation fails because wide_int for max and
>>> min values cannot be created.
>>
>> Hmm, indeed:
>>
>> #1  0x000000000072dab5 in wi::extended_tree<192>::extended_tree (
>>     this=0x7fffffffd950, t=0x7ffff6a000b0)
>>     at /space/rguenther/src/svn/trunk/gcc/tree.h:5125
>> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>>
>> but it's not that the constants fail to be created, rather
>>
>> #5  0x00000000010d8828 in build_nonstandard_integer_type (precision=512,
>>     unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051
>> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
>> (gdb) l
>> 8046        fixup_unsigned_type (itype);
>> 8047      else
>> 8048        fixup_signed_type (itype);
>> 8049
>> 8050      ret = itype;
>> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
>> 8052        ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE (itype)), itype);
>>
>> thus the integer type hashing is "interesting".  tree_fits_uhwi_p fails
>> because it does
>>
>> 7289    bool
>> 7290    tree_fits_uhwi_p (const_tree t)
>> 7291    {
>> 7292      return (t != NULL_TREE
>> 7293              && TREE_CODE (t) == INTEGER_CST
>> 7294              && wi::fits_uhwi_p (wi::to_widest (t)));
>> 7295    }
>>
>> and wi::to_widest () fails when doing
>>
>> 5121    template <int N>
>> 5122    inline wi::extended_tree <N>::extended_tree (const_tree t)
>> 5123      : m_t (t)
>> 5124    {
>> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>> 5126    }
>>
>> fixing the hashing then runs into type_cache_hasher::equal doing
>> tree_int_cst_equal, which again uses to_widest (it should be easier and
>> cheaper to do the compare on the actual tree representation, but well,
>> this seems to be just the first of various issues we'd run into).
>>
>> We eventually could fix the assert above (but then need to hope we assert
>> when a computation overflows the narrower precision of widest_int) or use
>> a special really_widest_int (ugh).
>>
>>> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but that increases
>>> WIDE_INT_MAX_ELTS and thus the size of the wide_int structure. If we use
>>> 512 for MAX_BITSIZE_MODE_ANY_INT then the wide_int structure would grow
>>> by 48 bytes (16 bytes if we use 256 for MAX_BITSIZE_MODE_ANY_INT).
>>> Is it OK for such narrow usage?
>>
>> widest_int is used in some long-living structures (which is the reason for
>> MAX_BITSIZE_MODE_ANY_INT in the first place).  So I don't think so.
>>
>> Richard.
>>
>>> Ilya
>>>
>>>>
>>>> Richard.
>>>>
>>>>> Ilya
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Yuri.
>>>>>>>
>>>>>>>
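
The size figures quoted in the thread above (16 extra bytes for 256, 48 extra bytes for 512) can be reproduced with a short standalone C sketch. The element-count formula is an assumption recalled from wide-int.h of that era, not taken from this message, and a 64-bit HOST_WIDE_INT is assumed; it does agree with the numbers in the thread, including the extended_tree<192> seen in the backtrace. The function names are hypothetical.

```c
#include <stddef.h>

#define HOST_BITS_PER_WIDE_INT 64 /* assumption: 64-bit HOST_WIDE_INT */

/* Number of HOST_WIDE_INT elements, assuming WIDE_INT_MAX_ELTS is
   (MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT)
   / HOST_BITS_PER_WIDE_INT.  */
static size_t wide_int_max_elts(size_t max_bitsize_mode_any_int)
{
    return (max_bitsize_mode_any_int + HOST_BITS_PER_WIDE_INT)
           / HOST_BITS_PER_WIDE_INT;
}

/* Bytes taken by the element array alone (ignores other members).  */
static size_t wide_int_storage_bytes(size_t max_bitsize_mode_any_int)
{
    return wide_int_max_elts(max_bitsize_mode_any_int)
           * (HOST_BITS_PER_WIDE_INT / 8);
}
```

Under these assumptions 128 bits gives 3 elements (24 bytes, 192 bits of precision), 256 gives 5 elements (40 bytes) and 512 gives 9 elements (72 bytes).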

Follow-Ups:
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Yuri Rumyantsev

References:
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Yuri Rumyantsev
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Richard Biener
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Yuri Rumyantsev
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Richard Biener
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Ilya Enkovich
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Richard Biener
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Ilya Enkovich
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Richard Biener
- Re: [PATCH] Simple optimization for MASK_STORE.
  - From: Yuri Rumyantsev
