Fix for 56175

Yuri Rumyantsev ysrumyan@gmail.com
Thu Feb 21 15:42:00 GMT 2013


Richard,

This does not fit to my fix  since if we have another pattern like
    t = (u8)((x & 1) ^ ((u8)y & 1));
where y has short type, for rhs operand type sinkning (or hoisting as
you prefer) still will be performed but I don't see any reason for
converting  (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all
targets including x86.

Yuri.

2013/2/21 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Sorry for my previous message - I did not undersatnd it properly.
>>
>> Anyway I proposed another fix that avoid  (type) x & c --> (type) (x &
>> (type-x) c) transformation if x has short type:
>>
>> +++ gcc/tree-ssa-forwprop.c     (working copy)
>> @@ -1789,8 +1789,11 @@
>>    defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2);
>>    defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2);
>>
>> -  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).  */
>> +  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).
>> +     Do that only if X has not short type.  */
>>    if (TREE_CODE (arg2) == INTEGER_CST
>> +      && (TYPE_PRECISION (TREE_TYPE (arg1))
>> +          >= TYPE_PRECISION (integer_type_node))
>>        && CONVERT_EXPR_CODE_P (def1_code)
>>        && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>>        && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>>
>> Does this fix look suitable?
>
> I think the fix is to disable the transform if it widens the operation.
> Thus, sth like
>
>     if (TREE_CODE (arg2) == INTEGER_CST
>        && CONVERT_EXPR_CODE_P (def1_code)
>        && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
> +     && (TYPE_PRECISION (TREE_TYPE (arg1))
> +            >= TYPE_PRECISION (TREE_TYPE (def1_arg1)))
>        && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>
> see the restriction we place on the transform for the (T1) x & (T2) y case:
>
>   /* For bitwise binary operations apply operand conversions to the
>      binary operation result instead of to the operands.  This allows
>      to combine successive conversions and bitwise binary operations.  */
>   if (CONVERT_EXPR_CODE_P (def1_code)
>       && CONVERT_EXPR_CODE_P (def2_code)
>       && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1))
>       /* Make sure that the conversion widens the operands, or has same
>          precision,  or that it changes the operation to a bitfield
>          precision.  */
>       && ((TYPE_PRECISION (TREE_TYPE (def1_arg1))
>            <= TYPE_PRECISION (TREE_TYPE (arg1)))
>           || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1)))
>               != MODE_INT)
>           || (TYPE_PRECISION (TREE_TYPE (arg1))
>               != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1))))))
>
> ideally you'd split out that condition into a predicate function taking the
> two type and use it in both places.
>
> Richard.
>
>> Yuri.
>>
>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I double checked that with and without my fix compiler produces the
>>>> same output with -fdump-tree-optimized.
>>>
>>> For what testcase?
>>>
>>> Richard.
>>>
>>>> What patch did you apply? I think that you should apply the second one
>>>> - 56175.diff.
>>>>
>>>> Yuri.
>>>>
>>>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> This regression was introduced by Kai
>>>>>>
>>>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>>>>>>
>>>>>> 2011-06-27  Kai Tietz  <ktietz@redhat.com>
>>>>>>
>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
>>>>>>         type sinking.
>>>>>>         * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
>>>>>>         search for di/si mode patterns for finding widest match.
>>>>>>
>>>>>> Is it sufficient for you to accept my patch?
>>>>>
>>>>> No, I still don't think the fix is sound.  A proper fix would revert the
>>>>> above change (the simplify_bitwise_binary one), watch for testsuite
>>>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
>>>>> and fix those failures in a better way.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Best regards.
>>>>>> yuri.
>>>>>>
>>>>>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> As we know Kai is working on this problem for 4.9 and I assume that
>>>>>>>> type sinking will be deleted from forwprop pass. Could we stay on this
>>>>>>>> fix but more common fix will be done.
>>>>>>>
>>>>>>> Well, unless you show it is a regression the patch is not applicable for 4.8
>>>>>>> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 either,
>>>>>>> it is after all a canonicalization - fold seems to perform the opposite one
>>>>>>> though:
>>>>>>>
>>>>>>>       /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>>>>>>          constants (if x has signed type, the sign bit cannot be set
>>>>>>>          in c).  This folds extension into the BIT_AND_EXPR.
>>>>>>>
>>>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting,
>>>>>>> not sinking.  Generally frontends and fold try to narrow operands when
>>>>>>> possible (even though some targets later widen them again because of
>>>>>>> instruction set constraints).
>>>>>>>
>>>>>>> Most of this hoisting code was done to make lowering logical && and ||
>>>>>>> I believe.  Looking at the testcases added tells us that while matching
>>>>>>> the two patterns as done now helps them but only because that pattern
>>>>>>> feeds single-operand instructions that then simplify.  So doing the transform
>>>>>>> starting from that single-operand instructions instead looks like a better
>>>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
>>>>>>> canonicalization done by fold.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> I also can propose to introduce new hook for it but need to know your
>>>>>>>> opinion since we don't went to waste our time on preparing dead
>>>>>>>> patches. Note that x86 supports all short types in HW and such type
>>>>>>>> sinkning is usually useless if short types are involved.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> First of all, your proposal to move type sinking to the end of
>>>>>>>>>> function does not work since we handle each statement in function and
>>>>>>>>>> we want that 1st type folding of X & C will not happen.
>>>>>>>>>> Note that we have the following sequence of gimple before forwprop1:
>>>>>>>>>>
>>>>>>>>>>    x.0_10 = (signed char) x_8;
>>>>>>>>>>   _11 = x.0_10 & 1;
>>>>>>>>>>   _12 = (signed char) y_9;
>>>>>>>>>>   _13 = _12 & 1;
>>>>>>>>>>   _14 = _11 ^ _13;
>>>>>>>>>
>>>>>>>>> Ah, indeed.  Reminds me of some of my dead patches that separated
>>>>>>>>> forwprop into a forward and backward stage.  Of course then you have
>>>>>>>>> the ordering issue of whether to first forward or backward.
>>>>>>>>>
>>>>>>>>> Which means that I bet you can construct a testcase that with
>>>>>>>>> your change is no longer optimized (just make pushing the conversion
>>>>>>>>> make the types _match_).  Which is always the case
>>>>>>>>> with this kind of local pattern-matching transforms.
>>>>>>>>>
>>>>>>>>> Currently forwprop processes leafs of expression trees first (well, inside
>>>>>>>>> a basic-block), similar to how fold () is supposed to be operated, based
>>>>>>>>> on the idea that simplified / canonicalized leafs helps keeping pattern
>>>>>>>>> recognition simple and cost considerations more accurate.
>>>>>>>>>
>>>>>>>>> When one order works better than another you always have to consider
>>>>>>>>> that the user could already have written the code in a way that results
>>>>>>>>> in the input that isn't well handled.
>>>>>>>>>
>>>>>>>>> Not that this helps very much for the situation ;)
>>>>>>>>>
>>>>>>>>> But I don't like the use of first_pass_instance ... and the fix isn't
>>>>>>>>> an improvement but just a hack for the benchmark.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> I also added comment to my fix and create new test for it. I also
>>>>>>>>>> checked that this test is passed with patched compiler  only. So
>>>>>>>>>> Change Log was also modified:
>>>>>>>>>>
>>>>>>>>>> ChangeLog
>>>>>>>>>>
>>>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>>         PR tree-optimization/56175
>>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking
>>>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>>>>>         for short integer types.
>>>>>>>>>>         * gcc.dg/pr56175.c: New test.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013/2/20 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types.
>>>>>>>>>>>> The fix is very simple - we simply turn off short type sinking at the
>>>>>>>>>>>> first pass of forward propagation allows to get
>>>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and
>>>>>>>>>>>> +5-7% for other x86 platforms too.
>>>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64.
>>>>>>>>>>>>
>>>>>>>>>>>> Is it Ok for trunk?
>>>>>>>>>>>
>>>>>>>>>>> It definitely needs a comment before the checks.
>>>>>>>>>>>
>>>>>>>>>>> Also I think it simply shows that the code is placed at the wrong spot.
>>>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the very last
>>>>>>>>>>> should get both of the effects done.
>>>>>>>>>>>
>>>>>>>>>>> Can you rework the patch according to that?
>>>>>>>>>>>
>>>>>>>>>>> You also miss a testcase, we should make sure to not regress again here.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog.
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>>         PR tree-optimization/56175
>>>>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type sinking
>>>>>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>>>>>>>         for short integer types.



More information about the Gcc-patches mailing list