Fix for 56175

Yuri Rumyantsev ysrumyan@gmail.com
Mon Feb 25 11:39:00 GMT 2013


Hi Richard,

Please, prepare the patch as you proposed.
I checked that it helps Coremrak 1.0.

Best regards.
Yuri.

2013/2/25 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Feb 21, 2013 at 4:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> This does not fit to my fix  since if we have another pattern like
>>     t = (u8)((x & 1) ^ ((u8)y & 1));
>> where y has short type, for rhs operand type sinkning (or hoisting as
>> you prefer) still will be performed but I don't see any reason for
>> converting  (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all
>> targets including x86.
>
> As I said we shouldn't do that and the tests I suggested would make sure
> we don't.  Even if I mistyped them.  Put in the same condition as in the
> (type) X op (type) Y hoisting which is exactly there to avoid widening
> the operands to 'op' with the hoisting.
>
> I can produce a patch later today.
>
> Richard.
>
>> Yuri.
>>
>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Sorry for my previous message - I did not undersatnd it properly.
>>>>
>>>> Anyway I proposed another fix that avoid  (type) x & c --> (type) (x &
>>>> (type-x) c) transformation if x has short type:
>>>>
>>>> +++ gcc/tree-ssa-forwprop.c     (working copy)
>>>> @@ -1789,8 +1789,11 @@
>>>>    defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2);
>>>>    defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2);
>>>>
>>>> -  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).  */
>>>> +  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).
>>>> +     Do that only if X has not short type.  */
>>>>    if (TREE_CODE (arg2) == INTEGER_CST
>>>> +      && (TYPE_PRECISION (TREE_TYPE (arg1))
>>>> +          >= TYPE_PRECISION (integer_type_node))
>>>>        && CONVERT_EXPR_CODE_P (def1_code)
>>>>        && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>>>>        && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>>>>
>>>> Does this fix look suitable?
>>>
>>> I think the fix is to disable the transform if it widens the operation.
>>> Thus, sth like
>>>
>>>     if (TREE_CODE (arg2) == INTEGER_CST
>>>        && CONVERT_EXPR_CODE_P (def1_code)
>>>        && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>>> +     && (TYPE_PRECISION (TREE_TYPE (arg1))
>>> +            >= TYPE_PRECISION (TREE_TYPE (def1_arg1)))
>>>        && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>>>
>>> see the restriction we place on the transform for the (T1) x & (T2) y case:
>>>
>>>   /* For bitwise binary operations apply operand conversions to the
>>>      binary operation result instead of to the operands.  This allows
>>>      to combine successive conversions and bitwise binary operations.  */
>>>   if (CONVERT_EXPR_CODE_P (def1_code)
>>>       && CONVERT_EXPR_CODE_P (def2_code)
>>>       && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1))
>>>       /* Make sure that the conversion widens the operands, or has same
>>>          precision,  or that it changes the operation to a bitfield
>>>          precision.  */
>>>       && ((TYPE_PRECISION (TREE_TYPE (def1_arg1))
>>>            <= TYPE_PRECISION (TREE_TYPE (arg1)))
>>>           || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1)))
>>>               != MODE_INT)
>>>           || (TYPE_PRECISION (TREE_TYPE (arg1))
>>>               != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1))))))
>>>
>>> ideally you'd split out that condition into a predicate function taking the
>>> two type and use it in both places.
>>>
>>> Richard.
>>>
>>>> Yuri.
>>>>
>>>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I double checked that with and without my fix compiler produces the
>>>>>> same output with -fdump-tree-optimized.
>>>>>
>>>>> For what testcase?
>>>>>
>>>>> Richard.
>>>>>
>>>>>> What patch did you apply? I think that you should apply the second one
>>>>>> - 56175.diff.
>>>>>>
>>>>>> Yuri.
>>>>>>
>>>>>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> This regression was introduced by Kai
>>>>>>>>
>>>>>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>>>>>>>>
>>>>>>>> 2011-06-27  Kai Tietz  <ktietz@redhat.com>
>>>>>>>>
>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
>>>>>>>>         type sinking.
>>>>>>>>         * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
>>>>>>>>         search for di/si mode patterns for finding widest match.
>>>>>>>>
>>>>>>>> Is it sufficient for you to accept my patch?
>>>>>>>
>>>>>>> No, I still don't think the fix is sound.  A proper fix would revert the
>>>>>>> above change (the simplify_bitwise_binary one), watch for testsuite
>>>>>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
>>>>>>> and fix those failures in a better way.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> yuri.
>>>>>>>>
>>>>>>>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> As we know Kai is working on this problem for 4.9 and I assume that
>>>>>>>>>> type sinking will be deleted from forwprop pass. Could we stay on this
>>>>>>>>>> fix but more common fix will be done.
>>>>>>>>>
>>>>>>>>> Well, unless you show it is a regression the patch is not applicable for 4.8
>>>>>>>>> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 either,
>>>>>>>>> it is after all a canonicalization - fold seems to perform the opposite one
>>>>>>>>> though:
>>>>>>>>>
>>>>>>>>>       /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>>>>>>>>          constants (if x has signed type, the sign bit cannot be set
>>>>>>>>>          in c).  This folds extension into the BIT_AND_EXPR.
>>>>>>>>>
>>>>>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting,
>>>>>>>>> not sinking.  Generally frontends and fold try to narrow operands when
>>>>>>>>> possible (even though some targets later widen them again because of
>>>>>>>>> instruction set constraints).
>>>>>>>>>
>>>>>>>>> Most of this hoisting code was done to make lowering logical && and ||
>>>>>>>>> I believe.  Looking at the testcases added tells us that while matching
>>>>>>>>> the two patterns as done now helps them but only because that pattern
>>>>>>>>> feeds single-operand instructions that then simplify.  So doing the transform
>>>>>>>>> starting from that single-operand instructions instead looks like a better
>>>>>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
>>>>>>>>> canonicalization done by fold.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> I also can propose to introduce new hook for it but need to know your
>>>>>>>>>> opinion since we don't went to waste our time on preparing dead
>>>>>>>>>> patches. Note that x86 supports all short types in HW and such type
>>>>>>>>>> sinkning is usually useless if short types are involved.
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013/2/21 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> First of all, your proposal to move type sinking to the end of
>>>>>>>>>>>> function does not work since we handle each statement in function and
>>>>>>>>>>>> we want that 1st type folding of X & C will not happen.
>>>>>>>>>>>> Note that we have the following sequence of gimple before forwprop1:
>>>>>>>>>>>>
>>>>>>>>>>>>    x.0_10 = (signed char) x_8;
>>>>>>>>>>>>   _11 = x.0_10 & 1;
>>>>>>>>>>>>   _12 = (signed char) y_9;
>>>>>>>>>>>>   _13 = _12 & 1;
>>>>>>>>>>>>   _14 = _11 ^ _13;
>>>>>>>>>>>
>>>>>>>>>>> Ah, indeed.  Reminds me of some of my dead patches that separated
>>>>>>>>>>> forwprop into a forward and backward stage.  Of course then you have
>>>>>>>>>>> the ordering issue of whether to first forward or backward.
>>>>>>>>>>>
>>>>>>>>>>> Which means that I bet you can construct a testcase that with
>>>>>>>>>>> your change is no longer optimized (just make pushing the conversion
>>>>>>>>>>> make the types _match_).  Which is always the case
>>>>>>>>>>> with this kind of local pattern-matching transforms.
>>>>>>>>>>>
>>>>>>>>>>> Currently forwprop processes leafs of expression trees first (well, inside
>>>>>>>>>>> a basic-block), similar to how fold () is supposed to be operated, based
>>>>>>>>>>> on the idea that simplified / canonicalized leafs helps keeping pattern
>>>>>>>>>>> recognition simple and cost considerations more accurate.
>>>>>>>>>>>
>>>>>>>>>>> When one order works better than another you always have to consider
>>>>>>>>>>> that the user could already have written the code in a way that results
>>>>>>>>>>> in the input that isn't well handled.
>>>>>>>>>>>
>>>>>>>>>>> Not that this helps very much for the situation ;)
>>>>>>>>>>>
>>>>>>>>>>> But I don't like the use of first_pass_instance ... and the fix isn't
>>>>>>>>>>> an improvement but just a hack for the benchmark.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> I also added comment to my fix and create new test for it. I also
>>>>>>>>>>>> checked that this test is passed with patched compiler  only. So
>>>>>>>>>>>> Change Log was also modified:
>>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>>         PR tree-optimization/56175
>>>>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking
>>>>>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>>>>>>>         for short integer types.
>>>>>>>>>>>>         * gcc.dg/pr56175.c: New test.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2013/2/20 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types.
>>>>>>>>>>>>>> The fix is very simple - we simply turn off short type sinking at the
>>>>>>>>>>>>>> first pass of forward propagation allows to get
>>>>>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and
>>>>>>>>>>>>>> +5-7% for other x86 platforms too.
>>>>>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is it Ok for trunk?
>>>>>>>>>>>>>
>>>>>>>>>>>>> It definitely needs a comment before the checks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also I think it simply shows that the code is placed at the wrong spot.
>>>>>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the very last
>>>>>>>>>>>>> should get both of the effects done.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you rework the patch according to that?
>>>>>>>>>>>>>
>>>>>>>>>>>>> You also miss a testcase, we should make sure to not regress again here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ChangeLog.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         PR tree-optimization/56175
>>>>>>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type sinking
>>>>>>>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>>>>>>>>>         for short integer types.



More information about the Gcc-patches mailing list