This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Fix PR66313
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Richard Biener <rguenther at suse dot de>, gcc-patches List <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <richard dot sandiford at linaro dot org>
- Date: Tue, 13 Jun 2017 12:28:14 +0100
- Subject: Re: [PATCH] Fix PR66313
- Authentication-results: sourceware.org; auth=none
- References: <alpine.LSU.2.20.1705311603540.20726@zhemvz.fhfr.qr> <877f0gfbwz.fsf@linaro.org> <alpine.LSU.2.20.1706131218550.22867@zhemvz.fhfr.qr> <8737b4f7wj.fsf@linaro.org>
On Tue, Jun 13, 2017 at 12:23 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <rguenther@suse.de> writes:
>> On Tue, 13 Jun 2017, Richard Sandiford wrote:
>>> Richard Biener <rguenther@suse.de> writes:
>>> > So I've come back to PR66313 and found a solution to the tailrecursion
>>> > missed optimization that arises when fixing the factoring folding to
>>> > use an unsigned type whenever overflow cannot be ruled out.
>>> >
>>> > The folding part is identical to my last try from 2015; the tailrecursion
>>> > part makes us handle intervening stmts introduced by foldings that
>>> > "clobber" our walk over the single-use chain of stmts between the call
>>> > and the return (previously we failed on any stmt not part of that
>>> > chain). A simple solution is to move the stmts that are not part of
>>> > the chain, and that we can safely move, before the call. That handles
>>> > the leaf conversions that now appear for tree-ssa/tailrecursion-6.c.
>>> >
>>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>> >
>>> > Richard.
>>> >
>>> > 2017-05-31 Richard Biener <rguenther@suse.de>
>>> >
>>> > PR middle-end/66313
>>> > * fold-const.c (fold_plusminus_mult_expr): If the factored
>>> > factor may be zero use a wrapping type for the inner operation.
>>> > * tree-tailcall.c (independent_of_stmt_p): Pass in to_move bitmap
>>> > and handle moved defs.
>>> > (process_assignment): Properly guard the unary op case. Return a
>>> > tri-state indicating that moving the stmt before the call may allow
>>> > to continue. Pass through to_move.
>>> > (find_tail_calls): Handle moving unrelated defs before
>>> > the call.
>>> >
>>> > * c-c++-common/ubsan/pr66313.c: New testcase.
>>> > * gcc.dg/tree-ssa/loop-15.c: Adjust.
>>> >
>>> > Index: gcc/fold-const.c
>>> > ===================================================================
>>> > *** gcc/fold-const.c.orig 2015-10-29 12:32:33.302782318 +0100
>>> > --- gcc/fold-const.c 2015-10-29 14:08:39.936497739 +0100
>>> > *************** fold_plusminus_mult_expr (location_t loc
>>> > *** 6916,6925 ****
>>> > }
>>> > same = NULL_TREE;
>>> >
>>> > ! if (operand_equal_p (arg01, arg11, 0))
>>> > ! same = arg01, alt0 = arg00, alt1 = arg10;
>>> > ! else if (operand_equal_p (arg00, arg10, 0))
>>> > same = arg00, alt0 = arg01, alt1 = arg11;
>>> > else if (operand_equal_p (arg00, arg11, 0))
>>> > same = arg00, alt0 = arg01, alt1 = arg10;
>>> > else if (operand_equal_p (arg01, arg10, 0))
>>> > --- 6916,6926 ----
>>> > }
>>> > same = NULL_TREE;
>>> >
>>> > ! /* Prefer factoring a common non-constant. */
>>> > ! if (operand_equal_p (arg00, arg10, 0))
>>> > same = arg00, alt0 = arg01, alt1 = arg11;
>>> > + else if (operand_equal_p (arg01, arg11, 0))
>>> > + same = arg01, alt0 = arg00, alt1 = arg10;
>>> > else if (operand_equal_p (arg00, arg11, 0))
>>> > same = arg00, alt0 = arg01, alt1 = arg10;
>>> > else if (operand_equal_p (arg01, arg10, 0))
>>> > *************** fold_plusminus_mult_expr (location_t loc
>>> > *** 6974,6987 ****
>>> > }
>>> > }
>>> >
>>> > ! if (same)
>>> > return fold_build2_loc (loc, MULT_EXPR, type,
>>> > fold_build2_loc (loc, code, type,
>>> > fold_convert_loc (loc, type, alt0),
>>> > fold_convert_loc (loc, type, alt1)),
>>> > fold_convert_loc (loc, type, same));
>>> >
>>> > ! return NULL_TREE;
>>> > }
>>> >
>>> > /* Subroutine of native_encode_expr. Encode the INTEGER_CST
>>> > --- 6975,7010 ----
>>> > }
>>> > }
>>> >
>>> > ! if (!same)
>>> > ! return NULL_TREE;
>>> > !
>>> > ! if (! INTEGRAL_TYPE_P (type)
>>> > ! || TYPE_OVERFLOW_WRAPS (type)
>>> > ! /* We are neither factoring zero nor minus one. */
>>> > ! || TREE_CODE (same) == INTEGER_CST)
>>> > return fold_build2_loc (loc, MULT_EXPR, type,
>>> > fold_build2_loc (loc, code, type,
>>> > fold_convert_loc (loc, type, alt0),
>>> > fold_convert_loc (loc, type, alt1)),
>>> > fold_convert_loc (loc, type, same));
>>> >
>>> > ! /* Same may be zero and thus the operation 'code' may overflow. Likewise
>>> > ! same may be minus one and thus the multiplication may overflow. Perform
>>> > ! the operations in an unsigned type. */
>>> > ! tree utype = unsigned_type_for (type);
>>> > ! tree tem = fold_build2_loc (loc, code, utype,
>>> > ! fold_convert_loc (loc, utype, alt0),
>>> > ! fold_convert_loc (loc, utype, alt1));
>>> > ! /* If the sum evaluated to a constant that is not -INF the multiplication
>>> > ! cannot overflow. */
>>> > ! if (TREE_CODE (tem) == INTEGER_CST
>>> > ! && ! wi::eq_p (tem, wi::min_value (TYPE_PRECISION (utype), SIGNED)))
>>> > ! return fold_build2_loc (loc, MULT_EXPR, type,
>>> > ! fold_convert (type, tem), same);
>>> > !
>>> > ! return fold_convert_loc (loc, type,
>>> > ! fold_build2_loc (loc, MULT_EXPR, utype, tem,
>>> > ! fold_convert_loc (loc, utype, same)));
>>> > }
>>> >
>>> > /* Subroutine of native_encode_expr. Encode the INTEGER_CST
>>>
>>> Sorry if you already know, but this part means that we can no longer
>>> vectorise:
>>>
>>> int
>>> f (int *x, int b1, int b2, int b3)
>>> {
>>> int foo = 0;
>>> for (int i1 = 0; i1 < b1; ++i1)
>>> for (int i2 = 0; i2 < b2; ++i2)
>>> for (int i3 = 0; i3 < b3; ++i3)
>>> foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
>>> return foo;
>>> }
>>>
>>> We now convert all the arithmetic in the [...] to unsigned int
>>> and reassociate it so that the "- 1" is applied last. We then assume
>>> that the overflow in this subtraction is well-defined and that the
>>> &x[...] might not be linear for the inner loop.
>>
>> Can you file a PR?
>
> OK, filed as PR81082.
>
>> # i3_44 = PHI <i3_32(6), 0(4)>
>> i3.2_7 = (unsigned int) i3_44;
>> _9 = i3.2_7 + _34;
>> _10 = (int) _9;
>> _11 = (long unsigned int) _10;
>> _12 = _11 * 4;
>> _13 = x_30(D) + _12;
>> _14 = *_13;
>> ...
>> i3_32 = i3_44 + 1;
>>
>> so we have *(x_30(D) + 4 * ((sizetype)(int)((unsigned){ 0, + , 1} + X))
>>
>> It might mean that we need to avoid this re-association. Not sure how
>> to detect worthwhile vs. not worthwhile cases during folding. At least
>> I see no easy way to recover from it in SCEV analysis unless the
>> number of iterations is constrained more than in your example.
>
> Yeah, in the example that this was reduced from, not reassociating
> is far more preferable to changing the types. But like you say,
> I've no idea how we'd know that at this stage.
Looks like this has become a general problem. IIRC, loop-im also hoists
signed computations out of loops as unsigned computations.
Thanks,
bin
>
> Thanks,
> Richard
>
>> Ideas welcome ;)
>>
>> Thanks,
>> Richard.