This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH]Enable elimination of IV use with unsigned type candidate
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Bin Cheng <bin dot cheng at arm dot com>, Zdenek Dvorak <ook at ucw dot cz>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 1 Jul 2014 11:53:08 +0100
- Subject: Re: [PATCH]Enable elimination of IV use with unsigned type candidate
- Authentication-results: sourceware.org; auth=none
- References: <000001cf8ec8$68c07340$3a4159c0$ at arm dot com> <CAFiYyc18W5R7-CkHBeRi4aiQuu5i=MgenBPL6r64kJ75fQDd0w at mail dot gmail dot com> <CAHFci2-ZJdXrEbV4yQG6UjqswiPp-B8trwrgAn2NAzm4f1fdRw at mail dot gmail dot com>
On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> Sorry for this late reply, I spent some time in understanding the problem.
>
> On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>
>>> expressions. It's possible to have iv_elimination_compare_lt to do some
>>> undo transformation on may_be_zero, but I found it's difficult for cases
>>> involving signed/unsigned conversion like case loop-41.c. Since I think
>>> there is no obvious benefit to fold may_be_zero here (somehow because the
>>> two operands are already in folded forms), this patch just calls build2_loc
>>> instead.
>>
>> But it may fold to true/false, no?
> You are right, it can be folded to false in many cases. Thus I
> managed to check specific folded forms of may_be_zero, as in attached
> patch. So far it works for tests I added, but there are some other
> folded forms not handled.
>
>>
>
>>>
>>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in
>>> iv_elimination_compare_lt are like below with i_1 of signed type:
>>> B: i_1 + 1
>>> A: 0
>>> niter->niter: (unsigned int)i_1
>>>
>>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1".
>>> Without this patch, it is considered equal to each other.
>>
>> just looking at this part. Do you have a testcase that exhibits a difference
>> when just applying patch A? So I can have a look here?
>>
>> From the code in iv_elimination_compare_lt I can't figure why we'd
>> end up with i_1 instead of (unsigned int)i_1 as we convert to a common
>> type.
>>
>> I suppose the issue may be that at tree_to_aff_combination time
>> we strip all nops with STRIP_NOPS but when operating on ->rest
>> via convert/scale or add we do not strip them again.
>>
>> But then 'nit' should be i_1, not (unsigned int)i_1. So the analysis above
>> really doesn't look correct.
>>
>> Just to make sure we don't paper over an issue in tree-affine.c.
>>
>> Thus - testcase? On x86 we don't run into this place in
>> iv_elimination_compare_lt (on an unpatched tree).
>>
>> CCing Zdenek for the meat of patch B.
>
> I am more and more convinced that check of overflow in function
> iv_elimination_compare_lt is not sufficient. Considering example
> given by comments just before the function:
> /* Tries to replace loop exit by one formulated in terms of a LT_EXPR
> comparison with CAND. NITER describes the number of iterations of
> the loops. If successful, the comparison in COMP_P is altered accordingly.
>
> We aim to handle the following situation:
>
> sometype *base, *p;
> int a, b, i;
>
> i = a;
> p = p_0 = base + a;
>
> do
> {
> bla (*p);
> p++;
> i++;
> }
> while (i < b);
>
> Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1.
> We aim to optimize this to
>
> p = p_0 = base + a;
> do
> {
> bla (*p);
> p++;
> }
> while (p < p_0 - a + b);
>
> This preserves the correctness, since the pointer arithmetics does not
> overflow. More precisely:
>
> 1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
> overflow in computing it or the values of p.
> 2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
> overflow. To prove this, we use the fact that p_0 = base + a. */
>
> The example seems wrong to me, because the statement only stands for
> unsigned case like ivopts-lt.c:
>
>
> #include "stdint.h"
>
> void
> f1 (char *p, uintptr_t i, uintptr_t n)
> {
> p += i;
> do
> {
> *p = '\0';
> p += 1;
> i++;
> }
> while (i < n);
> }
>
> If i/n are of signed type, the 2nd point for "a+1>b" scenario in the
> comment isn't right, because of below facts:
> a) niter of loop are calculated in unsigned type as in function
> number_of_iterations_lt. So loop information for this case is like:
>
> Analyzing # of iterations of loop 1
> exit condition [i_4(D) + 1, + , 1](no_overflow) < n_12(D)
> bounds on difference of bases: -4294967295 ... 4294967294
> result:
> zero if i_4(D) >= n_12(D)
> # of iterations ((unsigned int) n_12(D) - (unsigned int) i_4(D)) -
> 1, bounded by 4294967294
> number of iterations ((unsigned int) n_12(D) - (unsigned int)
> i_4(D)) - 1; zero if i_4(D) >= n_12(D)
>
> b) The (unsigned int)n_12(D) could overflow, the check in
> iv_elimination_compare_lt doesn't take this fact into consideration.
>
> Tricky thing is GCC now doesn't eliminate "i < n" with point P for
> signed version of case. The reason behind is again the may_be_zero
> information. The original form of may_be_zero is like "i_4(D) + 1>
> n_12(D)", which is folded into "i_4(D) >= n_12(D)" because both
> i_4/n_12 are of signed type and don't wrap. In the end, function
> iv_elimination_compare_lt only handles the original form of
> may_be_zero.
> So below case is the closest one to illustrate the problem:
>
> #include "stdint.h"
>
> void
> f1 (char *p, int i, int n)
> {
> p += i;
> do
> {
> *p = '\0';
> p += 1;
> i++;
> }
> while (i < n);
> }
>
> char arr[100] = "hello\n";
>
> int main(void)
> {
> f1 (arr, 1, -1);
>
> return 0;
> }
>
> If we build the original form of may_be_zero with change:
> Index: gcc/tree-ssa-loop-niter.c
> ===================================================================
> --- gcc/tree-ssa-loop-niter.c (revision 211966)
> +++ gcc/tree-ssa-loop-niter.c (working copy)
> @@ -1153,8 +1153,13 @@
> condition. */
>
> if (mpz_sgn (bnds->below) < 0)
> +#if 0
> niter->may_be_zero = fold_build2 (LT_EXPR, boolean_type_node,
> iv1->base, iv0->base);
> +#else
> + niter->may_be_zero = build2_loc (UNKNOWN_LOCATION, LT_EXPR,
> boolean_type_node,
> + iv1->base, iv0->base);
> +#endif
> niter->niter = delta;
> niter->max = widest_int::from (wi::from_mpz (niter_type,
> bnds->up, false),
> TYPE_SIGN (niter_type));
>
> The case will be mis-compiled.
>
> One the other handle, iv_elimination_compare_lt for loops with
> may_be_zero can be further improved to handle cases other than pointer
> candidate. My original patch is intended to handle unsigned type
> candidates, but there are other cases.
>
> The problem here is a little complicated, especially if we want to
> improve iv elimination, I may mis-read the code somehow. So any idea
> about this?
>
BTW, for below gimple dump before ivopt:
foo (int l)
{
long long int sum;
long long int j;
int i;
int _8;
<bb 2>:
<bb 3>:
# i_1 = PHI <0(2), i_4(4)>
# j_2 = PHI <0(2), j_5(4)>
# sum_3 = PHI <0(2), sum_6(4)>
i_4 = i_1 + 1;
j_5 = j_2 + 1;
sum_6 = sum_3 + j_5;
if (i_4 < l_7(D))
goto <bb 4>;
else
goto <bb 5>;
<bb 4>:
goto <bb 3>;
<bb 5>:
# sum_12 = PHI <sum_6(3)>
_8 = (int) sum_12;
return _8;
}
The loop information is like:
;;
;; Loop 1
;; header 3, latch 4
;; depth 1, outer 0
;; nodes: 3 4
Processing loop 1
single exit 3 -> 5, exit condition if (i_4 < l_7(D))
Analyzing # of iterations of loop 1
exit condition [1, + , 1](no_overflow) < l_7(D)
bounds on difference of bases: -2147483649 ... 2147483646
result:
zero if l_7(D) < 1
# of iterations (unsigned int) l_7(D) + 4294967295, bounded by 2147483646
number of iterations (unsigned int) l_7(D) + 4294967295; zero if l_7(D) < 1
Question is, is it safe to eliminate "i_4 < l_7(D)" as below?
Replacing exit test: if (i_4 < l_7(D))
foo (int l)
{
long long int sum;
long long int j;
int i;
int _8;
long long int _10;
unsigned int _11;
unsigned long _13;
unsigned long _14;
unsigned int _15;
<bb 2>:
_11 = (unsigned int) l_7(D);
_15 = _11 + 4294967295;
_14 = (unsigned long) _15;
_13 = _14 + 1;
_10 = (long long int) _13;
<bb 3>:
# j_2 = PHI <0(2), j_5(4)>
# sum_3 = PHI <0(2), sum_6(4)>
j_5 = j_2 + 1;
sum_6 = sum_3 + j_5;
if (j_5 < _10)
goto <bb 4>;
else
goto <bb 5>;
<bb 4>:
goto <bb 3>;
<bb 5>:
# sum_12 = PHI <sum_6(3)>
_8 = (int) sum_12;
return _8;
}
Thanks,
bin