This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH]Enable elimination of IV use with unsigned type candidate

From: "Bin.Cheng" <amker dot cheng at gmail dot com>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: Bin Cheng <bin dot cheng at arm dot com>, Zdenek Dvorak <ook at ucw dot cz>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Tue, 1 Jul 2014 11:53:08 +0100
Subject: Re: [PATCH]Enable elimination of IV use with unsigned type candidate
Authentication-results: sourceware.org; auth=none
References: <000001cf8ec8$68c07340$3a4159c0$ at arm dot com> <CAFiYyc18W5R7-CkHBeRi4aiQuu5i=MgenBPL6r64kJ75fQDd0w at mail dot gmail dot com> <CAHFci2-ZJdXrEbV4yQG6UjqswiPp-B8trwrgAn2NAzm4f1fdRw at mail dot gmail dot com>

On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> Sorry for this late reply, I spent some time in understanding the problem.
>
> On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>
>>> expressions.  It's possible to have iv_elimination_compare_lt to do some
>>> undo transformation on may_be_zero, but I found it's difficult for cases
>>> involving signed/unsigned conversion like case loop-41.c.  Since I think
>>> there is no obvious benefit to fold may_be_zero here (somehow because the
>>> two operands are already in folded forms), this patch just calls build2_loc
>>> instead.
>>
>> But it may fold to true/false, no?
> You are right, it can be folded to false in many cases.  Thus I
> managed to check specific folded forms of may_be_zero, as in attached
> patch.  So far it works for tests I added, but there are some other
> folded forms not handled.
>
>>
>
>>>
>>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in
>>> iv_elimination_compare_lt are like below with i_1 of signed type:
>>> B:             i_1 + 1
>>> A:             0
>>> niter->niter:  (unsigned int)i_1
>>>
>>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1".
>>> Without this patch, it is considered equal to each other.
>>
>> just looking at this part.  Do you have a testcase that exhibits a difference
>> when just applying patch A?  So I can have a look here?
>>
>> From the code in iv_elimination_compare_lt I can't figure why we'd
>> end up with i_1 instead of (unsigned int)i_1 as we convert to a common
>> type.
>>
>> I suppose the issue may be that at tree_to_aff_combination time
>> we strip all nops with STRIP_NOPS but when operating on ->rest
>> via convert/scale or add we do not strip them again.
>>
>> But then 'nit' should be i_1, not (unsigned int)i_1.  So the analysis above
>> really doesn't look correct.
>>
>> Just to make sure we don't paper over an issue in tree-affine.c.
>>
>> Thus - testcase?  On x86 we don't run into this place in
>> iv_elimination_compare_lt (on an unpatched tree).
>>
>> CCing Zdenek for the meat of patch B.
>
> I am more and more convinced that check of overflow in function
> iv_elimination_compare_lt is not sufficient.  Considering example
> given by comments just before the function:
> /* Tries to replace loop exit by one formulated in terms of a LT_EXPR
>    comparison with CAND.  NITER describes the number of iterations of
>    the loops.  If successful, the comparison in COMP_P is altered accordingly.
>
>    We aim to handle the following situation:
>
>    sometype *base, *p;
>    int a, b, i;
>
>    i = a;
>    p = p_0 = base + a;
>
>    do
>      {
>        bla (*p);
>        p++;
>        i++;
>      }
>    while (i < b);
>
>    Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1.
>    We aim to optimize this to
>
>    p = p_0 = base + a;
>    do
>      {
>        bla (*p);
>        p++;
>      }
>    while (p < p_0 - a + b);
>
>    This preserves the correctness, since the pointer arithmetics does not
>    overflow.  More precisely:
>
>    1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
>       overflow in computing it or the values of p.
>    2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
>       overflow.  To prove this, we use the fact that p_0 = base + a.  */
>
> The example seems wrong to me, because the statement only stands for
> unsigned case like ivopts-lt.c:
>
>
> #include "stdint.h"
>
> void
> f1 (char *p, uintptr_t i, uintptr_t n)
> {
>   p += i;
>   do
>     {
>       *p = '\0';
>       p += 1;
>       i++;
>     }
>   while (i < n);
> }
>
> If i/n are of signed type, the 2nd point for "a+1>b" scenario in the
> comment isn't right, because of below facts:
> a) niter of loop are calculated in unsigned type as in function
> number_of_iterations_lt.  So loop information for this case is like:
>
> Analyzing # of iterations of loop 1
>   exit condition [i_4(D) + 1, + , 1](no_overflow) < n_12(D)
>   bounds on difference of bases: -4294967295 ... 4294967294
>   result:
>     zero if i_4(D) >= n_12(D)
>     # of iterations ((unsigned int) n_12(D) - (unsigned int) i_4(D)) -
> 1, bounded by 4294967294
>   number of iterations ((unsigned int) n_12(D) - (unsigned int)
> i_4(D)) - 1; zero if i_4(D) >= n_12(D)
>
> b) The (unsigned int)n_12(D) could overflow, the check in
> iv_elimination_compare_lt doesn't take this fact into consideration.
>
> Tricky thing is GCC now doesn't eliminate "i < n" with point P for
> signed version of case.  The reason behind is again the may_be_zero
> information.  The original form of may_be_zero is like "i_4(D) + 1>
> n_12(D)", which is folded into "i_4(D) >= n_12(D)" because both
> i_4/n_12 are of signed type and don't wrap.  In the end, function
> iv_elimination_compare_lt only handles the original form of
> may_be_zero.
> So below case is the closest one to illustrate the problem:
>
> #include "stdint.h"
>
> void
> f1 (char *p, int i, int n)
> {
>   p += i;
>   do
>     {
>       *p = '\0';
>       p += 1;
>       i++;
>     }
>   while (i < n);
> }
>
> char arr[100] = "hello\n";
>
> int main(void)
> {
>   f1 (arr, 1, -1);
>
>   return 0;
> }
>
> If we build the original form of may_be_zero with change:
> Index: gcc/tree-ssa-loop-niter.c
> ===================================================================
> --- gcc/tree-ssa-loop-niter.c   (revision 211966)
> +++ gcc/tree-ssa-loop-niter.c   (working copy)
> @@ -1153,8 +1153,13 @@
>          condition.  */
>
>        if (mpz_sgn (bnds->below) < 0)
> +#if 0
>         niter->may_be_zero = fold_build2 (LT_EXPR, boolean_type_node,
>                                           iv1->base, iv0->base);
> +#else
> +       niter->may_be_zero = build2_loc (UNKNOWN_LOCATION, LT_EXPR,
> boolean_type_node,
> +                                         iv1->base, iv0->base);
> +#endif
>        niter->niter = delta;
>        niter->max = widest_int::from (wi::from_mpz (niter_type,
> bnds->up, false),
>                                      TYPE_SIGN (niter_type));
>
> The case will be mis-compiled.
>
> One the other handle, iv_elimination_compare_lt for loops with
> may_be_zero can be further improved to handle cases other than pointer
> candidate.  My original patch is intended to handle unsigned type
> candidates, but there are other cases.
>
> The problem here is a little complicated, especially if we want to
> improve iv elimination, I may mis-read the code somehow.  So any idea
> about this?
>

BTW, for below gimple dump before ivopt:
foo (int l)
{
  long long int sum;
  long long int j;
  int i;
  int _8;

  <bb 2>:

  <bb 3>:
  # i_1 = PHI <0(2), i_4(4)>
  # j_2 = PHI <0(2), j_5(4)>
  # sum_3 = PHI <0(2), sum_6(4)>
  i_4 = i_1 + 1;
  j_5 = j_2 + 1;
  sum_6 = sum_3 + j_5;
  if (i_4 < l_7(D))
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

  <bb 5>:
  # sum_12 = PHI <sum_6(3)>
  _8 = (int) sum_12;
  return _8;

}

The loop information is like:

;;
;; Loop 1
;;  header 3, latch 4
;;  depth 1, outer 0
;;  nodes: 3 4
Processing loop 1
  single exit 3 -> 5, exit condition if (i_4 < l_7(D))


Analyzing # of iterations of loop 1
  exit condition [1, + , 1](no_overflow) < l_7(D)
  bounds on difference of bases: -2147483649 ... 2147483646
  result:
    zero if l_7(D) < 1
    # of iterations (unsigned int) l_7(D) + 4294967295, bounded by 2147483646
  number of iterations (unsigned int) l_7(D) + 4294967295; zero if l_7(D) < 1

Question is, is it safe to eliminate "i_4 < l_7(D)" as below?
Replacing exit test: if (i_4 < l_7(D))
foo (int l)
{
  long long int sum;
  long long int j;
  int i;
  int _8;
  long long int _10;
  unsigned int _11;
  unsigned long _13;
  unsigned long _14;
  unsigned int _15;

  <bb 2>:
  _11 = (unsigned int) l_7(D);
  _15 = _11 + 4294967295;
  _14 = (unsigned long) _15;
  _13 = _14 + 1;
  _10 = (long long int) _13;

  <bb 3>:
  # j_2 = PHI <0(2), j_5(4)>
  # sum_3 = PHI <0(2), sum_6(4)>
  j_5 = j_2 + 1;
  sum_6 = sum_3 + j_5;
  if (j_5 < _10)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

  <bb 5>:
  # sum_12 = PHI <sum_6(3)>
  _8 = (int) sum_12;
  return _8;

}

Thanks,
bin

References:
- Re: [PATCH]Enable elimination of IV use with unsigned type candidate
  - From: Bin.Cheng

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]