Bug 81196 - Number of iterations found for p!=q but not for p<q
Summary: Number of iterations found for p!=q but not for p<q
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 8.0
: P3 normal
Target Milestone: ---
Assignee: bin cheng
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2017-06-24 09:28 UTC by Marc Glisse
Modified: 2017-07-10 11:53 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-06-26 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2017-06-24 09:28:45 UTC
void f(short*p){
  p=(short*)__builtin_assume_aligned(p,64);
  short*q=p+256;
  for(;p!=q;++p,--q){
    short t=*p;*p=*q;*q=t;
  }
}

compiled with -O3 -march=skylake, this gets vectorized. However, if I replace p!=q with p<q (much safer in case 256 was actually 255), it isn't vectorized anymore, apparently because the number of iterations cannot be determined. Replacing p!=q with p!=q&&(p+1)!=q (or using std::reverse) also breaks vectorization, but that's less surprising.
Comment 1 Richard Biener 2017-06-26 09:28:16 UTC
Probably some more elaborate handling in number_of_iterations_cond is required:

  /* We can handle the case when neither of the sides of the comparison is
     invariant, provided that the test is NE_EXPR.  This rarely occurs in
     practice, but it is simple enough to manage.  */
  if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
    {
      tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
      if (code != NE_EXPR)
        return false;

      iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
                                           iv0->step, iv1->step);
      iv0->no_overflow = false;
      iv1->step = build_int_cst (step_type, 0);
      iv1->no_overflow = true;
    }

I think this exit is premature and the following works for the testcase.
I suppose exiting is still required but can be moved to a later point,
or the helpers now fully handle the case of non-constant iv1 ...
Vectorization still fails with this due to runtime aliasing so it
probably exposes some wrong-code issue.  CCing Bin who is now most
familiar with the niter code.

Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c   (revision 249638)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -1674,14 +1674,14 @@ number_of_iterations_cond (struct loop *
   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
     {
       tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
-      if (code != NE_EXPR)
-       return false;
-
-      iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
-                                          iv0->step, iv1->step);
-      iv0->no_overflow = false;
-      iv1->step = build_int_cst (step_type, 0);
-      iv1->no_overflow = true;
+      if (code == NE_EXPR)
+       {
+         iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
+                                              iv0->step, iv1->step);
+         iv0->no_overflow = false;
+         iv1->step = build_int_cst (step_type, 0);
+         iv1->no_overflow = true;
+       }
     }
 
   /* If the result of the comparison is a constant,  the loop is weird.  More
Comment 2 bin cheng 2017-06-26 11:34:07 UTC
(In reply to Richard Biener from comment #1)
> Probably some more elaborate handling in number_of_iterations_cond is
> required:
> 
>   /* We can handle the case when neither of the sides of the comparison is
>      invariant, provided that the test is NE_EXPR.  This rarely occurs in
>      practice, but it is simple enough to manage.  */
>   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
>     {
>       tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
>       if (code != NE_EXPR)
>         return false;
> 
>       iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
>                                            iv0->step, iv1->step);
>       iv0->no_overflow = false;
>       iv1->step = build_int_cst (step_type, 0);
>       iv1->no_overflow = true;
>     }
> 
> I think this exit is premature and the following works for the testcase.
> I suppose exiting is still required but can be moved to a later point,
> or the helpers now fully handle the case of non-constant iv1 ...
> Vectorization still fails with this due to runtime aliasing so it
> probably exposes some wrong-code issue.  CCing Bin who is now most
> familiar with the niter code.
> 
> Index: gcc/tree-ssa-loop-niter.c
> ===================================================================
> --- gcc/tree-ssa-loop-niter.c   (revision 249638)
> +++ gcc/tree-ssa-loop-niter.c   (working copy)
> @@ -1674,14 +1674,14 @@ number_of_iterations_cond (struct loop *
>    if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
>      {
>        tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
> -      if (code != NE_EXPR)
> -       return false;
> -
> -      iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
> -                                          iv0->step, iv1->step);
> -      iv0->no_overflow = false;
> -      iv1->step = build_int_cst (step_type, 0);
> -      iv1->no_overflow = true;
> +      if (code == NE_EXPR)
> +       {
> +         iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
> +                                              iv0->step, iv1->step);
> +         iv0->no_overflow = false;
> +         iv1->step = build_int_cst (step_type, 0);
> +         iv1->no_overflow = true;
> +       }
>      }
>  
>    /* If the result of the comparison is a constant,  the loop is weird. 
> More

thanks for ccing, I will study the case.
Comment 3 bin cheng 2017-06-26 12:08:59 UTC
(In reply to Richard Biener from comment #1)
> Probably some more elaborate handling in number_of_iterations_cond is
> required:
> 
>   /* We can handle the case when neither of the sides of the comparison is
>      invariant, provided that the test is NE_EXPR.  This rarely occurs in
>      practice, but it is simple enough to manage.  */
>   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
>     {
>       tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
>       if (code != NE_EXPR)
>         return false;
> 
>       iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
>                                            iv0->step, iv1->step);
>       iv0->no_overflow = false;
>       iv1->step = build_int_cst (step_type, 0);
>       iv1->no_overflow = true;
>     }
> 
> I think this exit is premature and the following works for the testcase.
> I suppose exiting is still required but can be moved to a later point,
> or the helpers now fully handle the case of non-constant iv1 ...
> Vectorization still fails with this due to runtime aliasing so it
> probably exposes some wrong-code issue.  CCing Bin who is now most
> familiar with the niter code.
> 
> Index: gcc/tree-ssa-loop-niter.c
> ===================================================================
> --- gcc/tree-ssa-loop-niter.c   (revision 249638)
> +++ gcc/tree-ssa-loop-niter.c   (working copy)
> @@ -1674,14 +1674,14 @@ number_of_iterations_cond (struct loop *
>    if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
>      {
>        tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
> -      if (code != NE_EXPR)
> -       return false;
> -
> -      iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
> -                                          iv0->step, iv1->step);
> -      iv0->no_overflow = false;
> -      iv1->step = build_int_cst (step_type, 0);
> -      iv1->no_overflow = true;
> +      if (code == NE_EXPR)
> +       {
> +         iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type,
> +                                              iv0->step, iv1->step);
> +         iv0->no_overflow = false;
> +         iv1->step = build_int_cst (step_type, 0);
> +         iv1->no_overflow = true;
> +       }
No, at least we need to adjust for other code like LT_EXPR/LE_EXPR too.  Following code can't handle comparison with both sides non-zero ivs.
I guess it only handles NE_EXPR, otherwise it's possible to end up with wrong result because of wrapping behavior.  Considering below test:

unsigned int i = 0xfffffff0, j=0xfffffff8;
for (; i < j; i++, j+=2)
it only iterates for 4 times before j wrapping to 0.  It's not equal to:
unsigned int i = 0xfffffff0, j=0xfffffff8;
for (; i < j; i--)

The tricky part is to identify safe cases.  I will try to improve this.

Thanks.
Comment 4 bin cheng 2017-06-26 14:38:03 UTC
Hmm, the function can only be vectorized with "-march=skylake"?  So what requirement is needed to add a test case for this?

Thanks.
Comment 5 Marc Glisse 2017-06-26 17:57:03 UTC
(In reply to amker from comment #4)
> Hmm, the function can only be vectorized with "-march=skylake"?

Er, it also vectorizes without any -march on x86_64 (with shorter vectors).
Comment 6 bin cheng 2017-06-27 07:52:33 UTC
(In reply to Marc Glisse from comment #5)
> (In reply to amker from comment #4)
> > Hmm, the function can only be vectorized with "-march=skylake"?
> 
> Er, it also vectorizes without any -march on x86_64 (with shorter vectors).

But below is necessary, right?

  /* { dg-require-effective-target vect_perm_short } */

Otherwise I get:

pr81196.c:7:3: note: negative step and reversing not supported.
pr81196.c:7:3: note: not falling back to elementwise accesses
pr81196.c:7:3: note: not vectorized: relevant stmt not supported: _1 = *q_18;
pr81196.c:7:3: note: bad operation or unsupported loop bound.
pr81196.c:4:6: note: vectorized 0 loops in function.

Thanks
Comment 7 Marc Glisse 2017-06-27 07:57:55 UTC
(In reply to amker from comment #6)
> But below is necessary, right?
> 
>   /* { dg-require-effective-target vect_perm_short } */

Yes. Cool, I didn't remember we had exactly the right test :-)

Or you could try and change what it done inside the loop (or the element type) so it doesn't require a permutation of shorts, but that doesn't seem worth the trouble, vect_perm_short targets are not that rare.
Comment 8 bin cheng 2017-06-29 10:42:00 UTC
Author: amker
Date: Thu Jun 29 10:41:28 2017
New Revision: 249778

URL: https://gcc.gnu.org/viewcvs?rev=249778&root=gcc&view=rev
Log:
	PR tree-optimization/81196
	* tree-ssa-loop-niter.c (number_of_iterations_cond): Handle loop
	exit condition comparing two IVs.

	gcc/testsuite
	* gcc.dg/vect/pr81196.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr81196.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-niter.c
Comment 9 bin cheng 2017-07-10 11:53:45 UTC
Should be fixed.