void f(short*p){ p=(short*)__builtin_assume_aligned(p,64); short*q=p+256; for(;p!=q;++p,--q){ short t=*p;*p=*q;*q=t; } } compiled with -O3 -march=skylake, this gets vectorized. However, if I replace p!=q with p<q (much safer in case 256 was actually 255), it isn't vectorized anymore, apparently because the number of iterations cannot be determined. Replacing p!=q with p!=q&&(p+1)!=q (or using std::reverse) also breaks vectorization, but that's less surprising.
Probably some more elaborate handling in number_of_iterations_cond is required: /* We can handle the case when neither of the sides of the comparison is invariant, provided that the test is NE_EXPR. This rarely occurs in practice, but it is simple enough to manage. */ if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step)) { tree step_type = POINTER_TYPE_P (type) ? sizetype : type; if (code != NE_EXPR) return false; iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, iv0->step, iv1->step); iv0->no_overflow = false; iv1->step = build_int_cst (step_type, 0); iv1->no_overflow = true; } I think this exit is premature and the following works for the testcase. I suppose exiting is still required but can be moved to a later point, or the helpers now fully handle the case of non-constant iv1 ... Vectorization still fails with this due to runtime aliasing so it probably exposes some wrong-code issue. CCing Bin who is now most familiar with the niter code. Index: gcc/tree-ssa-loop-niter.c =================================================================== --- gcc/tree-ssa-loop-niter.c (revision 249638) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -1674,14 +1674,14 @@ number_of_iterations_cond (struct loop * if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step)) { tree step_type = POINTER_TYPE_P (type) ? sizetype : type; - if (code != NE_EXPR) - return false; - - iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, - iv0->step, iv1->step); - iv0->no_overflow = false; - iv1->step = build_int_cst (step_type, 0); - iv1->no_overflow = true; + if (code == NE_EXPR) + { + iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, + iv0->step, iv1->step); + iv0->no_overflow = false; + iv1->step = build_int_cst (step_type, 0); + iv1->no_overflow = true; + } } /* If the result of the comparison is a constant, the loop is weird. More
(In reply to Richard Biener from comment #1) > Probably some more elaborate handling in number_of_iterations_cond is > required: > > /* We can handle the case when neither of the sides of the comparison is > invariant, provided that the test is NE_EXPR. This rarely occurs in > practice, but it is simple enough to manage. */ > if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step)) > { > tree step_type = POINTER_TYPE_P (type) ? sizetype : type; > if (code != NE_EXPR) > return false; > > iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, > iv0->step, iv1->step); > iv0->no_overflow = false; > iv1->step = build_int_cst (step_type, 0); > iv1->no_overflow = true; > } > > I think this exit is premature and the following works for the testcase. > I suppose exiting is still required but can be moved to a later point, > or the helpers now fully handle the case of non-constant iv1 ... > Vectorization still fails with this due to runtime aliasing so it > probably exposes some wrong-code issue. CCing Bin who is now most > familiar with the niter code. > > Index: gcc/tree-ssa-loop-niter.c > =================================================================== > --- gcc/tree-ssa-loop-niter.c (revision 249638) > +++ gcc/tree-ssa-loop-niter.c (working copy) > @@ -1674,14 +1674,14 @@ number_of_iterations_cond (struct loop * > if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step)) > { > tree step_type = POINTER_TYPE_P (type) ? sizetype : type; > - if (code != NE_EXPR) > - return false; > - > - iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, > - iv0->step, iv1->step); > - iv0->no_overflow = false; > - iv1->step = build_int_cst (step_type, 0); > - iv1->no_overflow = true; > + if (code == NE_EXPR) > + { > + iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, > + iv0->step, iv1->step); > + iv0->no_overflow = false; > + iv1->step = build_int_cst (step_type, 0); > + iv1->no_overflow = true; > + } > } > > /* If the result of the comparison is a constant, the loop is weird. > More thanks for ccing, I will study the case.
(In reply to Richard Biener from comment #1) > Probably some more elaborate handling in number_of_iterations_cond is > required: > > /* We can handle the case when neither of the sides of the comparison is > invariant, provided that the test is NE_EXPR. This rarely occurs in > practice, but it is simple enough to manage. */ > if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step)) > { > tree step_type = POINTER_TYPE_P (type) ? sizetype : type; > if (code != NE_EXPR) > return false; > > iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, > iv0->step, iv1->step); > iv0->no_overflow = false; > iv1->step = build_int_cst (step_type, 0); > iv1->no_overflow = true; > } > > I think this exit is premature and the following works for the testcase. > I suppose exiting is still required but can be moved to a later point, > or the helpers now fully handle the case of non-constant iv1 ... > Vectorization still fails with this due to runtime aliasing so it > probably exposes some wrong-code issue. CCing Bin who is now most > familiar with the niter code. > > Index: gcc/tree-ssa-loop-niter.c > =================================================================== > --- gcc/tree-ssa-loop-niter.c (revision 249638) > +++ gcc/tree-ssa-loop-niter.c (working copy) > @@ -1674,14 +1674,14 @@ number_of_iterations_cond (struct loop * > if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step)) > { > tree step_type = POINTER_TYPE_P (type) ? sizetype : type; > - if (code != NE_EXPR) > - return false; > - > - iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, > - iv0->step, iv1->step); > - iv0->no_overflow = false; > - iv1->step = build_int_cst (step_type, 0); > - iv1->no_overflow = true; > + if (code == NE_EXPR) > + { > + iv0->step = fold_binary_to_constant (MINUS_EXPR, step_type, > + iv0->step, iv1->step); > + iv0->no_overflow = false; > + iv1->step = build_int_cst (step_type, 0); > + iv1->no_overflow = true; > + } No, at least we need to adjust for other code like LT_EXPR/LE_EXPR too. Following code can't handle comparison with both sides non-zero ivs. I guess it only handles NE_EXPR, otherwise it's possible to end up with wrong result because of wrapping behavior. Considering below test: unsigned int i = 0xfffffff0, j=0xfffffff8; for (; i < j; i++, j+=2) it only iterates for 4 times before j wrapping to 0. It's not equal to: unsigned int i = 0xfffffff0, j=0xfffffff8; for (; i < j; i--) The tricky part is to identify safe cases. I will try to improve this. Thanks.
Hmm, the function can only be vectorized with "-march=skylake"? So what requirement is needed to add a test case for this? Thanks.
(In reply to amker from comment #4) > Hmm, the function can only be vectorized with "-march=skylake"? Er, it also vectorizes without any -march on x86_64 (with shorter vectors).
(In reply to Marc Glisse from comment #5) > (In reply to amker from comment #4) > > Hmm, the function can only be vectorized with "-march=skylake"? > > Er, it also vectorizes without any -march on x86_64 (with shorter vectors). But below is necessary, right? /* { dg-require-effective-target vect_perm_short } */ Otherwise I get: pr81196.c:7:3: note: negative step and reversing not supported. pr81196.c:7:3: note: not falling back to elementwise accesses pr81196.c:7:3: note: not vectorized: relevant stmt not supported: _1 = *q_18; pr81196.c:7:3: note: bad operation or unsupported loop bound. pr81196.c:4:6: note: vectorized 0 loops in function. Thanks
(In reply to amker from comment #6) > But below is necessary, right? > > /* { dg-require-effective-target vect_perm_short } */ Yes. Cool, I didn't remember we had exactly the right test :-) Or you could try and change what it done inside the loop (or the element type) so it doesn't require a permutation of shorts, but that doesn't seem worth the trouble, vect_perm_short targets are not that rare.
Author: amker Date: Thu Jun 29 10:41:28 2017 New Revision: 249778 URL: https://gcc.gnu.org/viewcvs?rev=249778&root=gcc&view=rev Log: PR tree-optimization/81196 * tree-ssa-loop-niter.c (number_of_iterations_cond): Handle loop exit condition comparing two IVs. gcc/testsuite * gcc.dg/vect/pr81196.c: New. Added: trunk/gcc/testsuite/gcc.dg/vect/pr81196.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-loop-niter.c
Should be fixed.