The following testcase has a bogus warning on trunk back to at least gcc 11. bergner@ltcden2-lp1:LTC193379$ cat bug.c char dst[16]; long n = 16; void foo (char *src) { for (long i = 0; i < n; i++) dst[i] = src[i]; } bergner@ltcden2-lp1:LTC193379$ /opt/gcc-nightly/trunk/bin/gcc -S -O3 -mcpu=power8 bug.c bug.c: In function ‘foo’: bug.c:8:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 8 | dst[i] = src[i]; | ~~~~~~~^~~~~~~~ bug.c:1:6: note: at offset 16 into destination object ‘dst’ of size 16 1 | char dst[16]; | ^~~ bug.c:8:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 8 | dst[i] = src[i]; | ~~~~~~~^~~~~~~~ bug.c:1:6: note: at offset 17 into destination object ‘dst’ of size 16 1 | char dst[16]; | ^~~
IIRC there is a known issue with Wstringop-overflow and the vectorizer ...
We are diagnosing the scalar store in the epilogue of the vectorized "loop" where we somehow optimized the vector loop to terminate immediately (based on dst[] size I guess) and also peeled the epilogue but failed to eliminate that completely (but then -Wstringop-overflow correctly figures the access would be out-of-bounds). A bit reduced (interestingly when avoiding versioning for aliasing we manage to avoid the diagnostic because of the way we re-use the epilog when n < 16 char dst[16]; void foo (char *src, long n) { for (long i = 0; i < n; i++) dst[i] = src[i]; } in the end this is a missed optimization. We do know an upper bound for the loop so we should have disabled epilog peeling based on that. The peeling code assumes the vector loop can be skipped (niter < vf) only when we either vectorize the epilogue or do not version the loop while the code computing wheter we need peeling takes into account the versioning cost model threshold only.
I do have a patch to avoid the epilog, but that doesn't help when adjusting the testcase to char dst[17]; void foo (char *src, long n) { for (long i = 0; i < n; i++) dst[i] = src[i]; } because then we still fail to constrain the epilog number of iterations (the different cases of flow are now quite complicated, the pending early exit vectorization patches will complicate it more). I'll see what to do _after_ that work got in. The following helps the dst[16] case, ideally we'd refactor that a bit according to the comment. diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 7a3db5f098b..a4dd2caa400 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -1260,7 +1260,11 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo) the epilogue is unnecessary. */ && (!LOOP_REQUIRES_VERSIONING (loop_vinfo) || ((unsigned HOST_WIDE_INT) max_niter - > (th / const_vf) * const_vf)))) + /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD + but that's only computed later based on our result. + The following is the most conservative approximation. */ + > (std::max ((unsigned HOST_WIDE_INT) th, + const_vf) / const_vf) * const_vf)))) return true; return false; it's interesting that we don't seem to adjust the upper bound of the niters for the epilog at all. The following cures that as well: diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index bcd90a331f5..07a30b7ee98 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -3193,6 +3193,19 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count (); bb_before_epilog = loop_preheader_edge (epilog)->src; } + else + { + /* When we do not have a loop-around edge to the epilog we know + the vector loop covered at least VF scalar iterations. Update + any known upper bound with this knowledge. */ + if (loop->any_upper_bound) + epilog->nb_iterations_upper_bound -= constant_lower_bound (vf); + if (loop->any_likely_upper_bound) + epilog->nb_iterations_likely_upper_bound -= constant_lower_bound (vf); + if (loop->any_estimate) + epilog->nb_iterations_estimate -= constant_lower_bound (vf); + } + /* If loop is peeled for non-zero constant times, now niters refers to orig_niters - prolog_peeling, it won't overflow even the orig_niters overflows. */
Drive by comments as it's been a while since I looked at this. I'm also surprised we didn't adjust the bounds. But why do we only subtract VF? Like you say, if there's no loop around edge, can't we guarantee the epilogue will only need to iterate at most VF-1? This is assuming we didn't take an early exit, if we do then we can't assume anything as the iterations 'reset'.
On Fri, 15 Dec 2023, avieira at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113026 > > --- Comment #4 from avieira at gcc dot gnu.org --- > Drive by comments as it's been a while since I looked at this. I'm also > surprised we didn't adjust the bounds. But why do we only subtract VF? Like you > say, if there's no loop around edge, can't we guarantee the epilogue will only > need to iterate at most VF-1? This is assuming we didn't take an early exit, > if we do then we can't assume anything as the iterations 'reset'. Subtracting can bring down epilogue iterations max to 1 while yes, we should also apply a max of VF-1 but in addition to the subtraction.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:b3cc5a1efead520bc977b4ba51f1328d01b3e516 commit r14-7003-gb3cc5a1efead520bc977b4ba51f1328d01b3e516 Author: Richard Biener <rguenther@suse.de> Date: Fri Dec 15 10:32:29 2023 +0100 tree-optimization/113026 - avoid vector epilog in more cases The following avoids creating a niter peeling epilog more consistently, matching what peeling later uses for the skip_vector condition, in particular when versioning is required which then also ensures the vector loop is entered unless the epilog is vectorized. This should ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed later, some refactoring could make that better matching. The patch also makes sure to adjust the upper bound of the epilogues when we do not have a skip edge around the vector loop. PR tree-optimization/113026 * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Avoid an epilog in more cases. * tree-vect-loop-manip.cc (vect_do_peeling): Adjust the epilogues niter upper bounds and estimates. * gcc.dg/torture/pr113026-1.c: New testcase. * gcc.dg/torture/pr113026-2.c: Likewise.
Fixed.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:c22cf7a7a77bf9245bd0790b21695440208c3aa5 commit r14-7039-gc22cf7a7a77bf9245bd0790b21695440208c3aa5 Author: Richard Biener <rguenther@suse.de> Date: Tue Jan 9 11:49:50 2024 +0100 tree-optimization/113026 - fix vector epilogue maximum iter bound The late amendment with a limit based on VF was redundant and wrong for peeled early exits. The following moves the adjustment done when we don't have a skip edge down to the place where the already existing VF based max iter check is done and removes the amendment. PR tree-optimization/113026 * tree-vect-loop-manip.cc (vect_do_peeling): Remove redundant and wrong niter bound setting. Move niter bound adjustment down.