Bug 113026

Summary: Bogus -Wstringop-overflow warning on simple memcpy type loop
Product: gcc Reporter: Peter Bergner <bergner>
Component: tree-optimizationAssignee: Richard Biener <rguenth>
Status: RESOLVED FIXED    
Severity: normal CC: avieira, dje, gabravier, linkw, segher
Priority: P3 Keywords: diagnostic, missed-optimization
Version: 14.0   
Target Milestone: 14.0   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102462
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113524
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2023-12-15 00:00:00
Bug Depends on:    
Bug Blocks: 88443    

Description Peter Bergner 2023-12-14 21:42:21 UTC
The following testcase has a bogus warning on trunk back to at least gcc 11.

bergner@ltcden2-lp1:LTC193379$ cat bug.c 
char dst[16];
long n = 16;

void
foo (char *src)
{
  for (long i = 0; i < n; i++)
    dst[i] = src[i];
}

bergner@ltcden2-lp1:LTC193379$ /opt/gcc-nightly/trunk/bin/gcc -S -O3 -mcpu=power8 bug.c 
bug.c: In function ‘foo’:
bug.c:8:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
    8 |     dst[i] = src[i];
      |     ~~~~~~~^~~~~~~~
bug.c:1:6: note: at offset 16 into destination object ‘dst’ of size 16
    1 | char dst[16];
      |      ^~~
bug.c:8:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
    8 |     dst[i] = src[i];
      |     ~~~~~~~^~~~~~~~
bug.c:1:6: note: at offset 17 into destination object ‘dst’ of size 16
    1 | char dst[16];
      |      ^~~
Comment 1 Andrew Pinski 2023-12-14 21:47:13 UTC
IIRC there is a known issue with Wstringop-overflow and the vectorizer ...
Comment 2 Richard Biener 2023-12-15 09:24:57 UTC
We are diagnosing the scalar store in the epilogue of the vectorized "loop"
where we somehow optimized the vector loop to terminate immediately (based
on dst[] size I guess) and also peeled the epilogue but failed to eliminate
that completely (but then -Wstringop-overflow correctly figures the access
would be out-of-bounds).

A bit reduced (interestingly when avoiding versioning for aliasing
we manage to avoid the diagnostic because of the way we re-use the
epilog when n < 16

char dst[16];
void
foo (char *src, long n)
{
  for (long i = 0; i < n; i++)
    dst[i] = src[i];
}

in the end this is a missed optimization.  We do know an upper bound
for the loop so we should have disabled epilog peeling based on that.

The peeling code assumes the vector loop can be skipped (niter < vf)
only when we either vectorize the epilogue or do not version the loop
while the code computing wheter we need peeling takes into account the
versioning cost model threshold only.
Comment 3 Richard Biener 2023-12-15 10:20:03 UTC
I do have a patch to avoid the epilog, but that doesn't help when adjusting the
testcase to

char dst[17];
void
foo (char *src, long n)
{
  for (long i = 0; i < n; i++)
    dst[i] = src[i];
}

because then we still fail to constrain the epilog number of iterations
(the different cases of flow are now quite complicated, the pending
early exit vectorization patches will complicate it more).

I'll see what to do _after_ that work got in.

The following helps the dst[16] case, ideally we'd refactor that a bit
according to the comment.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7a3db5f098b..a4dd2caa400 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1260,7 +1260,11 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
             the epilogue is unnecessary.  */
          && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
              || ((unsigned HOST_WIDE_INT) max_niter
-                 > (th / const_vf) * const_vf))))
+                 /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
+                    but that's only computed later based on our result.
+                    The following is the most conservative approximation.  */
+                 > (std::max ((unsigned HOST_WIDE_INT) th,
+                              const_vf) / const_vf) * const_vf))))
     return true;
 
   return false;

it's interesting that we don't seem to adjust the upper bound of the niters
for the epilog at all.  The following cures that as well:

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index bcd90a331f5..07a30b7ee98 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3193,6 +3193,19 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
            bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
          bb_before_epilog = loop_preheader_edge (epilog)->src;
        }
+      else
+       {
+         /* When we do not have a loop-around edge to the epilog we know
+            the vector loop covered at least VF scalar iterations.  Update
+            any known upper bound with this knowledge.  */
+         if (loop->any_upper_bound)
+           epilog->nb_iterations_upper_bound -= constant_lower_bound (vf);
+         if (loop->any_likely_upper_bound)
+           epilog->nb_iterations_likely_upper_bound -= constant_lower_bound (vf);
+         if (loop->any_estimate)
+           epilog->nb_iterations_estimate -= constant_lower_bound (vf);
+       }
+
       /* If loop is peeled for non-zero constant times, now niters refers to
         orig_niters - prolog_peeling, it won't overflow even the orig_niters
         overflows.  */
Comment 4 avieira 2023-12-15 13:10:19 UTC
Drive by comments as it's been a while since I looked at this. I'm also surprised we didn't adjust the bounds. But why do we only subtract VF? Like you say, if there's no loop around edge, can't we guarantee the epilogue will only need to iterate at most VF-1?  This is assuming we didn't take an early exit, if we do then we can't assume anything as the iterations 'reset'.
Comment 5 rguenther@suse.de 2023-12-18 07:14:29 UTC
On Fri, 15 Dec 2023, avieira at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113026
> 
> --- Comment #4 from avieira at gcc dot gnu.org ---
> Drive by comments as it's been a while since I looked at this. I'm also
> surprised we didn't adjust the bounds. But why do we only subtract VF? Like you
> say, if there's no loop around edge, can't we guarantee the epilogue will only
> need to iterate at most VF-1?  This is assuming we didn't take an early exit,
> if we do then we can't assume anything as the iterations 'reset'.

Subtracting can bring down epilogue iterations max to 1 while yes,
we should also apply a max of VF-1 but in addition to the subtraction.
Comment 6 GCC Commits 2024-01-08 13:52:35 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:b3cc5a1efead520bc977b4ba51f1328d01b3e516

commit r14-7003-gb3cc5a1efead520bc977b4ba51f1328d01b3e516
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Dec 15 10:32:29 2023 +0100

    tree-optimization/113026 - avoid vector epilog in more cases
    
    The following avoids creating a niter peeling epilog more consistently,
    matching what peeling later uses for the skip_vector condition, in
    particular when versioning is required which then also ensures the
    vector loop is entered unless the epilog is vectorized.  This should
    ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed
    later, some refactoring could make that better matching.
    
    The patch also makes sure to adjust the upper bound of the epilogues
    when we do not have a skip edge around the vector loop.
    
            PR tree-optimization/113026
            * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
            Avoid an epilog in more cases.
            * tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
            epilogues niter upper bounds and estimates.
    
            * gcc.dg/torture/pr113026-1.c: New testcase.
            * gcc.dg/torture/pr113026-2.c: Likewise.
Comment 7 Richard Biener 2024-01-08 13:54:27 UTC
Fixed.
Comment 8 GCC Commits 2024-01-09 12:37:08 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:c22cf7a7a77bf9245bd0790b21695440208c3aa5

commit r14-7039-gc22cf7a7a77bf9245bd0790b21695440208c3aa5
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jan 9 11:49:50 2024 +0100

    tree-optimization/113026 - fix vector epilogue maximum iter bound
    
    The late amendment with a limit based on VF was redundant and wrong
    for peeled early exits.  The following moves the adjustment done
    when we don't have a skip edge down to the place where the already
    existing VF based max iter check is done and removes the amendment.
    
            PR tree-optimization/113026
            * tree-vect-loop-manip.cc (vect_do_peeling): Remove
            redundant and wrong niter bound setting.  Move niter
            bound adjustment down.