Bug 102316

Summary: Unexpected stringop-overflow Warnings on POWER CPU
Product: gcc
Reporter: David Christensen <drc>
Component: tree-optimization
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW
Severity: normal
CC: guihaoc, rguenth, sergiodj, xtkoba
Priority: P3
Keywords: diagnostic
Version: 11.2.1
Target Milestone: ---
Host:
Target: powerpc64le
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-08-25 00:00:00
Bug Depends on:
Bug Blocks: 88443

Description David Christensen 2021-09-13 20:15:57 UTC
Building DPDK with either gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3) on a POWER9 host or powerpc64le-linux-gnu-gcc (GCC) 11.2.1 20210802 (Advance-Toolchain 15.0-0) [ebcfb7a665c2] on an x86_64 cross-compile host generates the following warning:

In function ‘i40e_flow_parse_fdir_pattern’,
    inlined from ‘i40e_flow_parse_fdir_filter’ at ../drivers/net/i40e/i40e_flow.c:3274:8:
../drivers/net/i40e/i40e_flow.c:3052:69: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
 3052 |                                 filter->input.flow_ext.flexbytes[j] =
      |                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 3053 |                                         raw_spec->pattern[i];
      |                                         ~~~~~~~~~~~~~~~~~~~~
In file included from ../drivers/net/i40e/i40e_flow.c:25:
../drivers/net/i40e/i40e_flow.c: In function ‘i40e_flow_parse_fdir_filter’:
../drivers/net/i40e/i40e_ethdev.h:630:17: note: at offset 16 into destination object ‘flexbytes’ of size 16
  630 |         uint8_t flexbytes[RTE_ETH_FDIR_MAX_FLEXLEN];
      |                 ^~~~~~~~~

See https://bugs.dpdk.org/show_bug.cgi?id=743 for additional details on the DPDK build failure.

Reducing the failing code with cvise yields the following simplified test case:

#include <stdlib.h>

#define LEN 16
struct {
  char c[LEN]
} d;

int a = LEN;
char* b;

int p() {
  for (int i = 0; i < a; i++) {
    d.c[i] = b[i];
  }
  return 0;
}

int main () {
  int r = 0;
  b = malloc(sizeof(char) * (LEN + 1));
  r = p();
  return r;
}


$ gcc -O3 test.c
test.c:6:1: warning: no semicolon at end of struct or union
    6 | } d;
      | ^
test.c: In function 'p':
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 16 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 17 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 18 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 19 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 20 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 21 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 22 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 23 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 24 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 25 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 26 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 27 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 28 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 29 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 30 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^

Compiling both the original DPDK and simplified code with -O3 for POWER systems generates the given warnings, but compiling the code with -O2 for POWER systems does not generate the warning.

Compiling the simplified code with either -O3 or -O2 for x86_64 systems does not generate a warning.

Comment 1 Andrew Pinski 2021-09-13 20:22:07 UTC
Looks like it is unrolling ...

Comment 2 Tee KOBAYASHI 2021-09-14 05:31:31 UTC
A workaround would be to use __builtin_unreachable(), as usual:

@@ -9,6 +9,7 @@
 char* b;
 
 int p() {
+  if (a > LEN) __builtin_unreachable();
   for (int i = 0; i < a; i++) {
     d.c[i] = b[i];
   }
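
For reference, here is the reduced test case with the patch above applied, as a minimal sketch. The missing semicolon after the struct member is added here and main() is left out so the file compiles as a plain translation unit; everything else follows the reduced test case. Whether the hint silences the warning on every GCC version and target has not been verified here.

```c
#define LEN 16

struct {
  char c[LEN];
} d;

int a = LEN;
char *b;

int p(void) {
  /* Workaround from the patch above: tell the optimizer that a never
     exceeds LEN.  If a > LEN does happen at run time the behavior is
     undefined, so callers must guarantee the bound themselves. */
  if (a > LEN) __builtin_unreachable();
  for (int i = 0; i < a; i++) {
    d.c[i] = b[i];
  }
  return 0;
}
```

Note that the hint is a promise, not a check: it removes the out-of-bounds paths from the compiler's view but does nothing to protect the array if a is ever larger than LEN.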

Comment 3 Sergio Durigan Junior 2022-08-12 02:37:59 UTC
I can confirm this bug.  We're facing the problem when compiling NSS on Ubuntu Kinetic (development version) on ppc64el, because the build uses -O3.

Comment 4 HaoChen Gui 2022-08-25 07:48:04 UTC
#define LEN 4

struct {
  char c[LEN]
} d;

extern int a;
extern char* b;

int p() {
  for (int i = 0; i < a; i++) {
    d.c[i] = b[i];
  }
  return 0;
}

The code above triggers the same warnings on x86. With LEN set to 8, it can also be reproduced on aarch64. The problem is common to all of these targets.

The iteration count of the remainder loop left after vectorization should be bounded not only by the variable "a" but also by the length of the array. If the length is 5 and the vector size is 4, the remainder loop should execute only once. Currently the iteration count depends only on "a", so the loop is completely unrolled 3 times when the vector size is 4. That causes the warning.

  <bb 17> [local count: 398179264]:
  # i_30 = PHI <i_36(18), tmp.9_40(20)>
  _32 = (sizetype) i_30;
  _33 = b.0_1 + _32;
  _34 = *_33;
  d.c[i_30] = _34;
  i_36 = i_30 + 1;
  if (i_36 < a.1_13)  // iterations depend on "a" only; the length of the array is not taken into consideration
    goto <bb 18>; [89.00%]
  else
    goto <bb 19>; [11.00%]
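
The arithmetic behind the two bounds can be sketched as follows. This is an illustration only, not GCC code: the function names are hypothetical, and the tighter bound assumes the vector loop has already run at least once and that no store goes past the end of the array.

```c
#include <assert.h>

/* Bound on remainder-loop iterations derived from the vector size
   alone: the remainder of a / vf is at most vf - 1. */
int epilogue_bound_vf_only(int vf)
{
  assert(vf > 0);
  return vf - 1;
}

/* Tighter bound, assuming the vector loop ran at least once (so vf
   elements are already written) and the array holds len elements. */
int epilogue_bound_with_array(int len, int vf)
{
  assert(vf > 0 && len >= vf);
  int left_in_array = len - vf;
  int by_vf = epilogue_bound_vf_only(vf);
  return left_in_array < by_vf ? left_in_array : by_vf;
}
```

With len = 5 and vf = 4 the first bound is 3 and the second is 1, matching the numbers above. With len = 16 and vf = 16 the bounds are 15 and 0; the 15 warnings at offsets 16 through 30 in the original report would be consistent with that, although the report does not state the vector size.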

Comment 5 Richard Biener 2022-08-25 08:28:33 UTC
(In reply to HaoChen Gui from comment #4)
> #define LEN 4
> 
> struct {
>   char c[LEN]
> } d;
> 
> extern int a;
> extern char* b;
> 
> int p() {
>   for (int i = 0; i < a; i++) {
>     d.c[i] = b[i];
>   }
>   return 0;
> }
> 
> Above codes cause the same errors on x86. When setting the LEN to 8, it can
> be also reproduced on aarch64. It's a common problem.
> 
> The iteration number of reset loop after vectorization should not only
> decided by variable "a" but also by the length of array. If the len is 5 and
> vector size is 4, the reset loop should be only executed once. Currently
> iteration number only depends on variable "a". Then it is complete unrolled
> 3 times if vector size is 4. That causes the warning.
> 
>   <bb 17> [local count: 398179264]:
>   # i_30 = PHI <i_36(18), tmp.9_40(20)>
>   _32 = (sizetype) i_30;
>   _33 = b.0_1 + _32;
>   _34 = *_33;
>   d.c[i_30] = _34;
>   i_36 = i_30 + 1;
>    if (i_36 < a.1_13)  // iterations depend on "a" only, the length of array
> is not take into consideration
>     goto <bb 18>; [89.00%]
>   else
>     goto <bb 19>; [11.00%]

It does take this into account when unrolling.  The issue is - at least for
the vectorized epilogue - that when we set the iteration bound based on
the VF we do not factor in the iteration bound of the scalar loop when we
know the vector loop is iterating at least once.  That's something we could
improve.

The real issue is of course that the diagnostic code is too trigger-happy
and the unroll code is prone to leaving behind one iteration (or part of
one) that can never execute.
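
As a rough source-level model of the loop shape being discussed (hand-written, not what GCC emits), the sketch below splits the copy into a main loop that moves VF bytes at a time and a scalar epilogue. VF = 16 is an assumption, as is the name p_model; the assert spells out the a <= LEN precondition that the test case leaves implicit.

```c
#include <assert.h>
#include <string.h>

#define LEN 16
#define VF 16 /* assumed vectorization factor; not stated in the report */

struct { char c[LEN]; } d;

/* Copies a bytes from b into d.c and returns how many scalar epilogue
   iterations actually executed. */
int p_model(const char *b, int a)
{
  assert(a <= LEN); /* precondition implicit in the original code */
  int i = 0;
  /* Main loop: stands in for the vectorized body. */
  for (; i + VF <= a; i += VF)
    memcpy(&d.c[i], &b[i], VF);
  /* Scalar epilogue: at most VF - 1 iterations by construction. */
  int epilogue_iterations = 0;
  for (; i < a; i++) {
    d.c[i] = b[i];
    epilogue_iterations++;
  }
  return epilogue_iterations;
}
```

When the main loop has run once, i is already 16, so for any valid a the epilogue executes zero times. An unroller that only knows the VF - 1 bound still emits up to 15 copies of the store at indices 16 and above, which is the kind of never-executed code the diagnostic then flags.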