Building DPDK with gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3) on a POWER9 host and powerpc64le-linux-gnu-gcc (GCC) 11.2.1 20210802 (Advance-Toolchain 15.0-0) [ebcfb7a665c2] on an x86_64 cross-compile host generates the warning:

In function ‘i40e_flow_parse_fdir_pattern’,
    inlined from ‘i40e_flow_parse_fdir_filter’ at ../drivers/net/i40e/i40e_flow.c:3274:8:
../drivers/net/i40e/i40e_flow.c:3052:69: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
 3052 |         filter->input.flow_ext.flexbytes[j] =
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 3053 |                 raw_spec->pattern[i];
      |                 ~~~~~~~~~~~~~~~~~~~~
In file included from ../drivers/net/i40e/i40e_flow.c:25:
../drivers/net/i40e/i40e_flow.c: In function ‘i40e_flow_parse_fdir_filter’:
../drivers/net/i40e/i40e_ethdev.h:630:17: note: at offset 16 into destination object ‘flexbytes’ of size 16
  630 |         uint8_t flexbytes[RTE_ETH_FDIR_MAX_FLEXLEN];
      |                 ^~~~~~~~~

See https://bugs.dpdk.org/show_bug.cgi?id=743 for additional details on the DPDK build failure.
Running cvise to reduce the failing code yields the following simplified test failure:

#include <stdlib.h>

#define LEN 16
struct {
  char c[LEN]
} d;

int a = LEN;
char* b;

int p() {
  for (int i = 0; i < a; i++) {
    d.c[i] = b[i];
  }
  return 0;
}

int main () {
  int r = 0;
  b = malloc(sizeof(char) * (LEN + 1));
  r = p();
  return r;
}

$ gcc -O3 test.c
test.c:6:1: warning: no semicolon at end of struct or union
    6 | } d;
      | ^
test.c: In function 'p':
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 16 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 17 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 18 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 19 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 20 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 21 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 22 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 23 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 24 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 25 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 26 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 27 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 28 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 29 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^
test.c:13:12: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   13 |     d.c[i] = b[i];
      |     ~~~~~~~^~~~~~
test.c:5:8: note: at offset 30 into destination object 'c' of size 16
    5 |   char c[LEN]
      |        ^

Compiling both the original DPDK and simplified code with -O3 for POWER systems generates the given warnings, but compiling the code with -O2 for POWER systems does not generate the warning. Compiling the simplified code with either -O3 or -O2 for x86_64 systems does not generate a warning.
Looks like it is unrolling ...
A workaround would be to __builtin_unreachable(), as usual:

@@ -9,6 +9,7 @@
 char* b;
 
 int p() {
+  if (a > LEN) __builtin_unreachable();
   for (int i = 0; i < a; i++) {
     d.c[i] = b[i];
   }
I can confirm this bug. We're facing the problem when compiling NSS on Ubuntu Kinetic (development version) on ppc64el, because the build uses -O3.
#define LEN 4

struct {
  char c[LEN]
} d;

extern int a;
extern char* b;

int p() {
  for (int i = 0; i < a; i++) {
    d.c[i] = b[i];
  }
  return 0;
}

The code above causes the same errors on x86. With LEN set to 8, it can also be reproduced on aarch64. It's a common problem.

The iteration count of the remainder loop after vectorization should be decided not only by the variable "a" but also by the length of the array. If the length is 5 and the vector size is 4, the remainder loop should be executed only once. Currently the iteration count depends only on the variable "a", so the loop is completely unrolled 3 times if the vector size is 4. That causes the warning.

  <bb 17> [local count: 398179264]:
  # i_30 = PHI <i_36(18), tmp.9_40(20)>
  _32 = (sizetype) i_30;
  _33 = b.0_1 + _32;
  _34 = *_33;
  d.c[i_30] = _34;
  i_36 = i_30 + 1;
  if (i_36 < a.1_13)  // iterations depend on "a" only, the length of the array is not taken into consideration
    goto <bb 18>; [89.00%]
  else
    goto <bb 19>; [11.00%]
(In reply to HaoChen Gui from comment #4)
> #define LEN 4
>
> struct {
>   char c[LEN]
> } d;
>
> extern int a;
> extern char* b;
>
> int p() {
>   for (int i = 0; i < a; i++) {
>     d.c[i] = b[i];
>   }
>   return 0;
> }
>
> The code above causes the same errors on x86. With LEN set to 8, it can
> also be reproduced on aarch64. It's a common problem.
>
> The iteration count of the remainder loop after vectorization should be
> decided not only by the variable "a" but also by the length of the array.
> If the length is 5 and the vector size is 4, the remainder loop should be
> executed only once. Currently the iteration count depends only on the
> variable "a", so the loop is completely unrolled 3 times if the vector
> size is 4. That causes the warning.
>
>   <bb 17> [local count: 398179264]:
>   # i_30 = PHI <i_36(18), tmp.9_40(20)>
>   _32 = (sizetype) i_30;
>   _33 = b.0_1 + _32;
>   _34 = *_33;
>   d.c[i_30] = _34;
>   i_36 = i_30 + 1;
>   if (i_36 < a.1_13)  // iterations depend on "a" only, the length of the
> array is not taken into consideration
>     goto <bb 18>; [89.00%]
>   else
>     goto <bb 19>; [11.00%]

It does take this into account when unrolling. The issue is - at least for the vectorized epilogue - that when we set the iteration bound based on the VF we do not factor in the iteration bound of the scalar loop when we know the vector loop is iterating at least once. That's something we could improve.

The real issue is of course that the diagnostic code is too trigger-happy and the unroll code is prone to leaving one non-executable iteration (or part of one).