Starting from the mentioned revision we do: $ cat tester.c #include <stdio.h> #include <string.h> #define min(a, b) (((a) < (b)) ? (a) : (b)) struct S { char data[4]; char fallout[100]; }; int main(int argc, char **argv) { struct S s; strncpy(s.data, argv[1], 4 + 100); int length = min(strlen(s.data), 4); printf("length: %d\n", length); return 0; } $ gcc tester.c -O3 && ./a.out 123456 length: 6 before the mentioned revision we did: length: 4. optimized dumps: after revision: ;; Function main (main, funcdef_no=11, decl_uid=2580, cgraph_uid=11, symbol_order=11) (executed once) main (int argc, char * * argv) { struct S s; char * _1; long unsigned int _2; int iftmp.0_3; <bb 2> [local count: 1073741825]: _1 = MEM[(char * *)argv_4(D) + 8B]; strncpy (&s.data, _1, 104); _2 = strlen (&s.data); iftmp.0_3 = (int) _2; printf ("length: %d\n", iftmp.0_3); s ={v} {CLOBBER}; return 0; } before: ;; Function main (main, funcdef_no=11, decl_uid=2580, cgraph_uid=11, symbol_order=11) (executed once) main (int argc, char * * argv) { struct S s; char * _1; long unsigned int _2; int iftmp.0_3; long unsigned int _10; <bb 2> [local count: 1073741825]: _1 = MEM[(char * *)argv_4(D) + 8B]; strncpy (&s.data, _1, 104); _2 = strlen (&s.data); if (_2 <= 3) goto <bb 4>; [50.00%] else goto <bb 3>; [50.00%] <bb 3> [local count: 536870913]: <bb 4> [local count: 1073741825]: # _10 = PHI <_2(2), 4(3)> iftmp.0_3 = (int) _10; printf ("length: %d\n", iftmp.0_3); s ={v} {CLOBBER}; return 0; }
It's invalid because we properly identify that strlen (s.data) <= 4, thus no check is needed.
(In reply to Martin Liška from comment #1) > It's invalid because we properly identify that strlen (s.data) <= 4, thus no > check is needed. No, we can't conclude that. See PR86259.
Duplicate even. *** This bug has been marked as a duplicate of bug 86259 ***
Using strncpy or strlen to cross subobject boundaries isn't valid. GCC catches the strncpy bug with -Warray-bounds: pr86265.c: In function ‘main’: ... In file included from /usr/include/string.h:630, from pr86265.c:2: pr86265.c:13:3: warning: ‘__builtin_strncpy’ offset [5, 104] from the object at ‘s’ is out of the bounds of referenced subobject ‘data’ with type ‘char[4]’ at offset 0 [-Warray-bounds] strncpy(s.data, argv[1], 4 + 100); ^~~~~~~ Use memcpy (&s, ...) and memchr (&s, ...) to get that effect. To make it strictly valid, be sure to pass to the functions the address of the enclosing object rather than a pointer to one of its members: struct S s = { "", "" }; memcpy (&s, argv[1], strlen (argv[1])); int length = min((char*)memchr (&s, 0, sizeof s) - (char*)&s, 4); (-Warray-bounds lets raw memory functions like memcpy and memchr cross subobject boundaries without being diagnosed.)
On Thu, 21 Jun 2018, msebor at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86265 > > --- Comment #4 from Martin Sebor <msebor at gcc dot gnu.org> --- > Using strncpy or strlen to cross subobject boundaries isn't valid. GCC catches > the strncpy bug with -Warray-bounds: > > pr86265.c: In function ‘main’: > ... > In file included from /usr/include/string.h:630, > from pr86265.c:2: > pr86265.c:13:3: warning: ‘__builtin_strncpy’ offset [5, 104] from the object at > ‘s’ is out of the bounds of referenced subobject ‘data’ with type ‘char[4]’ at > offset 0 [-Warray-bounds] > strncpy(s.data, argv[1], 4 + 100); > ^~~~~~~ > > Use memcpy (&s, ...) and memchr (&s, ...) to get that effect. To make it > strictly valid, be sure to pass to the functions the address of the enclosing > object rather than a pointer to one of its members: > > struct S s = { "", "" }; > memcpy (&s, argv[1], strlen (argv[1])); > int length = min((char*)memchr (&s, 0, sizeof s) - (char*)&s, 4); > > (-Warray-bounds lets raw memory functions like memcpy and memchr cross > subobject boundaries without being diagnosed.) I believe the warning (and policy like _FORTIFY_SOURCE=n) is simply a policy deemed good for security reasons. We may not take advantage of policy violations in code generation though.
The strlen range optimization doesn't take advantage of undefined behavior -- like all other optimizations, it simply assumes code is free of it. I have two goals for the warnings I work on: a) most important is to find bugs in user code, and b) less important is to drive improvements to help GCC better analyze source code and emit more efficient object code. By relying on valid calls to strcpy() writing only into the destination array and not beyond, and reading only from the source array and not beyond, GCC can safely assume that other members of the same struct or other elements of the same array of structs than the one written to are unchanged by the strcpy() call. For instance, in the following, the tests can safely be eliminated: struct A { char a[4]; int i; }; void f (struct A *a) { int i = a[0].i + a[1].i; __builtin_strcpy (a[0].a, a[1].a); if (i != a[0].i + a[1].i) __builtin_abort (); } There is no reason not to take advantage of this except to cater to exceptionally poorly written (and I'd say exceedingly rare) code, and thus penalize the overwhelming majority of code that doesn't violate the basic rules of the language.
On Mon, 25 Jun 2018, msebor at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86265 > > --- Comment #6 from Martin Sebor <msebor at gcc dot gnu.org> --- > The strlen range optimization doesn't take advantage of undefined behavior -- > like all other optimizations, it simply assumes code is free of it. > > I have two goals for the warnings I work on: a) most important is to find bugs > in user code, and b) less important is to drive improvements to help GCC better > analyze source code and emit more efficient object code. > > By relying on valid calls to strcpy() writing only into the destination array > and not beyond, and reading only from the source array and not beyond, GCC can > safely assume that other members of the same struct or other elements of the > same array of structs than the one written to are unchanged by the strcpy() > call. For instance, in the following, the tests can safely be eliminated: > > struct A { > char a[4]; > int i; > }; > > void f (struct A *a) > { > int i = a[0].i + a[1].i; > > __builtin_strcpy (a[0].a, a[1].a); > > if (i != a[0].i + a[1].i) > __builtin_abort (); > } > > There is no reason not to take advantage of this except to cater to > exceptionally poorly written (and I'd say exceedingly rare) code, and thus > penalize the overwhelming majority of code that doesn't violate the basic rules > of the language. The "basic rules of the language" are hard to understand. Not only because GCC chooses to apply them selectively (not to mem* but to str*). As GCC developer I'm more concerned about applying rules of the C language to the GIMPLE IL without much consideration. I guess exceptions for mem* also exist because other FEs _do_ emit calls to those functions as part of their IL lowering to GENERIC. Now I would fully expect they do the same for at least a subset of str* simply because when strings are a first-level entity in a programming language and you are using the C runtime you have no choice but using them. Until now the GIMPLE IL rule was that ADDR_EXPR expressions are just address computations and you are not allowed to make any assumptions about the address use based on the structure of a component reference appearing inside it. This very rule made it possible to aggressively propagate into address-computations but at the same time restricted propagation into dereferences. That str* argument ADDR_EXPRs now carry semantic value (same issue applies to the personally "much loved" __builtin_object_size) is disturbing. This basically means we may not propagate into address computations as much as we do? That is, enforcing C language rules to do more optimization without evaluating what (invalid under those C language rules) transforms GCC does itself lead to wrong-code bugs in the past. Doing that for diagnostics only can only lead to false positive diagnostics...
I would expect the additional detail (about the structure of data) to only help improve things, not ever make them worse, or introduce bugs into correct code. But relying on the structure of data is not novel -- the ability is an important part of the language, and GCC already takes advantage of it in some cases. For instance, it eliminates the test below: struct S { char a[4]; int b; char c[1]; }; void f (struct S *s, int i) { int c = s->c[0]; s->a[i] = 'a'; if (c != s->c[0]) __builtin_abort (); } even though calling f() with i set to offsetof (struct S, c) writes 'a' into s->c. GCC doesn't yet eliminate the test when the write is by a built-in like memcpy() or strcpy() but there is no reason for it not to.