Bug 87363 - Duplicate and bogus -Wstringop-overflow warning
Summary: Duplicate and bogus -Wstringop-overflow warning
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 8.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic
Depends on:
Blocks: Wstringop-overflow
  Show dependency treegraph
 
Reported: 2018-09-19 11:28 UTC by Bernd Edlinger
Modified: 2018-12-11 00:34 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-09-19 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Bernd Edlinger 2018-09-19 11:28:20 UTC
as pointed out here: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00975.html

the following test program emits a duplicate warning, which is bogus, as
u.z is used, and not u.x:

$ cat gcc/testsuite/gcc.c-torture/execute/pr87053.c 
/* PR middle-end/87053 */

const union
{ struct {
    char x[4];
    char y[4];
  };
  struct {
    char z[8];
  };
} u = {{"1234", "567"}};

int main ()
{
  if (__builtin_strlen (u.z) != 7)
    __builtin_abort ();
}

$ gcc gcc/testsuite/gcc.c-torture/execute/pr87053.c
gcc/testsuite/gcc.c-torture/execute/pr87053.c: In function ‘main’:
gcc/testsuite/gcc.c-torture/execute/pr87053.c:15:26: warning: ‘strlen’ argument missing terminating nul [-Wstringop-overflow=]
15 |   if (__builtin_strlen (u.z) != 7)
   |                         ~^~
gcc/testsuite/gcc.c-torture/execute/pr87053.c:11:3: note: referenced argument declared here
11 | } u = {{"1234", "567"}};
   |   ^
gcc/testsuite/gcc.c-torture/execute/pr87053.c:15:26: warning: ‘strlen’ argument missing terminating nul [-Wstringop-overflow=]
15 |   if (__builtin_strlen (u.z) != 7)
   |                         ~^~
gcc/testsuite/gcc.c-torture/execute/pr87053.c:11:3: note: referenced argument declared here
11 | } u = {{"1234", "567"}};
   |   ^
Comment 1 Martin Sebor 2018-09-19 15:27:23 UTC
The warning code tries to avoid duplicate warning by checking and setting TREE_NO_WARNING.  But because it doesn't have access to the tree node representing the strlen(arg) call it uses TREE_NO_WARNING(arg).  When the arg tree node changes for the same strlen(arg) call duplicate warnings aren't suppressed.

Other than that, since

  When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

the value of u.z is unspecified after the initialization of the union u in the test case.

In strlen(u.z) the warning actually considers the value of u.x returned by string_constant().  This is because the function determines the initializer of an aggregate subobject by calling get_addr_base_and_unit_offset(), i.e., based on the subobject's offset.  For the union in the test case, the offset of u.x is the same as u.z (there is no initializer for u.z and so the values of some of its bytes are unspecified).

To avoid the warning, string_constant() could be enhanced to detect this case (i.e., referencing a union member) and give up.  Or, we could consider the warning a useful reminder for people to avoid relying on unspecified behavior.  Or we could add a different warning to help point out this sort of buggy code.

To determine the number of non-nul bytes in aggregate object that includes multiple members, use memchr(&u, 0, sizeof u) instead of abusing strlen.
Comment 2 jsm-csl@polyomino.org.uk 2018-09-19 17:31:12 UTC
On Wed, 19 Sep 2018, msebor at gcc dot gnu.org wrote:

> Other than that, since
> 
>   When a value is stored in a member of an object of union type, the bytes of
> the object representation that do not correspond to that member but do
> correspond to other members take unspecified values.
> 
> the value of u.z is unspecified after the initialization of the union u in the
> test case.

No, that's only about those bytes that are outside the member that was 
stored (so if you have a union between int and double, and store in the 
int, the parts of the double after the initial sizeof (int) bytes become 
undefined, for example).  Type-punning between union members (when you 
explicitly access using the union type), up to the lesser of the sizes of 
the two members in question, has been defined since C99 TC3 (in a 
footnote, so not normative, but the intent is clear).
Comment 3 Martin Sebor 2018-09-19 21:00:03 UTC
If u's initialization is viewed as a single assignment then you're right.  If the initialization of u is viewed as two separate assignments:

  u.x = "1234";
  u.y = "567";

then strictly speaking, after the second of the two, the first four bytes of u.z have an unspecified value (even though u.x must remain unchanged).  In either case, if the size of the two union members isn't the same the result of accessing the the larger one is unquestionably unspecified.  In C++, the access is undefined regardless of their size.

As with many test cases contrived to find compiler bugs by exploiting the dark corners of the language, the answer depends on one's interpretation, and the solution (warn or not, and what to say in the warning) on how likely fragile code like this is a useful idiom rather than a coding mistake hiding a bug.
Comment 4 jsm-csl@polyomino.org.uk 2018-09-19 21:14:47 UTC
I think the *member* of the union here (the one that is active after the 
initialization) is the anonymous struct containing x and y.  That would 
surely be the case if you named the two struct members (does that make any 
difference to how GCC behaves on this testcase?), and I believe anonymous 
structs and unions purely provide a convenient syntax for accessing nested 
members, without changing other semantics from what they would be if there 
were names given to the members at every level and fully-qualified names 
were used for all member accesses.
Comment 5 Bernd Edlinger 2018-09-20 06:22:10 UTC
Using named struct members does not make a difference.
Of course it is possible that converting address of u.b.z
to const char *, makes the example undefined, as strlen is
not really accessing the bytes thru a union member.

$ cat pr87053.c
/* PR middle-end/87053 */

const union
{ struct {
    char x[4];
    char y[4];
  } a;
  struct {
    char z[8];
  } b;
} u = {{"1234", "567"}};

int main ()
{
  if (__builtin_strlen (u.b.z) != 7)
    __builtin_abort ();
}
$ gcc pr87053.c
gcc -O3 -S pr87053.c 
pr87053.c: In function ‘main’:
pr87053.c:15:28: warning: ‘strlen’ argument missing terminating nul [-Wstringop-overflow=]
15 |   if (__builtin_strlen (u.b.z) != 7)
   |                         ~~~^~
pr87053.c:11:3: note: referenced argument declared here
11 | } u = {{"1234", "567"}};
   |   ^
pr87053.c:15:28: warning: ‘strlen’ argument missing terminating nul [-Wstringop-overflow=]
15 |   if (__builtin_strlen (u.b.z) != 7)
   |                         ~~~^~
pr87053.c:11:3: note: referenced argument declared here
11 | } u = {{"1234", "567"}};
   |   ^