Bug 100477 - Bogus -Wstringop-overflow warning on memset
Summary: Bogus -Wstringop-overflow warning on memset
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 10.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic
Depends on:
Blocks: Wstringop-overflow
 
Reported: 2021-05-07 17:35 UTC by andysem
Modified: 2023-12-15 07:31 UTC
2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description andysem 2021-05-07 17:35:10 UTC
Consider the following test case:

#include <cstddef>
#include <cstring>

class Container
{
public:
    typedef unsigned char value_type;
    typedef value_type* pointer;
    typedef std::size_t size_type;

    Container();

    void clear();
    void reserve(size_type);

    size_type size() const { return _size; }

    void resize(size_type n)
    {
        if(n == 0)
        {
            clear();
            return;
        }

        if(n > _capacity)
        {
            reserve(n);
        }
        else if(_owned && n < _size)
        {
            std::memset(_buf + n, 0, (_size - n) * sizeof(value_type));
        }
        _size = n;
    }

private:
    pointer _buf;
    size_type _size;
    size_type _capacity;
    int _shrinkCounter;
    bool _owned;
};

void test(Container& c, int v)
{
    Container::size_type position = c.size();
    c.resize(position + sizeof(int));
}

$ g++ -Wall -O3 -o memset_warning.o -c memset_warning.cpp 
In file included from /usr/include/string.h:519,
                 from /usr/include/c++/10/cstring:42,
                 from memset_warning.cpp:2:
In function ‘void* memset(void*, int, size_t)’,
    inlined from ‘void Container::resize(Container::size_type)’ at memset_warning.cpp:32:24,
    inlined from ‘void test(Container&, int)’ at memset_warning.cpp:48:13:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:59:33: warning: ‘void* __builtin_memset(void*, int, long unsigned int)’ specified bound 18446744073709551612 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
      |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
   60 |      __glibc_objsize0 (__dest));
      |      ~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, the reported constant 18446744073709551612 is -4 interpreted as an unsigned 64-bit value, which implies the compiler is ignoring the `n < _size` check and calculating `(_size - n)` for the memset even though the call will never be executed.

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 10.3.0-1ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-10-gDeRY6/gcc-10-10.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-gDeRY6/gcc-10-10.3.0/debian/tmp-gcn/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1)
Comment 1 Martin Sebor 2021-05-09 18:54:41 UTC
The warning can be reproduced with the following simplified test case and the IL below (I used -m32 to make the numbers smaller), showing it behaves as designed:

$ cat pr100477.C && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout -m32 -xc pr100477.C
struct A
{
    int  *p;
    __SIZE_TYPE__ n;
};

void f (struct A *p, int v)
{
  __SIZE_TYPE__ n = p->n + sizeof (int);

  if (n < p->n)
    __builtin_memset (p->p + n, 0, (p->n - n) * sizeof *p->p);
}

pr100477.C: In function ‘f’:
pr100477.C:12:5: warning: ‘__builtin_memset’ specified bound 4294967280 exceeds maximum object size 2147483647 [-Wstringop-overflow=]
   12 |     __builtin_memset (p->p + n, 0, (p->n - n) * sizeof *p->p);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

;; Function f (f, funcdef_no=0, decl_uid=1886, cgraph_uid=1, symbol_order=0)

Removing basic block 5
void f (struct A * p, int v)
{
  unsigned int n;
  unsigned int _1;
  int * _2;
  unsigned int _3;
  int * _4;
  __complex__ unsigned int _10;
  unsigned int _11;

  <bb 2> [local count: 1073741824]:
  _1 = p_7(D)->n;
  _10 = .ADD_OVERFLOW (_1, 4);
  n_8 = REALPART_EXPR <_10>;
  _11 = IMAGPART_EXPR <_10>;
  if (_11 != 0)
    goto <bb 3>; [33.00%]
  else
    goto <bb 4>; [67.00%]

  <bb 3> [local count: 354334800]:
  _2 = p_7(D)->p;
  _3 = n_8 * 4;
  _4 = _2 + _3;
  __builtin_memset (_4, 0, 4294967280); [tail call]   <<< -Wstringop-overflow

  <bb 4> [local count: 1073741824]:
  return;

}

Since n is set to _size + 4 in test(), (n < _size) holds only if the addition wraps around zero, implying _size is excessively large.  The warning can be avoided by asserting that that isn't so, e.g., by adding the following

          if (_size >= __PTRDIFF_MAX__ / 4)
            __builtin_unreachable ();

just before the memset call.
Comment 2 andysem 2021-05-10 07:16:30 UTC
(In reply to Martin Sebor from comment #1)
> 
> Since n is set to _size + 4 in test(), (n < _size) holds only if the
> addition wraps around zero, implying _size is excessively large.

If `_size + 4` does overflow, the result of memset is still valid (i.e. it is filling what it is supposed to fill), and the warning is incorrect.

> The
> warning can be avoided by asserting that that isn't so, e.g., by adding the
> following
> 
>           if (_size >= __PTRDIFF_MAX__ / 4)
>             __builtin_unreachable ();
> 
> just before the memset call.

I don't think users should be required to insert non-portable asserts like these to be able to use memset without warnings.

Besides, this assert is incorrect as it will prevent memset from being called when (_size >= __PTRDIFF_MAX__ / 4). (Why `/ 4`, BTW?)

I'm reopening because I think, as it currently works, the warning is bogus and should be fixed. It is not actionable on the user's side and the suggested workaround is not practical.
Comment 3 andysem 2021-05-10 08:16:46 UTC
To put it another way, the case of _size being large and n small is valid, and memset (and resize in general) do behave correctly in this case. Which is why the warning is bogus and the suggested assert is incorrect. Whether n became small is a separate matter unrelated to memset.

You could argue that the line where n becomes small *as a result of overflow* could issue a warning. For example, if size_t was augmented with some sort of an attribute. But that is not what -Wstringop-overflow is.
Comment 4 Martin Sebor 2021-05-10 15:14:06 UTC
The case of _size being very large and n very small may be handled correctly by the original code thanks to the check for _capacity but because its value isn't known it affects neither the codegen nor the warning.

The warning is designed to flag bounds greater than PTRDIFF_MAX and that's just what it sees here as is evident from the output of the -fdump-tree-optimized option.  There is nothing to fix here.  Code that's unreachable as a result of preconditions GCC cannot prove may be susceptible to false positives.  That's a problem shared by all flow-sensitive warnings, not just in GCC but in all static analyzers with flow analysis.

In general, GCC warnings are designed to "report constructions that are not inherently erroneous but that are risky or suggest there may have been an error."  Not every instance of every warning necessarily corresponds to an error, and some may even be false positives.  Unhelpful warnings can be disabled either globally, on the command line, or on a case by case basis by #pragma GCC diagnostic.

Adding preconditions like 'if (_size >= __PTRDIFF_MAX__ / 4) __builtin_unreachable ();' (the 4 should be replaced by sizeof (value_type) in the original test case) often helps not just warnings but also codegen.  They're not required but can be helpful and preferable to suppression.
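Applied to a shrink operation like the one in the original test case, the suggested precondition would look roughly like this (a sketch with simplified stand-in names, not the original Container; `__PTRDIFF_MAX__` and `__builtin_unreachable` are GCC extensions):

```cpp
#include <cstddef>
#include <cstring>

struct Buffer {
    unsigned char* buf;
    std::size_t size;

    void shrink_to(std::size_t n)
    {
        // Assert to GCC that size is a plausible object size; for a
        // value_type wider than a byte, divide by sizeof (value_type).
        if (size >= __PTRDIFF_MAX__ / sizeof(unsigned char))
            __builtin_unreachable();
        if (n < size)
            std::memset(buf + n, 0, size - n);  // bound now provably < PTRDIFF_MAX
        size = n;
    }
};
```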
Comment 5 andysem 2021-05-10 17:04:03 UTC
(In reply to Martin Sebor from comment #4)
> The case of _size being very large and n very small may be handled correctly
> by the original code thanks to the check for _capacity but because its value
> isn't known it affects neither the codegen nor the warning.
> 
> The warning is designed to flag bounds greater than PTRDIFF_MAX

But sizes above PTRDIFF_MAX are valid sizes. IMHO, triggering the warning based on possible sizes is just not the right approach to begin with. If what you're trying to detect is an integral overflow then *that* is what should be detected, and pointed to by the warning, not the unrelated memset.

As an illustration of my point, I worked around this warning by replacing memset with std::fill_n, with no other changes to the code. The optimizer will likely convert it to memset anyway. This shows that (1) if the supposed bug is an overflow in `_size + 4`, it is no longer being detected, and (2) there is no problem with filling more than PTRDIFF_MAX bytes (as expected, of course).
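The std::fill_n workaround described above would look roughly like this (a sketch with stand-in names, not the original Container code):

```cpp
#include <algorithm>
#include <cstddef>

// Same effect as the memset in resize(), but the call is not
// pattern-matched by the -Wstringop-overflow check.
void zero_tail(unsigned char* buf, std::size_t size, std::size_t n)
{
    if (n < size)
        std::fill_n(buf + n, size - n, static_cast<unsigned char>(0));
}
```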

> and that's
> just what it sees here as is evident from the output of the
> -fdump-tree-optimized option.  There is nothing to fix here.  Code that's
> unreachable as a result of preconditions GCC cannot prove may be susceptible
> to false positives.  That's a problem shared by all flow-sensitive warnings,
> not just in GCC but in all static analyzers with flow analysis.
> 
> In general, GCC warnings are designed to "report constructions that are not
> inherently erroneous but that are risky or suggest there may have been an
> error."  Not every instance of every warning necessarily corresponds to an
> error, and some may even be false positives.

False positives are a very good reason for warnings to be disabled or ignored, meaning that they are useless.

Is there any reliable detection implemented in -Wstringop-overflow? Meaning, cases where the compiler can tell "there's definitely a bug", as opposed to "there may be something fishy going on"? If so, it may be sensible to separate the reliable and speculative parts into different warnings. Personally, I would like to have a warning when the compiler is certain that something is wrong, but not the false positives like the one shown in this bug. That's the only thing stopping me from just disabling this warning globally.

> Unhelpful warnings can be
> disabled either globally, on the command line, or on a case by case basis by
> #pragma GCC diagnostic.

#pragma GCC diagnostic doesn't work with LTO. I was actually suppressing this warning in the real code base with a #pragma, but it no longer works now that LTO is enabled by default in my distro.
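For reference, the per-call-site suppression being discussed looks like this (a sketch; it silences the warning in a regular build but, per the comment above, may stop working once the call is inlined under LTO):

```cpp
#include <cstddef>
#include <cstring>

void zero_bytes(unsigned char* p, std::size_t len)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overflow"
    std::memset(p, 0, len);
#pragma GCC diagnostic pop
}
```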

> Adding preconditions like 'if (_size >= __PTRDIFF_MAX__ / 4)
> __builtin_unreachable ();' (the 4 should be replaced by sizeof (value_type)
> in the original test case) often helps not just warnings but also codegen. 
> They're not required but can be helpful and preferable to suppression.

Again, that precondition is incorrect.
Comment 6 Martin Sebor 2021-05-10 21:46:11 UTC
In C/C++ the size of the largest object is PTRDIFF_MAX - 1 bytes.  This is necessary so that the difference between the address of its first byte and that just past its last byte fits in ptrdiff_t.  A number of GCC warnings point out code that breaks this assumption, including allocations in excess of the maximum or calls to functions like memset or memcpy that would access more than the maximum.  None of these warnings considers the conditions that must be true in order for control to reach the problem statement, so if the statement hasn't been eliminated from the IL (i.e., can be seen in the output of -fdump-tree-optimized or whatever pass the warning runs in) the warning triggers.  Taking the conditions into consideration and issuing "maybe" kinds of warnings (like -Wmaybe-uninitialized) is a work in progress, but as of now no warning other than -Wmaybe-uninitialized does.

No such distinction as "there's definitely a bug" vs "there may be a bug" is feasible because there is, in general, no way to identify functions that are defined but never called, or those that are only called with arguments that make certain code paths unreachable.  As a result, with few exceptions every GCC warning is of the latter kind.  If you're interested, the following two-part article discusses some of the challenges in this area:

https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings/
https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings-part-2/

If the #pragma suppression doesn't work with LTO please report that as a separate bug.  There's nothing we can do to avoid the warning in this case.
Comment 7 andysem 2021-05-10 22:55:36 UTC
(In reply to Martin Sebor from comment #6)
> In C/C++ the size of the largest object is PTRDIFF_MAX - 1 bytes.  This is
> necessary so that the difference between the address of its first byte and
> that just past its last byte fits in ptrdiff_t.

Sure, but memset, as well as the other string operations, takes size_t as an argument and doesn't have this limitation. They must be able to operate on the full range of size_t worth of bytes. I don't remember either C or C++ declaring a narrowing limit on the allowed size_t range wrt. these functions.

And please note that string functions are often used on raw storage that is obtained through means other than the operations C++ defines as creating objects. That storage may be larger than the maximum C++ object size.

> A number of GCC warnings
> point out code that breaks this assumption, including allocations in excess
> of the maximum or calls to functions like memset or memcpy that would access
> more than the maximum.   None of these warnings considers the conditions
> that must be true in order for control to reach the problem statement, so if
> the statement hasn't been eliminated from the IL (i.e., can be seen in
> the output of -fdump-tree-optimized or whatever pass the warning runs in)
> the warning triggers.  Taking the conditions into consideration and issuing
> "maybe" kinds of warnings (like -Wmaybe-uninitialized) is a work in progress
> but as of now no warning other than -Wmaybe-uninitialized does.

Ok, then please consider this bug as a case for adding consideration of conditions.

> No such distinction as "there's definitely a bug" vs "there may be a bug" is
> feasible because there is, in general, no way to identify functions that are
> defined but never called, or those that are only called with arguments that
> make certain code paths unreachable.  As a result, with few exceptions every
> GCC warning is of the latter kind.

To clarify, what I mean by "there's definitely a bug" kind of warning is when the compiler has enough data to prove that the operation will result in UB (for example, cause a buffer overflow). Such as:

    char* alloc_string(size_t n)
    {
        char* p = new char[n]; // forgot about the byte for the terminating 0
        std::memset(p, 0, n + 1);
        return p;
    }

This is notably different from the compiler trying to reason about a possible value of n based on the size_t range and issuing warnings based on that. I would like to see warnings of the former kind but not so much of the latter kind.

If there is no way to separate the two kinds of warnings, and the compiler does not consider prior conditions that are written *specifically* to prevent the UB that triggers the warning, I'm afraid I don't see the use for such warnings.

> If you're interested, the following
> two-part article discusses some of the challenges in this area:
> 
> https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings/
> https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings-
> part-2/

Thanks, I will read it. I understand that compiler warnings are not easy to implement right.

> If the #pragma suppression doesn't work with LTO please report that as a
> separate bug.  There's nothing we can do to avoid the warning in this case.

The fact that pragmas don't work with LTO is documented in the -flto description[1]. My impression is that gcc simply doesn't support warning control through pragmas when LTO is enabled. Is this not the case?

I can create a bug, but I cannot provide a small repro, since the warning triggers in a large code base but not necessarily in a small test case. Would such a bug report be useful?

[1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto
Comment 8 Martin Sebor 2021-05-12 00:03:54 UTC
The PTRDIFF_MAX limit is a general restriction implied by the language requirement that the difference between any two pointers must be representable in ptrdiff_t.  In practical terms, GCC internally represents pointer offsets in a type like ptrdiff_t, so all pointer expressions are so restricted.  To enforce the restriction and prevent bugs, GCC rejects declarations of larger types or objects with an error, and issues warnings for attempts to create others (malloc, VLAs).  Most LP64 systems (all major Linux distributions) have much lower limits still (see for example https://access.redhat.com/articles/rhel-limits).  Calls to malloc() fail with sizes far smaller than that, the stack cannot grow to accommodate them, etc., so any access of that size is inevitably an error (solutions like PAE are bound to run into the GCC internal limitation).

As for the alloc_string() example, there is no way to differentiate between the two kinds of cases you mention.  A single code path can be split into two or more with constants propagated into each, rendering one or the other unconditionally invalid even when the original code is valid for a subset of operand values.  This is a limitation inherent to all flow-based GCC warnings (those that depend on optimization).  The static analyzer (-fanalyzer) doesn't depend on optimization so it's not prone to this effect.

Submitting a bug for the LTO problem is only helpful if it comes with a test case to reproduce it.  I have heard about problems suppressing warnings in LTO builds but I'm not aware of it being a design limitation, and other than the mention of the inlining caveat (the -Wstringop-overflow example) I can't find it mentioned in the LTO section of the manual pointed to by the link.
Comment 9 andysem 2021-05-12 09:10:58 UTC
(In reply to Martin Sebor from comment #8)
> 
> Submitting a bug for the LTO problem is only helpful if it comes with a test
> case to reproduce it.  I have heard about problems suppressing warnings in
> LTO builds but I'm not aware of it being a design limitation, and other than
> the mention of the inlining caveat (the -Wstringop-overflow example) I can't
> find it mentioned in the LTO section of the manual pointed to by the link.

The warning is very likely appearing as a result of inlining, and since the compiler inlines based on its own decisions it is difficult to create a test case that would reproduce the problem.

The bottom line is that diagnostic pragmas don't work (reliably) with LTO, and fixing that requires the compiler to track pragmas past inlining.
Comment 10 Martin Sebor 2021-05-12 14:40:45 UTC
You're right that the #pragma can be a little flaky with inlining.  This is a recurring issue that can be made worse by LTO.  A fix I submitted for GCC 11 got deferred to GCC 12 (see pr98465 comment 12).  I have to dust it off and resubmit it.