Bug 103483 - [12/13 regression] context-sensitive ranges change triggers stringop-overread
Summary: [12/13 regression] context-sensitive ranges change triggers stringop-overread
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 12.0
: P3 normal
Target Milestone: 12.3
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic
Depends on:
Blocks: Wstringop-overflow
  Show dependency treegraph
 
Reported: 2021-11-30 04:27 UTC by John McFarlane
Modified: 2022-08-19 08:25 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-11-30 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description John McFarlane 2021-11-30 04:27:56 UTC
As of 9a27acc30a34b7854db32eac562306cebac6fa1e, "Make full use of context-sensitive ranges in access warnings.", this source.cpp

#include <string>
template <int a> void c(int d) {
  char buffer[a] = {};
  std::string(buffer, buffer+d);
}
int main() { c<1>(1); }

with command line: `~/gcc-head/bin/g++ -Werror=stringop-overread -O1 -std=c++20 source.cpp` emits:

/home/john/ws/wide/cnl/build/source.cpp
In file included from /home/john/gcc-head/include/c++/12.0.0/string:40,
                 from /home/john/ws/wide/cnl/build/source.cpp:1:
In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(std::char_traits<char>::char_type*, const std::char_traits<char>::char_type*, std::size_t)’,
    inlined from ‘static void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/john/gcc-head/include/c++/12.0.0/bits/basic_string.h:361:21,
    inlined from ‘static void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy_chars(_CharT*, _CharT*, _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/john/gcc-head/include/c++/12.0.0/bits/basic_string.h:403:16,
    inlined from ‘void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_construct(_InIterator, _InIterator, std::forward_iterator_tag) [with _FwdIterator = char*; _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/john/gcc-head/include/c++/12.0.0/bits/basic_string.tcc:225:25,
    inlined from ‘void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_construct_aux(_InIterator, _InIterator, std::__false_type) [with _InIterator = char*; _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/john/gcc-head/include/c++/12.0.0/bits/basic_string.h:257:23,
    inlined from ‘void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_construct(_InIterator, _InIterator) [with _InIterator = char*; _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/john/gcc-head/include/c++/12.0.0/bits/basic_string.h:276:20,
    inlined from ‘std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::basic_string(_InputIterator, _InputIterator, const _Alloc&) [with _InputIterator = char*; <template-parameter-2-2> = void; _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/john/gcc-head/include/c++/12.0.0/bits/basic_string.h:645:16,
    inlined from ‘void c(int) [with int a = 1]’ at /home/john/ws/wide/cnl/build/source.cpp:4:8:
/home/john/gcc-head/include/c++/12.0.0/bits/char_traits.h:355:56: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ reading between 2 and 2147483647 bytes from a region of size 1 [-Werror=stringop-overread]
  355 |         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
      |                                        ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/home/john/ws/wide/cnl/build/source.cpp: In function ‘void c(int) [with int a = 1]’:
/home/john/ws/wide/cnl/build/source.cpp:3:8: note: source object ‘buffer’ of size 1
    3 |   char buffer[a] = {};
      |        ^~~~~~
cc1plus: some warnings being treated as errors

Still emitting this warning as of SHA 909b30a17e71253772d2cb174d0dae6d0b8c9401
Compiler Explorer: https://godbolt.org/z/n9cqarErc
Also emits array-bounds warning with `-Wall -Wno-stringop-overread`.
If this is a dupe of an 88443 issue, I'm not sure which one.
Comment 1 Andrew Pinski 2021-11-30 04:39:11 UTC
Adding:

  if (d < 0 || d > a) __builtin_unreachable();

in c makes the warning go away. I think the warning is correct really. as if d is a+1000 it is undefined. Without inlining there is no way to know if d is any value.
Comment 2 Jonathan Wakely 2021-11-30 12:11:38 UTC
(In reply to Andrew Pinski from comment #1)
> I think the warning is correct really.

It's wrong.

> as if
> d is a+1000 it is undefined. Without inlining there is no way to know if d
> is any value.

It's able to inline the size of the buffer all the way down to the char_traits::copy call, but isn't able to inline the length argument as well.

If it can't use the compile-time constant length that is present in the code, it shouldn't bother doing these checks. The middle-end should not assume all code is wrong by default. These bogus stringop warnings in std::string are getting out of hand.

See also PR103332 and others.
Comment 3 Martin Sebor 2021-11-30 17:56:56 UTC
The warning is only issued at -O1.  It's based on the statement in the IL and the values or ranges of its arguments.  In this case the IL and the argument values are below:

=========== BB 6 ============
Imports: _1  d_6(D)  
Exports: _1  d_6(D)  
_1	sizetype [0, 0][2, 15]
    <bb 6> [local count: 362271902]:
    if (_1 == 0)
      goto <bb 8>; [100.00%]
    else
      goto <bb 7>; [0.00%]

6->8  (T) _1 : 	sizetype [0, 0]
6->7  (F) _1 : 	sizetype [2, 15]          >>> _1 > 1

=========== BB 7 ============
    <bb 7> [local count: 346397698]:
    # _2 = PHI <&s.D.26133._M_local_buf(6), _19(3)>
    __builtin_memcpy (_2, &buffer, _1);   <<< -Wstringop-overread

The memcpy() call reads between 2 and 15 bytes from the one-byte buffer.  So the warning code is working as designed.  The problem is that at -O1 the code isn't optimized sufficiently to discover that the memcpy call only reads 1 byte (the function call isn't inlined and so the constant argument isn't propagated into the call). 

GCC 11 doesn't warn because it's unable to determine the range of the last memcpy() argument.  Thanks to the Ranger improvements GCC 12 is able to do better.  In some cases this should improve the S/N ratio of the middle end diagnostics that depend on ranges.  Unfortunately, in others like this one where other optimizations are disabled it can make things worse.
Comment 4 Martin Sebor 2021-11-30 18:13:48 UTC
I don't think this can be "fixed."  Most middle end warnings work a single statement at a time and depend on optimization like constant propagation and dead code elimination to do their job.  If one optimization exposes an invalid statement that would otherwise be eliminated by another optimization that doesn't take place, the warnings trigger.  That's all by design and there's no way change that.  In the test case in comment #0 where the precondition is that d be less than a, making it explicit (e.g., either as Andrew suggests in comment #1 or by adding an equivalen assert statement) seems like the best and only solution.
Comment 5 John McFarlane 2021-11-30 22:33:18 UTC
Here is an example of the real-world code causing this warning: https://github.com/johnmcfarlane/cnl/blob/6d46b6cf10a998e3bdcc32557f202c8579b5717c/test/unit/scaled_int/to_chars.h#L60

It is converting a numeric type to a string with a `to_chars`-like API. It's entirely feasible that a user might wish to convert a number to a 1-digit string and encounter a false positive. I've tried adding the recommended `__builtin_unreachable()` statement at several points in `test` and `to_chars_natural` and it doesn't suppress the warning.

The best course of action I can see is to disable the warnings for GCC-12 and beyond: https://github.com/johnmcfarlane/cnl/blob/main/test/toolchain/gcc-head.cmake#L5
I feel that this is counterproductive.
Comment 6 Aldy Hernandez 2021-12-01 16:38:26 UTC
(In reply to Martin Sebor from comment #4)
> I don't think this can be "fixed."  Most middle end warnings work a single
> statement at a time and depend on optimization like constant propagation and
> dead code elimination to do their job.  If one optimization exposes an
> invalid statement that would otherwise be eliminated by another optimization
> that doesn't take place, the warnings trigger.  That's all by design and
> there's no way change that.  In the test case in comment #0 where the
> precondition is that d be less than a, making it explicit (e.g., either as
> Andrew suggests in comment #1 or by adding an equivalen assert statement)
> seems like the best and only solution.

Oh, it totally could be fixed.  Whether you want to or not, is a separate issue.  These false positives "by design" arguments are just a cop-out.

As Jonathan said, if the warning code can't handle the IL as presented, it should give up, not assume code is wrong by default.

It seems we do very bad with a lot of these warnings at -O1.  We should just disable them at low optimization levels if we can't/won't take measures to reduce the false positive rate here.
Comment 7 Jonathan Wakely 2021-12-01 16:53:06 UTC
I have a patch:

--- a/libstdc++-v3/include/bits/char_traits.h
+++ b/libstdc++-v3/include/bits/char_traits.h
@@ -54,6 +54,11 @@ namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wstringop-overflow"
+#pragma GCC diagnostic ignored "-Wstringop-overread"
+#pragma GCC diagnostic ignored "-Warray-bounds"
+
   /**
    *  @brief  Mapping from character type to associated types.
    *
@@ -990,6 +995,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   } // namespace __detail
 #endif // C++20
 
+#pragma GCC diagnostic push
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
Comment 8 Andrew Pinski 2021-12-01 23:38:16 UTC
I think we should come up with a better plan in general for "flow sensative" warnings really. Maybe only enable them with -O2 and above. But we keep on getting more and more of them where it is only defined at runtime if it is hit.
The other thing is not having it in -W -Wall. Or maybe even adding a way (outside of -fsantizer=*) to have runtime checks inside the flow where this happen.
Comment 9 Andrew Pinski 2021-12-01 23:44:12 UTC
(In reply to Andrew Pinski from comment #8)
> I think we should come up with a better plan in general for "flow sensative"
> warnings really. Maybe only enable them with -O2 and above. But we keep on
> getting more and more of them where it is only defined at runtime if it is
> hit.
> The other thing is not having it in -W -Wall. Or maybe even adding a way
> (outside of -fsantizer=*) to have runtime checks inside the flow where this
> happen.

Just to note, I was wrong before in talking about this case (and others) because I didn't realize how much code is going to run into these issues. I think Aldy did put it correct when he wrote: "These false positives "by design" arguments are just a cop-out.".
Comment 10 Martin Sebor 2021-12-02 22:14:28 UTC
Using -O2 doesn't avoid the warning in general.  The following C test case reproduces an equivalent warning at all optimization levels (with GCC 11 it triggers a -Warray-bounds only).  The warning works as designed.  If you don't want these warnings to trigger on these cases we need change the design, starting with outlining the conditions under which they should trigger.  As it is, they all trigger for every invalid call in the IL, whether it's in the source code of the original test case, or in the standard library headers (like in the case of std::string) inlined into user code, or whether it's isolated by the compiler.  Fiddling with optimization levels, disabling them for system headers, or other heuristics won't prevent them under other conditions.

$ cat t.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout -Wno-array-bounds t.c
static inline __attribute__ ((always_inline))
void f (char *d, const char *s, __SIZE_TYPE__ n)
{
  if (n == 1)
    *d = *s;
  else
    __builtin_memcpy (d, s, n);
}

static inline  __attribute__ ((always_inline))
void ff (char *d, const char *s0, const char *s1)
{
  f (d, s0, s1 - s0);
}

void g (void*);

void h (int n)
{
  char a[1] = "";
  char b[16];
  if (n)
    ff (b, a, a + n);
  g (b);
}

;; Function h (h, funcdef_no=2, decl_uid=1990, cgraph_uid=3, symbol_order=2)

Removing basic block 7
void h (int n)
{
  char b[16];
  char a[1];
  sizetype _1;

  <bb 2> [local count: 1073741824]:
  a = "";
  if (n_5(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 3> [local count: 536870913]:
  _1 = (sizetype) n_5(D);
  if (_1 == 1)
    goto <bb 4>; [51.12%]
  else
    goto <bb 5>; [48.88%]

  <bb 4> [local count: 274448412]:
  MEM[(char *)&b] = 0;
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 262422500]:
  __builtin_memcpy (&b, &a, _1);

  <bb 6> [local count: 1073741824]:
  g (&b);
  a ={v} {CLOBBER};
  b ={v} {CLOBBER};
  return;

}


In function ‘f’,
    inlined from ‘ff’ at t.c:13:3,
    inlined from ‘h’ at t.c:23:5:
t.c:7:5: warning: ‘__builtin_memcpy’ reading 2 or more bytes from a region of size 1 [-Wstringop-overread]
    7 |     __builtin_memcpy (d, s, n);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~
t.c: In function ‘h’:
t.c:20:8: note: source object ‘a’ of size 1
   20 |   char a[1] = "";
      |        ^
Comment 11 CVS Commits 2021-12-09 23:24:38 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:f8463b0e3ec2438b4cfb8c9a468d59761db2c8a0

commit r12-5874-gf8463b0e3ec2438b4cfb8c9a468d59761db2c8a0
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Dec 2 13:19:41 2021 +0000

    libstdc++: Disable over-zealous warnings about std::string copies [PR103332]
    
    These warnings are triggered by perfectly valid code using std::string.
    They're particularly bad when --enable-fully-dynamic-string is used,
    because even std::string().begin() will give a warning.
    
    Use pragmas to stop the troublesome warnings for copies done by
    std::char_traits.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/103332
            PR libstdc++/102958
            PR libstdc++/103483
            * include/bits/char_traits.h: Suppress stringop and array-bounds
            warnings.
Comment 12 Jason Merrill 2021-12-10 22:10:51 UTC
(In reply to Martin Sebor from comment #3)
> GCC 11 doesn't warn because it's unable to determine the range of the last
> memcpy() argument.  Thanks to the Ranger improvements GCC 12 is able to do
> better.  In some cases this should improve the S/N ratio of the middle end
> diagnostics that depend on ranges.  Unfortunately, in others like this one
> where other optimizations are disabled it can make things worse.

It seems to me that if we don't warn when we have no information about the range of the argument, we also shouldn't warn when we have only path-dependent information about the range of the argument.

The testcase in comment #0 is definitely dubious for using an unbounded int d to index into a bounded array, so let's consider a more reasonable C testcase derived from PR102958 for which there is no missed-optimization issue.

char *sink;

/* Definitions unavailable to optimization.  */
int secret_strlen (const char *p);
void secret_memcpy (char *, const char *, int);

inline void copy(const char *p)
{
  int L = secret_strlen (p);  
  if (L < 16) /* disabling this branch prevents the warning */
    secret_memcpy (sink, p, L);
  else
    __builtin_memcpy (sink, p, L);
}

void f()
{
  copy ("12"); // bogus warning
}

At the __builtin_memcpy we have a constant string "12" and an unknown length L, and we try to copy L bytes of the constant string into a buffer.  We do this in different ways depending on whether L is less than 16; on the latter path we __builtin_memcpy L bytes from the constant string into the buffer.

And so -Wstringop-overread warns that we're reading 16 or more bytes from a 3-byte string.

But we still have no basis for concluding that L is actually >= 16, we don't know its value at all.  That this path is only taken for L >= 16 doesn't tell us whether it's actually possible that L >= 16; we can't assume that just because we don't know the value of len, it could therefore have any value, and so all branches are reachable for a specific string argument.

We're able to propagate one constant value, and warning because an unknown value might be incompatible with the known value.  And with the recent changes we're able to do more of this asymmetric constant propagation.

Perhaps it would be useful to have an optional mode for these (and other) warnings that does assume that all branches are reachable, but it can't be the default. -Wmaybe-stringop-overread?  -Wstringop-overread=maybe?

My testcase above has given a false positive since GCC 8.

I'm nervous about people adding __builtin_unreachables to suppress these warnings; forced assumptions can be big foot-guns.
Comment 13 Martin Sebor 2021-12-11 00:56:56 UTC
The warning for the test case in comment #12 isn't directly related to ranges: it's issued simply because the invalid statement is in the IL and not eliminated by DCE (the secret functions don't let it).  Similar warnings have been issued in equivalent situations for constants propagated through inlining.  Here's one for -Wnonnull (issued since GCC 7):

char *sink;
  
__attribute__ ((noinline, noipa)) int
null_safe_strlen (const char *p) { return p ?__builtin_strlen (p) : 0; }

static inline void copy (const char *p)
{
  int N = null_safe_strlen (p);
  if (N) /* disabling this branch prevents the warning */
    __builtin_memcpy (sink, p, N);
  else
    *sink = 0;
}

void f()
{
  copy (0); // bogus warning
}

In function ‘copy’,
    inlined from ‘f’ at z.c:17:3:
z.c:10:5: warning: argument 2 null where non-null expected [-Wnonnull]
   10 |     __builtin_memcpy (sink, p, N);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
z.c:10:5: note: in a call to built-in function ‘__builtin_memcpy’

All GCC warnings trigger on invalid statements in the IL, regardless of whether the statements are in reality reachable.  This includes all warnings that consider data flow like -Wnonnull, -Warray-bounds, and -Wformat-overflow among many others.  Ranges just let them find more invalid statements than constants alone would.  This is also the most basic reason why the -Wstringop- warnings are issued for the test case in comment #0 or in or PR 103332.

Two changes are behind the spate of recent bug reports about these warnings for std::string: 1) in GCC 11 we enabled a subset of warnings for code inlined from system headers, and 2) in GCC 12 thanks to Ranger the range info has become more accurate and tighter (larger lower bounds and smaller upper bounds).

Before Jonathan suppressed the warnings in r12-5874 in libstdc++, Andrew MacLeod suggested temporarily (for GCC 12) disabling the context-sensitive Ranger info and going back to global ranges, until we have a better way of dealing with the increased accuracy.  That would reduce the number of false positives but it would also correspondingly increase false negatives, and so defeat one of the main reasons for Ranger: better quality warnings.  It might still be a compromise to consider if the problem turns out to be sufficiently severe and if we can think of a way of to handle the ranges better in the future.  But with the libstdc++ suppression I'm not convinced the problem is severe enough anymore.  And I also can't think of a solution that would let us re-enable Ranger for warnings in the future.  Nothing I've tried so far has showed much promise, and neither seems anything anyone has suggested.

Independently, I have been thinking about adding -Wmaybe- forms of some of these warnings analogous to -Wmaybe-uninitialized (or corresponding levels), to diagnose conditional problems as in:

  char a[4], b[8];

  void f (int i)
  {
    __builtin_memset (i ? a : b, 0, 7);   // okay for b, overflow for a: thus "may overflow a"
  }

but I have not been considering disabling the existing warnings (or removing it from -Wall) and issuing them only under the new option or some new level.  That would in my mind be a drastic step back.
Comment 14 Jason Merrill 2021-12-11 22:43:52 UTC
(In reply to Martin Sebor from comment #13)
> static inline void copy (const char *p)
> {
>   int N = null_safe_strlen (p);
>   if (N) /* disabling this branch prevents the warning */
>     __builtin_memcpy (sink, p, N);
>   else
>     *sink = 0;
> }

This testcase is importantly different from mine; in mine the branch itself is what introduces the range information that causes the warning, we only decide that the statement is invalid because of the if.  In your testcase, if you made the memcpy unconditional we would still warn; in mine we only warn *because* it's conditional.
Comment 15 Jason Merrill 2022-01-17 22:44:42 UTC
Jeff, I remember running into similar issues in the past with jump-threading creating nonsensical blocks which we would then give other warnings about, and I think you fixed at least one of those.  Do you have any input that could help guide us to a resolution of this problem?

Note that the original testcase no longer warns on trunk because <string> disables the warning entirely.

To simplify my example a bit (compile with -O -Wall)

char *sink;
int mystrlen (const char *p);
inline void copy(const char *p)
{
  int L = mystrlen (p);
  if (L < 5)
    /* Small string magic. */;
  else
    __builtin_memcpy (sink, p, L);
}
void f()
{
  copy ("12"); // bogus warning                                                                                
}

I see that this actually warns as far back as GCC 8; I guess this is an older problem that has only gotten more problematic with improvements in value range propagation.

I don't see any plausible way for the user to guard this perfectly reasonable code against this warning, other than disabling it.

Again, at the point of the memcpy we don't know anything about the probability of different values of L.  With or without the if condition, if we try to memcpy 5 bytes out of "12" we get undefined behavior; that doesn't become more likely because we want to handle small L differently.  It creates a branch that is all undefined behavior, but that doesn't make the branch reachable.
Comment 16 Andrew Macleod 2022-01-17 23:10:13 UTC
The only thing I can think of is it is *guaranteed* to be out of range, then assume that is because those other values were handled elsewhere and don't report it?  

L_3     int [5, +INF]
    <bb 3> [local count: 354334800]:
    _4 = (long unsigned int) L_3;
    sink.0_5 = sink;
    __builtin_memcpy (sink.0_5, "12", _4);

_4 : long unsigned int [5, 2147483647]

That is different than if _4 was varying or [1, 2147483647].

of course, then you wont catch the cases where its guaranteed that we are going to copy too many bytes.  But then, you don't get those if you just turn it off either.
Comment 17 Martin Sebor 2022-01-18 00:47:13 UTC
Jaosn: this is how all middle-end warnings have always behaved.  They trigger on invalid statements present in the IL.  A statement is considered invalid when any of its operands is out of bounds or in some other way not valid for it (e.g., null in -Wnonnull, or not pointing to the first byte allocated on the heap in -Wfree-nonheap-pointer, or not matching an allocation call in -Wmismatched-dealloc).  Warnings don't differentiate between constant operands or those in some range.  We have been making more extensive use of ranges since get_range_info() was introduced, but prior to that, -Warray-bounds made use of ranges as well (thanks to VRP).  Even in warnings that don't use ranges, constant operands need not be literals: they can be reduced to constants from ranges by various transformations (jump threading is just one of them).  For more background please see my two-part article from 2019: Understanding GCC Warnings:

  https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings
  https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings-part-2

It's easy to derive examples from the one in comment #12 or comment #15 showing other similar warnings: replacing the memcpy() call with an array subscript triggers a -Warray-bounds; snprintf() triggers -Wformat-truncation; malloc() triggers -Walloc-size-larger-than; etc.  See below.  (I also showed an example with -Wnonnull in comment #13.  It's issued on the same basis.)

It might seem like the common denominator in all these instances is ranges, but they're a red herring.  The same effect can be demonstrated without them.  The root cause behind them all is that (again) warnings are designed to trigger for apparently reachable invalid IL.  See pr54202 for an example from 2012 with -Wfree-nonheap-object.  The warning is simply based on what the pointer points to, irrespective of the conditions under which the invalid statement is evaluated.

If you consider any of the warnings above false positives you must consider as such all of them.  It makes no sense to do something about just a subset of them and not the rest.  And to avoid them altogether you have to disable (or at least seriously cripple) all those we've ever added into the middle end.  You could, for example, only warn in statements that are reached unconditionally from function entry.  Removing them them all from -Wall and making them opt-in would reduce the number of complaints but only as a result of the number of users explicitly enabling the warnings, without actually improving anything (besides, by being included in -Wall most already are opt-in).  In any event, any of these alternatives would compromise the security improvements we have invested so much in over the years.  The best solution, in my view, is to show users the conditionals under which the invalid statements can be reached.  I hoped to be able to do that by extending Ranger (https://gcc.gnu.org/pipermail/gcc/2021-December/237922.html) but it could also be done by rolling a range propagation engine just for warnings (like for the static analyzer).

char *sink;
__attribute__ ((noipa))
int mystrlen (const char *p)
{
  return __builtin_strlen (p);
}

inline void copy(const char *p)
{
  int L = mystrlen (p);
  if (L < 5)
    /* Small string magic. */;
  else
   *sink = p[L];
}
void f()
{ 
  copy ("12");
} 

In function ‘copy’,
    inlined from ‘f’ at a.c:18:3:
a.c:14:13: warning: array subscript [5, 2147483647] is outside array bounds of ‘char[3]’ [-Warray-bounds]
   14 |    *sink = p[L];
      |            ~^~~

char *sink;
__attribute__ ((noipa))
int mystrlen (const char *p)
{
  return __builtin_strlen (p);
}

inline void copy(const char *p)
{
  int L = mystrlen (p);
  if (L < 5)
    /* Small string magic. */;
  else
    __builtin_snprintf (sink, 5, "%*s", L, p);
}
void f()
{ 
  copy ("12");
} 

a.c: In function ‘f’:
a.c:14:38: warning: ‘__builtin_snprintf’ output truncated before the last format character [-Wformat-truncation=]
   14 |     __builtin_snprintf (sink, 5, "%*s", L, p);
      |                                      ^
In function ‘copy’,
    inlined from ‘f’ at a.c:18:3:
a.c:14:5: note: ‘__builtin_snprintf’ output between 6 and 2147483648 bytes into a destination of size 5
   14 |     __builtin_snprintf (sink, 5, "%*s", L, p);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


$ gcc -O2 -S -Walloc-size-larger-than=4 a.c
har *sink;
__attribute__ ((noipa))
int mystrlen (const char *p)
{
  return __builtin_strlen (p);
}

inline void copy(const char *p)
{
  int L = mystrlen (p);
  if (L < 5)
    /* Small string magic. */;
  else
    sink = __builtin_malloc (L);
}
void f()
{ 
  copy ("12");
} 

In function ‘copy’,
    inlined from ‘f’ at a.c:18:3:
a.c:14:12: warning: argument 1 range [5, 2147483647] exceeds maximum object size 4 [-Walloc-size-larger-than=]
   14 |     sink = __builtin_malloc (L);
      |            ^~~~~~~~~~~~~~~~~~~~
a.c:14:12: note: in a call to built-in allocation function ‘__builtin_malloc’
Comment 18 Jason Merrill 2022-01-28 05:03:46 UTC
(In reply to Martin Sebor from comment #17)
> https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings- part-2

Yes, this is a good description of the general problem.
 
> It might seem like the common denominator in all these instances is ranges,
> but they're a red herring.  The same effect can be demonstrated without
> them.  The root cause behind them all is that (again) warnings are designed
> to trigger for apparently reachable invalid IL.

But for this and several other warnings, we only think it's invalid because of ranges; I disagree that they're a red herring.

> If you consider any of the warnings above false positives you must consider
> as such all of them.

Agreed.

> The best solution, in my view, is to show users the conditionals
> under which the invalid statements can be reached.  I hoped to be able to do
> that by extending Ranger
> (https://gcc.gnu.org/pipermail/gcc/2021-December/237922.html)

This sounds useful, and somewhat related to what I was trying to suggest: recognize the case where the only information we have about a value range is path-derived, and make  that subset of warnings optional, for warnings we don't give when we have no information at all about the value range.

To put it another way, if the only reason we think a statement is invalid is because of information deduced from conditions we took to get here, we probably don't want to warn by default.
Comment 19 Jeffrey A. Law 2022-01-28 06:38:38 UTC
Just threading doesn't create nonsensical blocks out of thin air.  It may expose nonsensical code that already existed though.  That's inherent in the path isolating nature of the transformation.

But that's not what's going on in the example you posted.  What's going on there is we have a memcpy where we know the source has only 3 bytes of storage, but we *may* pass in a length of 4 for the memcpy as "mystrlen" is opaque..  That is precisely the kind of scenario where these warnings are supposed to trigger.


I would strongly disagree with the recommendation not to warn because of information deduced from conditionals on the path.  That would cripple the warnings in precisely the case where they're the most valuable IMHO.
Comment 20 Jonathan Wakely 2022-01-28 07:48:50 UTC
But the wording of the diagnostics makes it seem like these "bugs" definitely happen. Too often it takes considerable effort examining GCC dump files to even work out why there is a warning and which conditions would trigger it. Users are not able to do that analysis (and I can't do it), so we can't make use of the warnings. It's also unclear how to "fix" the code to prevent the warnings, see PR 104017 for example.
Comment 21 Jeffrey A. Law 2022-01-28 15:23:25 UTC
Yes, the wording is dreadful.  Yes we need a better way to express to the user the paths followed and how they impacted the analysis.

As for suppressing. There's not a great option here, which isn't a huge surprise.  In this specific case we'd need to be able to make mystrlen less opaque, particularly WRT its return value.  Even if we had a solution to do that, it's still far from good IMHO -- you end up with annotations all over the place.
Comment 22 Richard Biener 2022-03-09 14:11:00 UTC
There isn't going to be a good solution that makes all folks happy - we'd either have false negatives or false positives.  It is true that we're accumulating more and more cases where the user gets the impression we want to warn about

int a[16];
void foo (size_t len)
{
  memset (a, 0, len);
}

like

warning: memset called with unbound 'len' argument to buffer of size 16

for example we do not diagnose

int a[2];
void foo (unsigned len)
{
  if (len == 1 || len == 20)
    __builtin_memset (a, 0, len);
}

even though with len == 20 this is out of bounds.  Instead we only
diagnose if both possible accesses are out of bounds but we fail
to see that in the 'else' case we do not call memset at all.  What's
the real difference to the len == 1 case that makes us to not
emit the diagnostics here?

What we traditionally consider as "always" and "maybe" is also blurry
with more and more IPA optimization (functions are always only "maybe"
executed).

What static analyzers and fuzzers do is isolate every possible path,
sensible or not, and diagnose those.  We're getting closer to that
(but every non-sensical isolated path also consumes object space).
Comment 23 Martin Sebor 2022-03-14 23:58:27 UTC
(In reply to Richard Biener from comment #22)

Your question may have been rhetorical but to be explicit, the real difference is hidden in the implementation (which is why these warnings can sometimes seem inconsistent).  GCC doesn't warn for the second test case (copied below) because it only considers the lower bound of len's range:

int a[2];
void foo (unsigned len)
{
  if (len == 1 || len == 20)
    __builtin_memset (a, 0, len);
}

But the warning would trigger if GCC decided it was profitable to split the memset call into two statements:

int a[2];
void foo (unsigned len)
{
  if (len == 1)
    a[0] = 0;
  else if (len == 20)
    __builtin_memset (a, 0, 20);
}

I suspect most users (though not all, otherwise this report would have never been raised) would consider a warning valid and helpful for the source code.  But if instead of (len == 1 || len == 20) the condition were to be written in terms of a relational expression (like len <= N) where N were greater than or even equal to sizeof (a) + 1, I'd expect complaints about the warning being a false positive because GCC can't "know" that len == N necessarily holds.
Comment 24 Jakub Jelinek 2022-05-06 08:32:02 UTC
GCC 12.1 is being released, retargeting bugs to GCC 12.2.
Comment 25 Richard Biener 2022-08-19 08:25:10 UTC
GCC 12.2 is being released, retargeting bugs to GCC 12.3.