Bug 62181

Summary: [C/C++] Expected new warning: "adding 'char' to a string does not append to the string" [-Wstring-plus-int]
Product: gcc Reporter: Tobias Burnus <burnus>
Component: c++Assignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: egallager, jynelson, manu, msebor, veksler, vittorio.romeo, webrown.cpp, xry111
Priority: P3 Keywords: diagnostic, easyhack, patch
Version: 5.0   
Target Milestone: ---   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78679
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30334
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2014-10-13 00:00:00
Bug Depends on:    
Bug Blocks: 87403    

Description Tobias Burnus 2014-08-19 09:17:44 UTC
CLANG warns for the following code, GCC doesn't (with the options tried):


foo.cc:4:24: warning: adding 'char' to a string does not append to the string [-Wstring-plus-int]
  const char *a = "aa" + 'a';
                  ~~~~~^~~~~
foo.cc:4:24: note: use array indexing to silence this warning
  const char *a = "aa" + 'a';
                       ^
                  &    [    ]
1 warning generated.


Interestingly, the warning does not trigger for integer literals - only for single-character literals and for long/int/char returning functions. That's probably because  "abcd" + 1  is the pointer address shifted by one, i.e. pointing to "bcd". One can also argue whether the note is helpful or not.


In the real-world code, the LHS was a string class and the conversion of "aa" to the string class was missing such that the "operator+" wasn't triggered.


Test case, gives three warnings (and three notes) with CLANG:

#include <stdio.h>

char bar() {
  return 1;
}
int foobar() {
  return 1;
}

int main() {
  const char *a = "aa" + 'c';
  const char *b = "bb" + bar();
  const char *c = "cc" + foobar();
  printf("%s, %s, %s\n", a, b, c);
  return 0;
}
Comment 1 Manuel López-Ibáñez 2014-10-13 00:28:54 UTC
Confirmed. It does warn for "aaa" + x, when x is char/int.
Comment 2 Manuel López-Ibáñez 2014-10-13 00:34:46 UTC
In both FEs there is a function build_binary_op. Put a break-point inside, examine the operands with debug_tree and figure out how to detect this case and when to warn. Add the new warning option to c-family/c.opt and document it in gcc/doc/invoke.texi.

I think the note is useful but we cannot give the fix-it hints yet. Add a FIXME in the code on top of the inform call.
Comment 3 Eric Gallager 2016-08-22 13:40:38 UTC
I think a better fixit would be to suggest using strcat or strncat or strlcat or something, since that actually appends to the string like the programmer probably intended to do. Abusing array indexing like that just looks wrong.
Comment 4 Xi Ruoyao 2017-03-15 06:50:57 UTC
In the recent version of clang this diagnostic was separated to two
warnings, -Wstring-plus-char and -Wstring-plus-int.

-Wstring-plus-char warns

  "adding <char type> to a string pointer does not append to the string"

where <char type> may be char, unsigned char in C/C++, and char16_t,
char32_t, wchar_t in C++ 2011 and above.

-Wstring-plus-int warns

  "adding <int type> to a string does not append to the string"

where the string is a literal and the integer is not a literal (like
Tobias showed in the description).

It's more clear to use two different options.  But the two warnings may
be reported at the same location.  It seems a little redundant.  Should
we silence one while the other has been reported?
Comment 5 Xi Ruoyao 2017-03-15 06:58:14 UTC
Some clang-5.0 warning case:

~~~
Wstring-plus-int.c:20:24: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
  const char *a = "aa" + 'a'; /* { dg-warning "does not append" } */
                  ~~~~~^~~~~
Wstring-plus-int.c:20:24: note: use array indexing to silence this warning
  const char *a = "aa" + 'a'; /* { dg-warning "does not append" } */
                       ^
                  &    [    ]
Wstring-plus-int.c:20:24: warning: adding 'char' to a string pointer does not append to the string [-Wstring-plus-char]
  const char *a = "aa" + 'a'; /* { dg-warning "does not append" } */
                  ~~~~~^~~~~
Wstring-plus-int.c:20:24: note: use array indexing to silence this warning
  const char *a = "aa" + 'a'; /* { dg-warning "does not append" } */
                       ^
                  &    [    ]
Wstring-plus-int.c:21:28: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
  const wchar_t *b = L"aa" + L'a'; /* { dg-warning "does not append" } */
                     ~~~~~~^~~~~~
Wstring-plus-int.c:21:28: note: use array indexing to silence this warning
  const wchar_t *b = L"aa" + L'a'; /* { dg-warning "does not append" } */
                           ^
                     &     [     ]
Wstring-plus-int.c:22:24: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
  const char *e = "aa" + getchar(); /* { dg-warning "does not append" } */
                  ~~~~~^~~~~~~~~~~
Wstring-plus-int.c:22:24: note: use array indexing to silence this warning
  const char *e = "aa" + getchar(); /* { dg-warning "does not append" } */
                       ^
                  &    [          ]
Wstring-plus-int.c:25:21: warning: adding 'char' to a string pointer does not append to the string [-Wstring-plus-char]
  const char *f = a + 'a'; /* { dg-warning "does not append" } */
                  ~~^~~~~
Wstring-plus-int.c:25:21: note: use array indexing to silence this warning
  const char *f = a + 'a'; /* { dg-warning "does not append" } */
                    ^
                  & [    ]
Wstring-plus-int.c:26:23: warning: adding 'char' to a string pointer does not append to the string [-Wstring-plus-char]
  const char *g = 'a' + a; /* { dg-warning "does not append" } */
                  ~~~~^~~
Wstring-plus-int.c:26:23: note: use array indexing to silence this warning
6 warnings generated.
~~~

Note the both warnings are reported at line 22.

By the way clang -Wstring-plus-char doesn't warn for `a += 'a'`.  I think
we should handle it.

I've submitted a series of patches about this PR.  I'll adjust them to be
like clang-5.0 and resubmit them in stage 1.
Comment 6 Martin Sebor 2017-03-15 17:29:16 UTC
With constant arguments (or those whose value or range is known), GCC should warn on the first declaration in comment #0 (copied below) not necessarily because the addition doesn't append 'c' to "aa" (i.e., with -Wstring-plus-char warns) but mainly because it results in an invalid pointer.

  const char *a = "aa" + 'c';

GCC should warn for a constant operand that results in an out-of-bounds pointer regardless of its type, such as the following:

  const char *a = "aa" + 4;

GCC could (and, in my view, should) also warn on the following where the value of i isn't known but where its range is such that the addition will never result in a valid pointer:

  void f (int i)
  {
    if (i < 4) return;
    const char *a = "aa" + i;
    ...
  }
Comment 7 Xi Ruoyao 2017-03-15 18:59:52 UTC
(In reply to Martin Sebor from comment #6)
> With constant arguments (or those whose value or range is known), GCC should
> warn on the first declaration in comment #0 (copied below) not necessarily
> because the addition doesn't append 'c' to "aa" (i.e., with
> -Wstring-plus-char warns) but mainly because it results in an invalid
> pointer.
> 
>   const char *a = "aa" + 'c';
> 
> GCC should warn for a constant operand that results in an out-of-bounds
> pointer regardless of its type, such as the following:
> 
>   const char *a = "aa" + 4;
> 

Clang-5.0 is doing this now for -Wstring-plus-int.  I'll try to do this.

> GCC could (and, in my view, should) also warn on the following where the
> value of i isn't known but where its range is such that the addition will
> never result in a valid pointer:
> 
>   void f (int i)
>   {
>     if (i < 4) return;
>     const char *a = "aa" + i;
>     ...
>   }

We should make a new PR requesting for clang -Warray-bounds as well.  It's
a part of meta-bug PR30334.
Comment 8 Xi Ruoyao 2017-03-15 19:03:05 UTC
(In reply to Xi Ruoyao from comment #7)

> We should make a new PR requesting for clang -Warray-bounds as well.  It's
> a part of meta-bug PR30334.

Sorry. We have -Warray-bounds, but not as well as clang's.  For example clang
warns for "aa"[4], but GCC doesn't.
Comment 9 Martin Sebor 2017-03-15 19:39:35 UTC
(In reply to Xi Ruoyao from comment #8)
> (In reply to Xi Ruoyao from comment #7)
> 
> > We should make a new PR requesting for clang -Warray-bounds as well.  It's
> > a part of meta-bug PR30334.
> 
> Sorry. We have -Warray-bounds, but not as well as clang's.  For example clang
> warns for "aa"[4], but GCC doesn't.

Right.  It only warns on arrays, and only with optimization.  That seems to be because array_at_struct_end_p() in tree.c (called from check_array_ref() in tree-vrp.c) doesn't handle ARRAY_REF with a STRING_CST operand correctly.  It treats it as if it were a reference to an array at the end of a structure.  This only superficially tested change lets GCC warn on that case as well (as long as the result is used).  To detect this without optimization -Warray-bounds would need to also do some checking much earlier.

diff --git a/gcc/tree.c b/gcc/tree.c
index 8f87e7c..5fcbf65 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -13229,6 +13229,9 @@ array_at_struct_end_p (tree ref, bool allow_compref)
       ref = TREE_OPERAND (ref, 0);
     }
 
+  if (TREE_CODE (ref) == STRING_CST)
+    return false;
+
   tree size = NULL;
 
   if (TREE_CODE (ref) == MEM_REF
Comment 10 Xi Ruoyao 2017-06-13 21:27:47 UTC
A series of patch sent to gcc-patches:

<https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00729.html>
Comment 11 Eric Gallager 2018-02-28 07:26:52 UTC
(In reply to Xi Ruoyao from comment #8)
> (In reply to Xi Ruoyao from comment #7)
> 
> > We should make a new PR requesting for clang -Warray-bounds as well.  It's
> > a part of meta-bug PR30334.
> 
> Sorry. We have -Warray-bounds, but not as well as clang's.  For example clang
> warns for "aa"[4], but GCC doesn't.

In that case it's a different meta-bug, PR56456
Comment 12 Manuel López-Ibáñez 2018-09-16 17:33:18 UTC
*** Bug 49481 has been marked as a duplicate of this bug. ***
Comment 13 Manuel López-Ibáñez 2018-09-16 17:35:47 UTC
*** Bug 78679 has been marked as a duplicate of this bug. ***
Comment 14 jynelson 2019-03-24 03:13:39 UTC
Bump. This hit me recently, is there a reason the patch hasn't been merged? Any changes I can contribute to get it in faster? I've checked on 8.2.0 and it doesn't look like this is in either gcc or g++.
Comment 15 Jonathan Wakely 2019-03-25 13:36:59 UTC
Was this question ever answered?
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01337.html
Comment 16 Xi Ruoyao 2019-03-25 13:54:36 UTC
(In reply to Jonathan Wakely from comment #15)
> Was this question ever answered?
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01337.html

Oh that's intentional.

This would make this warning more useful, while most people won't use a "char" to address an array or something like.  For example, if someone has wrote a string-like class but forgot to overload operator+=, this warning will detect it when he writes "char c = getchar(); buggy_string t = s + c;".

And for something like

"char *p = s;  char x;  scanf("%hhd", &x);  p = p + x;"

1.  In C++ __INT8_TYPE__ is not __CHAR_TYPE__ so there will be no warning.  He can use "int8_t x;" instead of "char x;".
2.  He can use p = &p[x], which is more clear and nobody will think this is an append.

I remember I wrote a response, with an option to split this into -Wstring-plus-int=1 and -Wstring-plus-int=2.  But why it wasn't sent?  I can't remember.
Comment 17 Eric Gallager 2019-03-25 14:39:09 UTC
(In reply to Xi Ruoyao from comment #16)
> (In reply to Jonathan Wakely from comment #15)
> > Was this question ever answered?
> > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01337.html
> 
> Oh that's intentional.
> 
> This would make this warning more useful, while most people won't use a
> "char" to address an array or something like.  For example, if someone has
> wrote a string-like class but forgot to overload operator+=, this warning
> will detect it when he writes "char c = getchar(); buggy_string t = s + c;".
> 
> And for something like
> 
> "char *p = s;  char x;  scanf("%hhd", &x);  p = p + x;"
> 
> 1.  In C++ __INT8_TYPE__ is not __CHAR_TYPE__ so there will be no warning. 
> He can use "int8_t x;" instead of "char x;".
> 2.  He can use p = &p[x], which is more clear and nobody will think this is
> an append.
> 
> I remember I wrote a response, with an option to split this into
> -Wstring-plus-int=1 and -Wstring-plus-int=2.  But why it wasn't sent?  I
> can't remember.

It was in other branches of the thread: 
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01345.html
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01347.html
Again, as I always do when people propose new numeric levels to warning flags, I'd prefer splitting into new named options instead of adding numeric warning levels, for individual controllability, and end-user clarity. So in this case, maybe the final set could be:
-Wstring-plus-int
-Wstring-plus-char (-Wstring-plus-any-char?)
-Wstring-plus-char-literal
Comment 18 Manuel López-Ibáñez 2019-03-26 09:27:38 UTC
A large patch will often get lost in comments and revisions unless the
submitter is very insistent and committed. If you want to get this moving,
my advice would be to split out the smallest piece possible (string + char
literal) and just submit that for review. Once that is committed, then look
for the next smallest step and repeat.