94247 – Wrong char-subscripts warning for limited-range index

Bug 94247 - Wrong char-subscripts warning for limited-range index

Summary: Wrong char-subscripts warning for limited-range index

Status:	RESOLVED INVALID

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	c (show other bugs)
Version:	10.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	diagnostic

Depends on:
Blocks:

Reported:	2020-03-21 10:42 UTC by Roland Illig
Modified:	2020-03-24 00:03 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Roland Illig 2020-03-21 10:42:46 UTC

#define MAXID 20
static const char shft[MAXID] = {1,2,3,4,5,6,7,1,2,3,4,5,6,7,1,2,3,4,5,6};

int hashstr(const char *s) {
  char c;
  char k = 0;
  unsigned int sum = 0;

  while( (c = *s++) != '\0' && k < MAXID-1 ) {
    sum += c + (c<<shft[k++]);
  }
  return (int)(sum >> 1);
}

The above program is a slight variation of the OpenJDK code in src/hotspot/share/libadt/dict.cpp. It uses a char for indexing an array, which triggers this warning:

dict.cpp: In function ‘int hashstr(const char*)’:
dict.cpp:10:28: warning: array subscript has type ‘char’ [-Wchar-subscripts]
     sum += c + (c<<shft[k++]);

At optimization levels 1 and beyond, the possible range for k is determined correctly, and the compiler generates the same code, no matter if k has type char or unsigned int or any other integer type.

The warning is a false positive, and the compiler already knows this. Therefore the warning should not be generated.

Comment 1 Andrew Pinski 2020-03-21 10:48:22 UTC

The warning is not context sensitive.  What I mean is it does not take into account the range of k.

Also you can avoid the warning by using either unsigned char or signed char.

char is a special type.  Unlike other plain types, it can default either to signed or unsigned and for GCC it depends on the ABI.

>and the compiler already knows this
Not when the warning is generated from the front-end.  It does not know the range of the char variable there.

Comment 2 Roland Illig 2020-03-21 11:09:18 UTC

(In reply to Andrew Pinski from comment #1)
> >and the compiler already knows this
> Not when the warning is generated from the front-end.  It does not know the
> range of the char variable there.

Ah, that fine distinction. From my point of view there only was "the compiler". If the range of the variable is not known at that time, it would probably be too much work to implement this extra rule.

So you suggest that the code should be fixed instead? Then I'll tell the OpenJDK people.

Comment 3 Richard Biener 2020-03-23 07:37:17 UTC

Yes, it's bad programming practice to use 'char' for any arithmetic.

Comment 4 Richard Biener 2020-03-23 07:37:28 UTC

Comment 5 Martin Sebor 2020-03-23 17:32:37 UTC

-Wchar-subscripts is far too primitive of an instrument to rely on to detect real bugs with any fidelity.  Besides triggering regardless of whether char is a signed or unsigned type (i.e., every instance of it on with -fno-signed-char is a false positive), and besides not tracking values or ranges (leading to another class of false positives evident in test cases like in comment #0), it doesn't diagnose uses where the char subscript is a negative literal such as in

  int f (char *s)
  {
    return s['\xff'];   // missing warning
  }

or where it has been promoted from char, such as in:

  int f (char *s)
  {
    return s[*s + 1];   // missing warning
  }

Converting the char to signed char or any other type also suppresses the warning without resolving the underlying problem (and, on -fno-signed-char targets, could even introduce a bug into correct code).

Making the warning more discerning and detecting more real problems would require running it later, after some basic optimizations.  But there already is a flow-sensitive warning with the same ultimate goal of detecting out-of-bounds array indices as -Wchar-subscripts: -Warray-bounds.  Unfortunately, because GCC aggressively and rather indiscriminately folds references to static constant arrays of types other than char very early on, negative subscripts in those are not detected by it, and so erroneous expressions like in the snippet below are not diagnosed:

  static const int a[] = { 1, 2, 3 };

  int f (void)
  {
    return a[-1];   // missing warning
  }

The out-of-bounds index above is diagnosed in accesses to non-constant arrays, but only at -O2, because the -Warray-bounds warning runs only at that level.  The only reason why it isn't detected at lower levels is because no effort has been invested into it yet.  Patches are welcome :)

Comment 6 Jakub Jelinek 2020-03-23 18:41:55 UTC

(In reply to Martin Sebor from comment #5)
> -Wchar-subscripts is far too primitive of an instrument to rely on to detect
> real bugs with any fidelity.

No, it diagnoses the main bug, that one uses char as index type, which isn't very portable.  If one uses explicit signed char or unsigned char, it will behave the same way on all targets, but char will not (unless it has values from 0 to SCHAR_MAX only).

Comment 7 Martin Sebor 2020-03-23 23:52:25 UTC

(In reply to Jakub Jelinek from comment #6)
> No, it diagnoses the main bug

Nope, it does not.  -Wchar-subscripts is designed and documented to diagnose a common cause of a bug.  The actual bug itself (which, as noted in pr94182, the manual neglects to describe) is in inadvertently using a negative index as a result of sign extension when a positive index is intended.  When that cannot happen there is obviously no bug to diagnose.

There's no doubt that there is room for improvement in both warnings.  Some of the false negatives might be avoidable by enhancing -Wchar-subscripts in the front-end (e.g., fixing the remainder of pr29455), but the better ROI is in continuing to improve -Warray-bounds (pr56456).

Comment 8 Andrew Pinski 2020-03-24 00:03:45 UTC

(In reply to Martin Sebor from comment #7)
> (In reply to Jakub Jelinek from comment #6)
> > No, it diagnoses the main bug
> 
> Nope, it does not.  -Wchar-subscripts is designed and documented to diagnose
> a common cause of a bug.  The actual bug itself (which, as noted in pr94182,
> the manual neglects to describe) is in inadvertently using a negative index
> as a result of sign extension when a positive index is intended.  When that
> cannot happen there is obviously no bug to diagnose.

Yes and no.  Let's look at it a different way.  negative index is not the issue.  But rather knowing if char is signed or unsigned is the issue.  it is a portability issue.  -Wchar-subscripts is designed to diagnostic that you use char in a subscript as you might not know if it is signed or unsigned because the ABI differences.  Look at PowerPC or ARM ABIs, you will notice that char is unsigned while other ABIs, char is signed.  YES with the code in comment #0, there would be no difference but having a false negative is fine.  Not all warnings are going to be 100% because of the heurstics.  Since the false negative is easy workarounded, just use signed char or unsigned char or int as the loop variable.  char is a special type as I keep on mentioning.